Linking and loading

Getting to program execution

To eventually run (execute) a program, the following things are done:

  1. Write source code
  2. Assemble source code, producing machine code
  3. Link and load machine code
  4. Set PC to point to address of first instruction within code. (This is a jump to the first instruction in the program.)

We have talked about steps 1 and 2.

Linking and Loading

Assembly only produces information about what goes where in memory to make the code run. It does not actually put anything in memory. Linking and loading put the code and data into memory at the right places.

What goes into memory?

Where are the correct locations?
Answer: Exactly where the assembler assigns them.

For example,
The data section starts at 0x00400000 in our simulator of the MIPS RISC processor.

So, if we had source code with,

    a1:  .word 15
    a2:  .word -2

then the assembler needs to specify that memory will need to be initially set up with

    address        contents
    0x00400000     0000 0000 0000 0000 0000 0000 0000 1111
    0x00400004     1111 1111 1111 1111 1111 1111 1111 1110
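
The table above can be reproduced with a short Python sketch. It is only an illustration, not how the assembler is implemented: the names layout_words and format_row are made up for this example, and the base address is the one given above for our simulator. Each value is stored as a 32-bit two's complement pattern, and consecutive words are 4 bytes apart.

```python
DATA_BASE = 0x00400000  # data section start given in the text
WORD_SIZE = 4           # a .word occupies 4 bytes


def layout_words(values, base=DATA_BASE):
    """Return (address, 32-bit pattern) pairs for consecutive .word values.

    Hypothetical helper for illustration only.
    """
    image = []
    for i, value in enumerate(values):
        # Masking keeps the low 32 bits, which gives the two's
        # complement pattern for negative values.
        pattern = value & 0xFFFFFFFF
        image.append((base + i * WORD_SIZE, pattern))
    return image


def format_row(address, pattern):
    """Format one row in the style of the address/contents table."""
    bits = format(pattern, "032b")
    groups = " ".join(bits[i:i + 4] for i in range(0, 32, 4))
    return f"0x{address:08x}     {groups}"


if __name__ == "__main__":
    # a1: .word 15 and a2: .word -2, in that order
    for address, pattern in layout_words([15, -2]):
        print(format_row(address, pattern))
```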

Like the data, the code needs to be placed starting at a specific location to make it work.

Problems with these simple assumptions

Consider the case where the assembly language code is split across 2 files. Each is assembled separately.

in file 1:

         a1:  .word 15
         a2:  .word -2

         __start: la  $t0, a1
                  add  $t1, $t0, $s3
                  jal  proc5

in file 2:

         a3:  .word 0

         proc5:   lw   $t6, a1
                  sub  $t2, $t0, $s4
                  jr   $ra

Two problems with this:

  1. Each file is assembled to start its data section and also its code section at the same location as the other file.
    a1 (in file1) is supposed to be placed at 0x00400000
    a3 (in file2) is supposed to be placed at 0x00400000

    __start (in file1) is placed at location 0x00800000
    proc5 (in file2) is placed at location 0x00800000
  2. When assembling file 1, symbol proc5 is never defined (given an address). That is because the label (symbol) is defined in file 2. The address assigned to proc5 is needed to produce the machine code for the jal instruction in file 1.

    This same problem presents itself in the lw instruction in file 2. The address assigned to a1 is unknown when assembling file 2. This is because the symbol a1 is defined (and given an address) in file 1.
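
Both problems can be seen in a small Python sketch that "assembles" each file on its own. This is a simplified model under stated assumptions, not a real assembler: assemble_alone is a hypothetical helper, every instruction is assumed to occupy one 4-byte word, and the two base addresses are the ones used in the example above.

```python
DATA_BASE = 0x00400000  # data section start used in the text
CODE_BASE = 0x00800000  # code section start used in the text


def assemble_alone(data_labels, code_lines):
    """Assign addresses as if this file were the whole program.

    data_labels: labels of .word items, in source order.
    code_lines:  one (label or None, referenced symbol or None) pair
                 per instruction.
    Assumption: each instruction occupies exactly one 4-byte word.
    Returns (symbol table, sorted list of undefined symbols).
    """
    symbols = {}
    for i, label in enumerate(data_labels):
        symbols[label] = DATA_BASE + 4 * i
    for i, (label, _) in enumerate(code_lines):
        if label is not None:
            symbols[label] = CODE_BASE + 4 * i
    referenced = {ref for _, ref in code_lines if ref is not None}
    undefined = sorted(referenced - set(symbols))
    return symbols, undefined


# file 1: a1, a2, then __start: la / add / jal proc5
symbols1, undefined1 = assemble_alone(
    ["a1", "a2"],
    [("__start", "a1"), (None, None), (None, "proc5")],
)

# file 2: a3, then proc5: lw (uses a1) / sub / jr
symbols2, undefined2 = assemble_alone(
    ["a3"],
    [("proc5", "a1"), (None, None), (None, None)],
)

# Problem 1: addresses that both files claim for their own symbols.
clashes = sorted(set(symbols1.values()) & set(symbols2.values()))

if __name__ == "__main__":
    print("clashing addresses:", [hex(a) for a in clashes])
    print("undefined in file 1:", undefined1)
    print("undefined in file 2:", undefined2)
```

Here clashes holds the addresses both files try to use, and undefined1 and undefined2 hold the symbols each file refers to but does not define.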

The real problem here is that absolute addresses are needed to produce the machine code.
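
To see why the absolute address matters, consider the jal instruction. In the MIPS encoding, jal has opcode 000011, and its 26-bit target field holds bits 27 through 2 of the destination address (the upper 4 bits come from the PC). The Python sketch below is a minimal illustration; encode_jal is a hypothetical helper, and the addresses passed to it are just examples.

```python
JAL_OPCODE = 0b000011  # MIPS opcode for jal


def encode_jal(target_address):
    """Encode jal for a word-aligned absolute target address.

    Hypothetical helper for illustration only.
    """
    if target_address % 4 != 0:
        raise ValueError("jump target must be word aligned")
    # The 26-bit field holds bits 27..2 of the target address.
    target_field = (target_address >> 2) & 0x03FFFFFF
    return (JAL_OPCODE << 26) | target_field


if __name__ == "__main__":
    # The same source line, jal proc5, becomes different machine code
    # depending on where proc5 ends up in memory.
    for address in (0x00800000, 0x0080000C):
        print(f"proc5 at 0x{address:08x} -> jal is 0x{encode_jal(address):08x}")
```

Until an address is assigned to proc5, the target field cannot be filled in, so the machine code for the jal in file 1 cannot be completed.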

Solutions to the problems

  1. A really bad solution that no one would ever implement: define the problem away, by not allowing separate files to contain assembly language source code.

    A single program (all code and data) MUST be all in one file.

    Why is this bad?

  2. Allow the step of linking and loading to

    To accommodate linking and loading, the information produced by the assembler must include:

    This last one is something new.

How we really do Linking and Loading

Have the assembler

Linking and loading will:
Copyright © Karen Miller, 2006