Data Structures

A common theme in programming:
SPACE vs. TIME tradeoff
space is memory space
time is time to execute program

It is often possible to write a program such that it
1. executes very fast, but wastes/utilizes more memory
or
2. utilizes little memory, but executes slower (as compared to option 1).
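
As a small illustration of the tradeoff, here is a sketch in C (the language these notes already use for comparisons). Both versions count the 1 bits in a byte. The function and table names are made up for this example, and the 256-byte table is just one possible way to trade memory for speed.

```c
#include <assert.h>
#include <stdint.h>

/* Option 2: almost no extra memory, but up to 8 loop iterations per call. */
int count_bits_loop(uint8_t value)
{
    int count = 0;
    while (value != 0) {
        count += value & 1;   /* add the lowest bit */
        value >>= 1;          /* move on to the next bit */
    }
    return count;
}

/* Option 1: spend 256 bytes on a table so each later query is one load. */
uint8_t bit_count_table[256];
int table_built = 0;

void build_bit_count_table(void)
{
    for (int i = 0; i < 256; i++) {
        bit_count_table[i] = (uint8_t)count_bits_loop((uint8_t)i);
    }
    table_built = 1;
}

int count_bits_table(uint8_t value)
{
    assert(table_built);      /* the table must be filled in before it is used */
    return bit_count_table[value];
}
```

After one call to build_bit_count_table(), count_bits_table() returns the same answers as count_bits_loop(), with a single memory access in place of the loop.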

Data structures can make memory usage efficient or inefficient, and they can cause the algorithms that operate on them to be more or less efficient.

The data structures we will discuss: arrays, stacks, and queues.

Arrays

Array implementation is important because

  1. most assembly languages have no concept of arrays
  2. from an array, any other data structure we might want can be built

Properties of arrays:

  1. each element is the same size (char = 1 byte, integer = 1 word)
  2. elements are stored contiguously, with the first element stored at the smallest memory address (called the base address)

So, the whole trick in assembly language is

  1. to allocate the correct amount of space for an array
  2. to calculate the address of each element, since an address tells the location of an element

Memory itself can be thought of as an array, indexed by address.

MAL declarations of arrays within memory

To allocate a portion of memory larger than a single variable's worth (that is, to allocate an array within memory):

     variablename:  type     initvalue:numelements

new directive:

  name:  .space    numberofbytes

.space allocates space (a number of bytes) within memory, but does not give the bytes an initial value. Note: the type of the data within this space cannot be inferred from the declaration.

Examples:

      arrayname: .byte  0:8

This gives 8 character-sized elements, numbered 0 - 7, initialized to 0, which is the null character.

      name: .space  18

This gives 18 bytes of memory (with no implied initial contents).

An example of how to calculate the address of an element:

  byte (character) elements --
       array1:  array[6..12] of char;   /* PASCAL */


        6   7   8   9  10  11  12    <---- element index
      -----------------------------
      |   |   |   |   |XXX|   |   |
      |   |   |   |   |XXX|   |   |
      -----------------------------
       25  26  27  28  29  30  31    <---- address

       want the 5th element,

       byte address of array1[10] =   25 + (10 - 6)
                                  =   29

The same example (or close to it), this time in MAL:

       array1:  .byte  0:7

        0   1   2   3   4   5   6    <---- element index
      -----------------------------
      |   |   |   |   |XXX|   |   |
      |   |   |   |   |XXX|   |   |
      -----------------------------
       25  26  27  28  29  30  31    <---- address

       want the 5th element,

            array1[4] is at address     array1 + 4
            If element 0 is at address 25,
            byte address of array1[4] =   25 + 4
                                      =   29

How do you get the address array1?
Answer: the MAL la (load address) instruction.

This is equivalent to the C code:

    px = &x;

In MAL:

     la   $8, x   # $8 holds an address

Or, for this particular array,

     la   $14, array1   # $14 has ADDRESS of first element of array1

Note that the choice of register used to hold the address is arbitrary, and is not relevant for this example.

This is where it is extremely important to understand and keep clear the difference between an address and the contents at an address.

To reference array1[4] (the 5th element) in MAL, write the code,

     la   $14, array1   # $14 has ADDRESS of first element of array1

     # then, if  we wanted to place the character 'Q' there,
     li   $15, 'Q'
     sb   $15, 4($14) 
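
The same distinction between an address and the contents at that address can be sketched in C. The helper name store_q is hypothetical. The pointer plays the role of register $14, and the dereference plays the role of the sb instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the MAL fragment:
 *   base  plays the role of $14 (an ADDRESS)
 *   'Q'   plays the role of $15 (a VALUE)     */
void store_q(char *base)
{
    assert(base != NULL);        /* an address must be valid before it is used */
    char *element = base + 4;    /* address of element 4: base + 4 bytes */
    *element = 'Q';              /* the contents at that address become 'Q' */
}
```

Calling store_q(array1) on a 7-element char array changes only array1[4]. The pointer variable element itself holds an address, never the character.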

On to word (integer-sized) elements.

      array2:  array[0..5] of integer;   /* PASCAL declaration */

      int  array2[6];   /* C declaration */

      array2:  .word 0:6       #  MAL

      array2:  .space 24       # alternative declaration in MAL


        0   1   2   3   4   5      <-- implied index
      -------------------------
      | 0 | 0 | 0 | 0 | 0 | 0 |
      -------------------------
       80  84  88  92  96  100      <-- memory address

      byte address of array2[3] =  80 + 4(3 - 0)
                                =  92

To reference array2[3] (the 4th element) in MAL, write the code,

    la  $8, array2
    add $9, $8, 12     # 3*4=12, $9 has address of desired array element
    # then, if  we wanted to subtract 1 from the value there
    lw   $10, ($9)
    sub  $10, $10, 1
    sw   $10, ($9)

After this code fragment executes, the array contents are

        0   1   2   3   4   5      <-- implied index
      -------------------------
      | 0 | 0 | 0 |-1 | 0 | 0 |
      -------------------------
       80  84  88  92  96  100      <-- memory address
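
For comparison, here is a sketch of the same steps in C, written with explicit byte offsets so that each line can be matched to a MAL instruction. The helper name is hypothetical, and int32_t is used on the assumption that a word is 4 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: subtract 1 from element 3 of an array of 4-byte words. */
void decrement_element3(int32_t *array2)
{
    assert(sizeof(int32_t) == 4);            /* a word is assumed to be 4 bytes */
    char *base = (char *)array2;             /* la  $8, array2 */
    int32_t *element =
        (int32_t *)(base + 3 * sizeof(int32_t));   /* add $9, $8, 12 */
    int32_t value = *element;                /* lw  $10, ($9) */
    value = value - 1;                       /* sub $10, $10, 1 */
    *element = value;                        /* sw  $10, ($9) */
}
```

Ordinary C would simply write array2[3] -= 1; the compiler then performs the multiplication by the element size that the assembly language programmer does by hand.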

In general, we need to know

  1. where the array starts (the base address)
  2. size of an element in bytes (to get a byte address)
  3. what the first element is numbered (most students only have experience with high level languages that start numbering elements at 0)

     byte address of element[x] = base + size(x - first index)

If indices are always numbered starting from 0, then there is one fewer arithmetic operation needed in computing the address of an element.
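
The general formula can be written as a small C function and checked against the two worked examples above. The function and parameter names are made up for this sketch, and addresses are treated as plain numbers.

```c
#include <assert.h>

/* byte address of element[x] = base + size * (x - first_index) */
unsigned long element_address(unsigned long base, unsigned long size,
                              long first_index, long x)
{
    assert(x >= first_index);   /* an index below the first one is out of bounds */
    return base + size * (unsigned long)(x - first_index);
}
```

With these definitions, element_address(25, 1, 6, 10) is 29 (the Pascal character array), and element_address(80, 4, 0, 3) is 92 (the integer array). When first_index is 0 the subtraction disappears, which is the saved arithmetic operation.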

Here is an example of a code fragment that works with an array.

Suppose we had a 50 element array of integers. We want to initialize the array elements such that each element is the additive inverse of its index.

A diagram of what we want the code to do:

    0    1    2    3             48    49    <---- index
  --------------------         -------------
  | 0 | -1 | -2 | -3 |  . . . .| -48 | -49 |
  --------------------         -------------

   # MAL code fragment to initialize elements of the array
   .data
   array:  .word  0:50   # an array of 50 integers
   #  could have declared this as    
   #    array:  .space 200 

   # register usage:
   # $13 -- array index and loop induction variable
   # $14 -- additive inverse of array index
   # $12 -- address of element i; initialized to base address of array
   # $11 -- the constant 50

   .text

           li   $11, 50
           la   $12, array
           li   $13, 0
   for:    beq  $13, $11, end_forloop   # iterate 50 times
           sub  $14, $0, $13            # $14 = 0 - index
           sw   $14, ($12)              # place value into array
           add  $12, $12, 4             # address changed by 4: 4 bytes per word
           add  $13, $13, 1             # increment loop induction variable
           b    for
   end_forloop:

Note: This code fragment shows only the part of the program relevant to the example. There is no __start label, because we do not care about the beginning of the program. There is no done instruction, because the program is not supposed to exit (complete) when this code fragment finishes.
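
A C rendering of the same loop may help in reading the MAL. This is only a sketch: the function name is hypothetical, and the comments show which MAL instruction each line is meant to correspond to.

```c
#include <assert.h>
#include <stdint.h>

/* Set element i to -i for i = 0 .. count-1, walking a pointer as the MAL loop does. */
void init_additive_inverse(int32_t *array, int32_t count)
{
    assert(count >= 0);
    int32_t *element = array;     /* $12: address of element i */
    int32_t i = 0;                /* $13: index and loop induction variable */
    while (i != count) {          /* beq $13, $11, end_forloop */
        *element = 0 - i;         /* sub $14, $0, $13 ; sw $14, ($12) */
        element = element + 1;    /* add $12, $12, 4 (C scales by the element size) */
        i = i + 1;                /* add $13, $13, 1 */
    }
}
```

Note that the C pointer is incremented by 1 where the MAL address is incremented by 4. C multiplies by the element size automatically; in assembly language the programmer must do it.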

2 Dimensional Arrays

Two-dimensional arrays raise more issues than 1-dimensional arrays do.

First, how to map a 2-dimensional array onto a 1-dimensional memory?

Terminology:

          r x c array -- r rows
                         c columns

          element[y, x] -- y is row number
                           x is column number


  example:     4 x 2 array

             0     1   <(column index)
          -------------
        0 |     |     |
          -------------
        1 |     |     |
          -------------
        2 |     |  X  |               X is element [2,1]
          -------------
        3 |     |     |
        ^ -------------
      (row index)

There are 2 possibilities for mapping this 4 x 2 array into memory.

row major order: rows are all together



        |     |
	-------
        | 0,0 |
	-------
        | 0,1 |
	-------
        | 1,0 |
	-------
        | 1,1 |
	-------
        | 2,0 |
	-------
        | 2,1 |
	-------
        | 3,0 |
	-------
        | 3,1 |
	-------
        |     |

column major order: columns are all together


        |     |
	-------
        | 0,0 | --
	-------   |
        | 1,0 |   |
	-------   |--- one column
        | 2,0 |   |
	-------   |
        | 3,0 | --
	-------
        | 0,1 |
	-------
        | 1,1 |
	-------
        | 2,1 |
	-------
        | 3,1 |
	-------
        |     |

Here is a formula for calculating the address of an element of a 2-D array.

Row Major:

 addr. of [y, x] = base + (size)(y - first_row)(# columns) + (size)(x - first_col)
                          |                                  |
                          offset to correct row              offset within row

Column Major:

 addr. of [y, x] = base + (size)(x - first_col)(# rows) + (size)(y - first_row)
                          |                               |
                          offset to correct column        offset within column

Need to know:
  1. row/column major (storage order)
  2. base address
  3. size of elements
  4. dimensions of the array

And, as with the 1-dimensional address calculation, if indices always begin their numbering with 0, then the formula is a bit simpler.
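
Both formulas can be sketched as C functions. The struct and function names below are invented for illustration; the fields are exactly the items in the "need to know" list, and addresses are treated as plain numbers.

```c
#include <assert.h>

/* Hypothetical description of a 2-D array's layout in memory. */
struct layout2d {
    unsigned long base;          /* base address */
    unsigned long size;          /* size of one element in bytes */
    long first_row, first_col;   /* number of the first row and first column */
    long rows, cols;             /* dimensions of the array */
};

unsigned long row_major_address(const struct layout2d *a, long y, long x)
{
    assert(a->rows > 0 && a->cols > 0);
    return a->base
         + a->size * (unsigned long)((y - a->first_row) * a->cols)  /* to correct row */
         + a->size * (unsigned long)(x - a->first_col);             /* within the row */
}

unsigned long column_major_address(const struct layout2d *a, long y, long x)
{
    assert(a->rows > 0 && a->cols > 0);
    return a->base
         + a->size * (unsigned long)((x - a->first_col) * a->rows)  /* to correct column */
         + a->size * (unsigned long)(y - a->first_row);             /* within the column */
}
```

For the 4 x 2 example with byte-sized elements and a base address of 100 (an arbitrary choice), element [2,1] is at 105 in row major order and at 106 in column major order, which matches counting down the two memory diagrams.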

Bounds Checking

Many high-level languages (HLLs) offer some form of bounds checking. Your program crashes, or you get an error message, if an array index is out of bounds.

       /* Pascal example */
       x:  array[1..6] of integer;

       . . .code. . .

       y := x[8];        /* ERROR! ACCESS OUT OF BOUNDS. */

Assembly languages offer no implied bounds checking. After all, if your program calculates an address of an element, and then loads that element (by the use of the address), there is no checking to see that the address calculated was actually within the array!

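A short example (to motivate some thought as to how to do bounds checking), using the 5 x 3 array of byte-sized elements diagrammed below, with rows numbered 0-4 and columns numbered 0-2:

What is the address of element[1, 4]? (assume row major ordering)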

A program probably just plugs the numbers into the formula:

     addr of [1, 4] = base + 1(1)(3) + 1(4)
		    = base + 7

This actually gives the address of element [2, 1], still a valid element of the array, but not what was really required. . .

       0  1  2 (3 4)
     ----------
   0 |  |  |  |
     ----------
   1 |  |  |  |  |X|
     ----------
   2 |  |  |  |
     ----------
   3 |  |  |  |
     ----------
   4 |  |  |  |
     ----------
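
One possible way to add the missing check is sketched below in C for this 5 x 3 array. The function names and the return convention (1 for success, 0 for out of bounds) are assumptions made for the example, not a standard interface.

```c
#include <assert.h>

#define ROWS 5
#define COLS 3

/* Unchecked row-major offset for byte-sized elements, indices starting at 0. */
long unchecked_offset(long y, long x)
{
    return y * COLS + x;
}

/* Checked version: returns 1 and stores the offset, or returns 0 if out of bounds. */
int checked_offset(long y, long x, long *offset)
{
    assert(offset != 0);
    if (y < 0 || y >= ROWS || x < 0 || x >= COLS) {
        return 0;                 /* each index is tested against its own dimension */
    }
    *offset = unchecked_offset(y, x);
    return 1;
}
```

The key point is that each index must be tested separately. Testing only that the final address lies inside the array would accept element [1, 4], because its computed offset (7) is the same as the offset of the valid element [2, 1].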

Stacks

We often need a data structure that hands data back in the reverse of the order in which it was stored. Along with this is the concept that the data is not known until the program is executed (run time). A stack provides both properties.

Abstractly, here is a stack. A common analogy is a stack of dishes: the last dish placed on top is the first one taken off. For this reason a stack is also called a Last In First Out (LIFO) structure.

       |       |      |       |      |       |
       |-------|      |-------|      |-------|
       |       |      |       |      |       |
       |-------|      |-------|      |-------|
       |       |      |       |      |       |
       |-------|      |-------|      |-------|
       |       |      |       |      |   Y   |
       |-------|      |-------|      |-------|
       |       |      |   X   |      |   X   |
       |-------|      |-------|      |-------|

                      (after 1       (after 2
                        PUSH)          PUSHES)

Data put into the stack is said to be pushed onto the stack. Data taken out of the stack is said to be popped off the stack. These are the 2 operations defined for a stack.

Here is an example, showing the algorithm for printing out a non-negative integer, character by character (integer to character string conversion). The base is 10 for decimal output, and 48 is the ASCII code for the character '0'.

      integer = 1024
      base = 10

      if integer == 0 then
         push '0'
      else
         while integer <> 0
            digit <- integer mod base
            char <- digit + 48
            push char onto stack
            integer <- integer div base

      while stack is not empty
         pop char
         put char
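
The algorithm above might be sketched in C as follows. The function name, the fixed base of 10, and the choice to write the characters into a caller-supplied buffer (instead of printing them) are all assumptions made so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a non-negative integer to decimal text in out[], using an explicit stack.
 * out must have room for the digits plus a terminating '\0'.
 * Assumes unsigned int is no wider than 32 bits (at most 10 decimal digits). */
void int_to_string(unsigned int integer, char *out)
{
    char stack[16];              /* the stack of characters */
    size_t sp = 0;               /* index of the next empty stack slot */
    const unsigned int base = 10;

    assert(out != NULL);
    if (integer == 0) {
        stack[sp++] = '0';       /* push '0' */
    } else {
        while (integer != 0) {
            unsigned int digit = integer % base;
            stack[sp++] = (char)(digit + 48);   /* 48 is ASCII '0'; push char */
            integer = integer / base;
        }
    }
    while (sp != 0) {            /* stack is not empty */
        *out++ = stack[--sp];    /* pop char, put char */
    }
    *out = '\0';
}
```

The digits are produced least significant first (4, 2, 0, 1 for 1024), and the stack reverses them so that the text reads "1024".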

Implementation of a stack from an array

Need to know: address of the top of stack (tos), often called a stack pointer or simply sp.

  (initial state)
     sp
      |
     \ /
    -----------------------------
    |   |   |   |   |   |   |   |
    -----------------------------

sp is a variable that contains the address of the empty location (next available) at the top of the stack. In assembly language code, the stack pointer is normally kept in a register, for efficiency in loads and stores.

Assume that this example stack contains word-sized elements.

For an array declared (in MAL) as

       stack:  .word  0:50

OR

       stack:  .space  200   # there are 4 bytes per word, so 50*4=200

The stack pointer is to contain the address of the next available location. Therefore, one logical way to initialize the stack pointer is to load it with the address of the first element of the stack. If the stack pointer is to reside in register $8,

       la  $8, stack       # initialization of stack pointer

Here is a PUSH operation:

      sw    $21, ($8)    # $21 is used as the data to be pushed
      add   $8, $8, 4

Here is an alternative coding of the same PUSH operation. The stack pointer is moved first, conceptually allocating the memory space before using the space for the data pushed.

      add   $8, $8, 4
      sw    $21, -4($8)  # $21 is used as the data to be pushed

Here is a POP operation:

      sub   $8, $8, 4
      lw    $21, ($8)     # the data popped off the stack goes into $21

And, here is the alternative coding of the same POP operation, conceptually copying out the data before deallocation of the space.

      lw    $21, -4($8)   # the data popped off the stack goes into $21
      sub   $8, $8, 4
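
For readers more comfortable with C, here is a sketch of the same convention (sp holds the address of the next empty location). Like the MAL fragments, it does no checking for a full or empty stack. The names are chosen to match the notes; lifo_demo is a made-up usage example.

```c
#include <assert.h>
#include <stdint.h>

int32_t stack[50];        /* stack: .word 0:50 */
int32_t *sp = stack;      /* la $8, stack  (address of the next empty location) */

void push(int32_t value)
{
    *sp = value;          /* sw  $21, ($8) */
    sp = sp + 1;          /* add $8, $8, 4 (C scales by the element size) */
}

int32_t pop(void)
{
    sp = sp - 1;          /* sub $8, $8, 4 */
    return *sp;           /* lw  $21, ($8) */
}

/* Usage: values come back in the reverse of the order they were pushed. */
void lifo_demo(void)
{
    push(1);
    push(2);
    assert(pop() == 2);
    assert(pop() == 1);
}
```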

A stack could instead be implemented such that the stack pointer points to a full location at the top of the stack.

  (initial state)
  sp
  |
 \ /
    -----------------------------
    |   |   |   |   |   |   |   |
    -----------------------------

Here is a PUSH operation:

      add   $8, $8, 4   # sp is in $8
      sw    $12, ($8)   # assume data to be pushed is in $12

Here is a POP operation:

      lw    $12, ($8)   # assume data to be popped goes into $12
      sub   $8, $8, 4
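
A C sketch of this second convention follows. The notes do not show the initialization for this variant; the sketch assumes, as the diagram suggests, that the stack pointer starts one element before the array. Because a C pointer may not legally point before the start of an array, the sketch uses an index (here called top, a made-up name) with the initial value -1. Again there is no checking for a full or empty stack.

```c
#include <assert.h>
#include <stdint.h>

int32_t stack[50];
int top = -1;             /* index of the full location at the top of the stack;
                             -1 means "one element before the array" (assumed) */

void push(int32_t value)
{
    top = top + 1;        /* add $8, $8, 4 */
    stack[top] = value;   /* sw  $12, ($8) */
}

int32_t pop(void)
{
    int32_t value = stack[top];   /* lw  $12, ($8) */
    top = top - 1;                /* sub $8, $8, 4 */
    return value;
}

/* Usage: values come back in the reverse of the order they were pushed. */
void lifo_demo(void)
{
    push(1);
    push(2);
    assert(pop() == 2);
    assert(pop() == 1);
}
```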

Another alternative:
The stack could "grow" from the end of the array's memory allocation towards the beginning. (Note that which end of the array the stack grows toward is independent of what sp points to.)
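
Here is one way this alternative might look in C. This sketch combines growth toward the beginning of the array with a stack pointer that points to a full location; that pairing is a choice made for the example, and the other pairing works equally well. There is no checking for a full or empty stack.

```c
#include <assert.h>
#include <stdint.h>

int32_t stack[50];
int32_t *sp = stack + 50;   /* one past the last element: the stack starts empty */

void push(int32_t value)
{
    sp = sp - 1;            /* move toward the beginning of the array */
    *sp = value;            /* sp now points to a full location */
}

int32_t pop(void)
{
    int32_t value = *sp;    /* copy out the top of the stack */
    sp = sp + 1;            /* move back toward the end of the array */
    return value;
}

/* Usage: values come back in the reverse of the order they were pushed. */
void lifo_demo(void)
{
    push(1);
    push(2);
    assert(pop() == 2);
    assert(pop() == 1);
}
```

The first value pushed lands in the last element of the array, and later values occupy lower and lower addresses.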

For the student to figure out:
How do you know when the stack is empty?
How do you know when the stack is full?

Queues

Whereas a stack is LIFO, a queue is FIFO (First In, First Out).

A real life analogy is a line (called a queue in British English). A person gets on the end of the line (the TAIL), waits, and gets off at the front of the line (the HEAD).

Getting into the queue is an operation called enqueue. Taking something off the queue is an operation called dequeue.

It takes 2 pointers to keep track of the data structure, the head and the tail.

Here is an example in which head=tail implies an empty queue, the head points to a full location, and the tail points to an empty location.

  initial state:

  --------------------------------------------
      |     |     |     |     |      |
  --------------------------------------------
              ^
              |
              head, and tail

  after 1 enqueue operation:

  --------------------------------------------
      |     |  x  |     |     |      |
  --------------------------------------------
               ^     ^
               |     |
               |     tail
               head

  after another enqueue operation:

  --------------------------------------------
      |     |  x  |  y  |     |      |
  --------------------------------------------
               ^           ^
               |           |
               |           tail
               head

  after a dequeue operation:

  --------------------------------------------
      |     |  x  |  y  |     |      |
  --------------------------------------------
                     ^     ^
                     |     |
                     |     tail
                     head

Note that (like stacks) when an item is removed from the data structure, it is physically still present, but correct use of the structure does not access it.

If enough items are enqueued (and possibly dequeued), the pointers will eventually run off the end of the array! This leads to implementations that "wrap" the end of the array around to the beginning, forming a circular queue. This re-uses the space allocated for the queue.

The implementation of the circular queue is a bit more complex. The conditions to test for an empty queue and a full queue are more difficult, because head=tail could mean either one. The tests can be eased by giving the queue one extra element, called a dummy, that is never used for data storage.

This is an example of the space vs. time trade-off. An extra piece of memory is used in an inefficient manner, in order to make the test (code) for full/empty queues more efficient.
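
A minimal C sketch of a circular queue follows, using the conventions of the example above (head indexes a full location, tail indexes an empty one, head equal to tail means empty). It takes one common reading of the dummy-element idea: one slot is always left unused, although which slot that is changes as the queue moves around the array. The names and the array size of 5 are assumptions made for the example.

```c
#include <assert.h>

#define SLOTS 5               /* array size; only SLOTS - 1 items fit at once */

int queue[SLOTS];
int head = 0;                 /* index of the oldest item (a full location) */
int tail = 0;                 /* index of the next empty location */

int is_empty(void)
{
    return head == tail;
}

int is_full(void)
{
    return (tail + 1) % SLOTS == head;   /* one slot always stays unused */
}

void enqueue(int value)
{
    assert(!is_full());
    queue[tail] = value;
    tail = (tail + 1) % SLOTS;    /* wrap from the end back to the beginning */
}

int dequeue(void)
{
    assert(!is_empty());
    int value = queue[head];
    head = (head + 1) % SLOTS;    /* wrap from the end back to the beginning */
    return value;
}
```

Each test is a single comparison. Without the unused slot, a completely full queue would also have head equal to tail, and some extra state (such as a count of items) would be needed to tell full from empty.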


Copyright © Karen Miller, 2009