Chapter 8 -- Data Structures
DATA STRUCTURES
---------------
common theme in programming:
SPACE vs. TIME tradeoff
space is memory space
time is time to execute program
it is often possible to write a program such that it
1. executes very fast, but uses more memory
or
2. uses little memory, but executes quite slowly
data structures can make memory usage efficient or inefficient
data structures we will discuss:
arrays, stacks, and queues (in that order)
ARRAYS
------
array implementation is important
1. most assembly languages have no concept of arrays
2. from an array, any other data structure we might want can be built
Properties of arrays:
1. each element is the same size (char = 1 byte, integer = 4 bytes)
2. elements are stored contiguously, with the first element
stored at the smallest memory address
so, the whole trick in assembly language is to
1. allocate the correct amount of space for the array
2. figure out the ADDRESS that gives the location of an element
3. USE that address to get at the array element
memory can be thought of as an array
it is a giant array of bits or bytes or words (picture needed here!)
the element numbering starts at 0
the element number is an address
if each byte has a unique address, we have BYTE ADDRESSING.
(the Pentium has this, as do many machines)
if each doubleword has a unique address, we have WORD ADDRESSING.
SASM access of the memory array
we can access any element (byte) of memory
m(address) refers to the element (contents) at address
m(20) is the byte (or doubleword) at address 20;
with byte addressing, this is the 21st byte of memory.
important: 20 is the address
m(20) is the contents at address 20
SASM declarations of arrays within memory
to allocate a portion of memory (more than a single variable's worth)
(allocating an array within memory)
variablename type numelements dup(initvalue)
type is just like before -- db or dd
numelements is just that, and numbering ALWAYS starts at 0
initvalue is a value given to each element of the array
example:
my_array db 8 dup(0)
8 character elements, numbered 0 - 7, initialized to 0
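for comparison, roughly the same declaration in C (just an analogy,
not part of SASM):
    char my_array[8] = {0};   /* 8 one-byte elements, numbered 0 - 7, all zero */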
an example of how to calculate the address of an element:
byte (character) elements --
array1: array[6..12] of char;    { Pascal }
  6   7   8   9  10  11  12     <---- element index
-----------------------------
|   |   |   |   |XXX|   |   |
|   |   |   |   |XXX|   |   |
-----------------------------
 25  26  27  28  29  30  31    <---- address
byte address of array1[10] = 25 + (10 - 6)
= 29
this same example (or close) only in SASM:
array1 db 7 dup(0)
  0   1   2   3   4   5   6
-----------------------------
|   |   |   |   |XXX|   |   |
|   |   |   |   |XXX|   |   |
-----------------------------
 25  26  27  28  29  30  31    <---- address
we want the 5th element, array1[4], which is at address array1 + 4
if element[0] is at address 25,
byte address of array1[4] = 25 + 4
                          = 29
how do you get the address array1?
SASM la (load address) instruction
la ar_addr, array1
takes the address array1 (25 in the example) and puts it
into the variable called ar_addr.
this is where it is extremely important to understand and keep
clear the difference between an address and the contents of
an address.
to reference array1[4] in SASM, write the code,
la my_addr, array1
iadd my_addr, 4
; then if we wanted to decrement element number 4
isub m(my_addr), 1
remember, my_addr is an address (declared as dd)
m(my_addr) is the byte at the address my_addr
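the same sequence written with C pointers, as a sketch of the analogy
(C is not SASM; the printf is just there to show the effect):
    #include <stdio.h>
    int main(void)
    {
        char array1[7] = {0};       /* like  array1 db 7 dup(0)               */
        char *my_addr;              /* like  my_addr dd ?  (holds an address) */

        my_addr = array1;           /* like  la   my_addr, array1             */
        my_addr = my_addr + 4;      /* like  iadd my_addr, 4                  */
        *my_addr = *my_addr - 1;    /* like  isub m(my_addr), 1               */

        printf("%d\n", array1[4]);  /* prints -1 if char is signed (it usually is) */
        return 0;
    }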
integer elements --
array2: array[0..5] of integer;    { Pascal }
in SASM,
array2 dd 6 dup(0)
  0   1   2   3   4   5     <-- index
-------------------------
|   |   |   |XXX|   |   |
|   |   |   |XXX|   |   |
-------------------------
 80  84  88  92  96 100    <-- memory address
byte address of array2[3] = 80 + 4(3 - 0)
= 92
SO, we need to know
1. where the array starts (called base address)
2. size of an element in bytes (to get a byte address)
3. what the first element is numbered
byte address of element[x] = base + size(x - first index)
Note: if we always start the index numbering at 0, then
this formula becomes
byte address of element[x] = base + size(x)
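the same calculation as a small C function (a sketch; the names are
made up for illustration):
    /* byte address of element[x] = base + size * (x - first_index) */
    unsigned long element_address(unsigned long base, int size,
                                  int x, int first_index)
    {
        return base + (unsigned long) size * (x - first_index);
    }
    /* e.g. element_address(80, 4, 3, 0) gives 92, matching array2[3] above */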
2 DIMENSIONAL ARRAYS
--------------------
There are more issues here than for 1-dimensional arrays.
First, how to map a 2 dimensional array onto a 1 dimensional memory?
TERMINOLOGY:
r x c array -- r rows
c columns
element[y, x] -- y is row number
x is column number
example: 4 x 2 array
mapping this 4 x 2 array into memory.
2 possibilities:
row major order:
rows are all together
|     |
-------
| 0,0 |
-------
| 0,1 |
-------
| 1,0 |
-------
| 1,1 |
-------
| 2,0 |
-------
| 2,1 |
-------
| 3,0 |
-------
| 3,1 |
-------
|     |
column major order:
columns are all together
|     |
-------
| 0,0 |
-------
| 1,0 |
-------
| 2,0 |
-------
| 3,0 |
-------
| 0,1 |
-------
| 1,1 |
-------
| 2,1 |
-------
| 3,1 |
-------
|     |
2-D Arrays
----------
The goal: come up with a formula for calculating the address of
an element of a 2-D array.
Row Major:
addr. of [y, x] = base + offset to correct row + offset within row
                = base + (size)(y - first_row)(# columns)
                       + (size)(x - first_col)
Column Major:
addr. of [y, x] = base + offset to correct column + offset within column
                = base + (size)(x - first_col)(# rows)
                       + (size)(y - first_row)
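the two formulas written out as C functions (a sketch only; the names
and parameters are made up for illustration):
    /* row major: base + size*(y - first_row)*(# columns) + size*(x - first_col) */
    unsigned long addr_row_major(unsigned long base, int size,
                                 int y, int x,
                                 int first_row, int first_col, int num_cols)
    {
        return base + (unsigned long) size * (y - first_row) * num_cols
                    + (unsigned long) size * (x - first_col);
    }

    /* column major: base + size*(x - first_col)*(# rows) + size*(y - first_row) */
    unsigned long addr_col_major(unsigned long base, int size,
                                 int y, int x,
                                 int first_row, int first_col, int num_rows)
    {
        return base + (unsigned long) size * (x - first_col) * num_rows
                    + (unsigned long) size * (y - first_row);
    }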
Need to know:
1. row/column major (storage order)
2. base address
3. size of elements
4. dimensions of the array
HINTS toward getting this correct:
Draw pictures.
Don't forget to account for size.
BOUNDS CHECKING
---------------
Many HLLs offer some form of bounds checking: your program crashes,
or you get an error message, if an array index is out of bounds.
Pascal example:  x: array[1..6] of integer;
code. . .
y := x[8];
Assembly languages offer no implicit bounds checking.
After all, if your program calculates an address of an element,
and then loads that element (by the use of the address), there is
no checking to see that the address calculated was actually within
the array!
example (to motivate some thought as to how to do bounds checking):
given -- a 5 x 3 array
byte size elements
row major order
first_row = 1
first_col = 1
what is the address of element[2, 5]?
a program probably just plugs the numbers into the formula:
addr of [2, 5] = base + 1(2 - 1)(3) + 1(5 - 1)
               = base + 7
this actually gives the address of element [3, 2],
still a valid element of the array, but not what was really
required. . .
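one way to add the missing check by hand, sketched in C for the 5 x 3
example above (the function name is made up):
    /* return the byte offset of element [y, x], or -1 if [y, x] is out of bounds */
    long checked_offset(int y, int x)
    {
        int first_row = 1, last_row = 5;   /* rows    1..5 */
        int first_col = 1, last_col = 3;   /* columns 1..3 */
        int num_cols  = 3;
        int size      = 1;                 /* byte size elements */

        if (y < first_row || y > last_row ||
            x < first_col || x > last_col)
            return -1;                     /* out of bounds -- refuse to compute an address */

        return (long) size * (y - first_row) * num_cols
             + (long) size * (x - first_col);
    }
    /* checked_offset(2, 5) returns -1 instead of a misleading valid-looking offset */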
STACKS
------
We often need a data structure that gives data back in the reverse
order from which it was stored. Along with this is the idea that
the data is not known until the program is executed (at RUN TIME).
A STACK provides both properties.
Abstractly, here is a stack. Analogy to stack of dishes.
Also dubbed Last In First Out, LIFO.
|       |
|       |
|-------|
|       |
|-------|
|       |
|-------|
|       |
|-------|
|       |
 -------
Data put into the stack is said to be PUSHED onto the stack.
Data taken out of the stack is said to be POPPED off the stack.
Algorithm Example:
printing out a positive integer, character by character
(integer to character string conversion)
integer = 1024
if integer == 0 then
    push '0'
else
    while integer <> 0
        digit <- integer mod base
        char <- digit + 48
        push char onto stack
        integer <- integer div base
while stack is not empty
    pop char
    put char
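the same algorithm in C, using an explicit character array as the stack
(a sketch; the names and the stack size are chosen for illustration):
    #include <stdio.h>

    /* print a non-negative integer one character at a time */
    void print_uint(unsigned int integer)
    {
        char stack[32];               /* plenty of room for a 32-bit value in base 10 */
        int  sp   = 0;                /* index of the first empty slot                */
        int  base = 10;

        if (integer == 0) {
            stack[sp++] = '0';                      /* push '0'                    */
        } else {
            while (integer != 0) {
                int digit = integer % base;         /* digit <- integer mod base   */
                stack[sp++] = (char) (digit + 48);  /* push char onto stack        */
                integer = integer / base;           /* integer <- integer div base */
            }
        }

        while (sp > 0)                /* while stack is not empty */
            putchar(stack[--sp]);     /* pop char, put char       */
    }

    int main(void)
    {
        print_uint(1024);             /* prints 1024 */
        putchar('\n');
        return 0;
    }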
IMPLEMENTATION OF A STACK
-------------------------
One implementation builds a stack out of an array.
Need to know:
index of TOP OF STACK (tos), often called a stack pointer (sp).
In most implementations, tos is not an index at all; it is
an ADDRESS (a pointer).
(initial state)
  sp
  |
 \ /
-----------------------------
|   |   |   |   |   |   |   |
-----------------------------
sp is a variable that contains the address of the empty location
at the top of the stack.
for an array declared (in SASM) as
my_stack dd 50 dup(0)
my_sp dd ?
OR (could be, since it is the amount of space that matters)
my_stack db 200 dup(0)
my_sp dd ?   ; this declaration doesn't change
a PUSH operation:
move m(my_sp), data
iadd my_sp, 4
a POP operation:
isub my_sp, 4
move data, m(my_sp)
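the same two operations sketched in C, with the stack pointer kept as
the address of the empty slot on top (names made up for illustration):
    int  my_stack[50];            /* like  my_stack dd 50 dup(0)     */
    int *my_sp = my_stack;        /* points to the empty slot on top */

    void push(int data)
    {
        *my_sp = data;            /* move m(my_sp), data             */
        my_sp  = my_sp + 1;       /* iadd my_sp, 4  (C counts in elements, not bytes) */
    }

    int pop(void)
    {
        my_sp = my_sp - 1;        /* isub my_sp, 4                   */
        return *my_sp;            /* move data, m(my_sp)             */
    }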
A stack could instead be implemented such that the stack pointer
points to a FULL location at the top of the stack.
(initial state)
  sp
  |
 \ /
-----------------------------
|   |   |   |   |   |   |   |
-----------------------------
a PUSH operation:
iadd my_sp, 4
move m(my_sp), data
a POP operation:
move data, m(my_sp)
isub my_sp, 4
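the same pair for this second convention, sketched in C with an index
instead of an address (only the order of the two steps changes):
    int my_stack2[50];
    int my_top = -1;              /* index of the FULL slot on top; -1 when empty */

    void push2(int data)
    {
        my_top = my_top + 1;              /* iadd my_sp, 4        */
        my_stack2[my_top] = data;         /* move m(my_sp), data  */
    }

    int pop2(void)
    {
        int data = my_stack2[my_top];     /* move data, m(my_sp)  */
        my_top = my_top - 1;              /* isub my_sp, 4        */
        return data;
    }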
ANOTHER ALTERNATIVE:
The stack could "grow" from the end of the array towards
the beginning. (Note that which end of the array the stack
grows toward is independent of what the stack pointer points to.)
For the student to figure out:
How do you initialize the stack pointer?
How do you know when there are no more items in the stack?
How do you know when the stack is full?
QUEUES
------
whereas a stack is LIFO,
a queue is FIFO (First In, First Out).
a real life analogy is a line (called a queue in British English).
Person gets on the end of the line (the TAIL),
waits,
and gets off at the front of the line (the HEAD).
getting into the queue is an operation called ENQUEUE.
taking something off the queue is an operation called DEQUEUE.
it takes 2 pointers to keep track of the data structure,
the head and the tail.
initial state:
-----------------------------------------
|       |       |       |       |       |
-----------------------------------------
    ^
    |
  head and tail
after 1 enqueue operation:
-----------------------------------------
|       |   x   |       |       |       |
-----------------------------------------
    ^       ^
    |       |
    |      tail
  head
after another enqueue operation:
-----------------------------------------
|       |   x   |   y   |       |       |
-----------------------------------------
    ^               ^
    |               |
    |              tail
  head
after a dequeue operation (x is returned, and head advances):
-----------------------------------------
|       |   x   |   y   |       |       |
-----------------------------------------
            ^       ^
            |       |
            |      tail
          head
Note that (like stacks) when an item is removed from
the data structure, it is physically still present,
but correct use of the structure cannot access it.
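a C sketch of these enqueue and dequeue steps (no wrap-around yet;
the names are made up for illustration):
    int the_queue[5];
    int head = 0;                 /* index of the slot just before the oldest item */
    int tail = 0;                 /* index of the slot holding the newest item     */

    void enqueue(int data)
    {
        tail = tail + 1;          /* advance tail ...                              */
        the_queue[tail] = data;   /* ... and store at the new tail slot            */
                                  /* nothing stops tail running off the end -- see below */
    }

    int dequeue(void)
    {
        head = head + 1;          /* advance head past the oldest item ...         */
        return the_queue[head];   /* ... which is still physically in the array    */
    }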
If enough items are enqueued onto (and dequeued from) the
queue, the pointers will eventually run off the end of the
array! This leads to implementations that "wrap" around from
the end of the array back to the beginning, forming a CIRCULAR QUEUE.
The implementation of the circular queue is a bit more complex.
The conditions to test for an empty queue and full queue
are more difficult. They can be eased by implementing a
queue with one element that is a DUMMY. It is never used
for data storage. The DUMMY element works its way around the
memory allocated for the queue as items are enqueued and dequeued.
This is an example of the space vs. time trade-off.
An extra piece of memory is used in an inefficient
manner, in order to make the test for full/empty queues
more efficient.
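one possible C sketch of a circular queue that gives up one DUMMY slot
so that the empty and full tests stay simple (all of the names and the
size are made up for illustration):
    #define QSIZE 8               /* 7 usable slots plus 1 DUMMY slot */

    int cq[QSIZE];
    int cq_head = 0;              /* index of the dummy slot, just before the oldest item */
    int cq_tail = 0;              /* index of the newest item                             */

    int cq_empty(void) { return cq_head == cq_tail; }
    int cq_full(void)  { return (cq_tail + 1) % QSIZE == cq_head; }

    /* returns 0 on success, -1 if the queue is full */
    int cq_enqueue(int data)
    {
        if (cq_full())
            return -1;
        cq_tail = (cq_tail + 1) % QSIZE;   /* advance tail, wrapping around */
        cq[cq_tail] = data;
        return 0;
    }

    /* the caller should check cq_empty() first */
    int cq_dequeue(void)
    {
        cq_head = (cq_head + 1) % QSIZE;   /* advance head, wrapping around        */
        return cq[cq_head];                /* this slot becomes the new dummy slot */
    }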