# Data Structures

A common theme in programming is the trade-off between space and time:

• space is memory space
• time is the time needed to execute the program

It is often possible to write a program such that it
1. executes very fast, but uses more memory
or
2. uses little memory, but executes more slowly (as compared to option 1).

Data structures can make memory usage efficient or inefficient, and they can make the algorithms that operate on them more or less efficient.

The data structures we will discuss: arrays, stacks, and queues.

## Arrays

Array implementation is important because

1. most assembly languages have no concept of arrays
2. from an array, any other data structure we might want can be built

Properties of arrays:

1. each element is the same size (char = 1 byte, integer = 1 word)
2. elements are stored contiguously, with the first element stored at the smallest memory address (called the base address)

So, the whole trick in assembly language is

1. to allocate the correct amount of space for an array
2. to compute the address that tells the location of each element

Memory can be thought of as an array:

• it is a giant array of bits or bytes or words

```
       -------   MEMORY
     0 |     |
       -------
     1 |     |
       -------
     2 |     |
       -------
     3 |     |
       -------
     4 |     |
       -------
     5 |     |
       -------
```
• the element numbering (an index) starts at 0
• the element number (an index) is an address

### MAL declarations of arrays within memory

To allocate a portion of memory larger than a single variable (that is, to allocate an array within memory):

```
     variablename:  type     initvalue:numelements
```
• type is just like before: `.byte`, `.word` or `.float`
• `numelements` is just that, a number of elements
• index numbering always starts at 0
• `initvalue` is an initial value given to each element of the array

A new directive:

```
  name:  .space    numberofbytes
```

`.space` is a way of allocating space (bytes) within memory without giving the bytes an initial value. Note: the type of the data within this space cannot be inferred.

Examples:

```
      arrayname: .byte  0:8
```

This gives 8 character-sized elements, numbered 0 - 7, initialized to 0, which is the null character.

```
      name: .space  18
```

This gives 18 bytes of memory (with no implied initial contents).

An example of how to calculate the address of an element:

```
  byte (character) elements --
    array1:  array[6..12] of char;   /* PASCAL */

        6   7   8   9  10  11  12    <---- element index
      -----------------------------
      |   |   |   |   |XXX|   |   |
      |   |   |   |   |XXX|   |   |
      -----------------------------
       25  26  27  28  29  30  31    <---- address

    want the 5th element,

    byte address of array1[10] =   25 + (10 - 6)
                               =   29
```

The same example (or close to it), only in MAL:

```
    array1:  .byte  0:7

        0   1   2   3   4   5   6    <---- element index
      -----------------------------
      |   |   |   |   |XXX|   |   |
      |   |   |   |   |XXX|   |   |
      -----------------------------
       25  26  27  28  29  30  31    <---- address

    want the 5th element,

    array1[4] is at address     array1 + 4
    If element 0 is at address 25,
    byte address of array1[4] =   25 + 4
                              =   29
```

How do you get the address of array1?
Answer: the MAL `la` (load address) instruction.

This is equivalent to the C code:

```
    px = &x;
```

In MAL:

```
     la   $8, x   # $8 holds an address
```

Or, for this particular array,

```
     la   $14, array1   # $14 has ADDRESS of first element of array1
```

Note that which register is used to hold the address is implementation dependent, and is not relevant for this example.

This is where it is extremely important to understand and keep clear the difference between an address and the contents at an address.

To reference array1[4] (the 5th element) in MAL, write the code,

```
     la   $14, array1   # $14 has ADDRESS of first element of array1

     # then, if we wanted to place the character 'Q' there,
     li   $15, 'Q'
     sb   $15, 4($14)
```

On to word (integer-sized) elements.

```
      array2:  array[0..5] of integer;   /* PASCAL declaration */

      int  array2[6];   /* C declaration */

      array2:  .word 0:6       #  MAL

      array2:  .space 24       # alternative declaration in MAL

        0   1   2   3   4   5      <-- implied index
      -------------------------
      | 0 | 0 | 0 | 0 | 0 | 0 |
      -------------------------
       80  84  88  92  96  100     <-- memory address

      byte address of array2[3] =  80 + 4(3 - 0)
                                =  92
```

To reference array2[3] (the 4th element) in MAL, write the code,

```
    la   $8, array2
    add  $9, $8, 12     # 3*4=12, $9 has address of desired array element
    # then, if we wanted to subtract 1 from the value there
    lw   $10, ($9)
    sub  $10, $10, 1
    sw   $10, ($9)
```

After this code fragment executes, the array contents are

```
        0   1   2   3   4   5      <-- implied index
      -------------------------
      | 0 | 0 | 0 |-1 | 0 | 0 |
      -------------------------
       80  84  88  92  96  100     <-- memory address
```
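The load, subtract, store sequence above can be mimicked in a short Python sketch. The `memory` byte array, its size, the little-endian byte order, and the helper names `load_word` and `store_word` are assumptions made for illustration; only the base address 80 and the 4-byte word size come from the text.

```python
import struct

WORD = 4  # bytes per word, as in the text


def load_word(memory, address):
    """Return the signed 32-bit integer stored at a byte address (like lw)."""
    return struct.unpack_from("<i", memory, address)[0]


def store_word(memory, address, value):
    """Store a signed 32-bit integer at a byte address (like sw)."""
    struct.pack_into("<i", memory, address, value)


memory = bytearray(128)              # hypothetical memory, all zeros
array2 = 80                          # base address of array2, as in the diagram

address = array2 + 3 * WORD          # la, add: the address of array2[3]
value = load_word(memory, address)   # lw: the contents at that address
value = value - 1                    # sub
store_word(memory, address, value)   # sw

contents = [load_word(memory, array2 + i * WORD) for i in range(6)]
print(address, contents)             # 92 [0, 0, 0, -1, 0, 0]
```

Note how `address` (92) and `value` (-1) are kept in separate variables: one is a location, the other is what is stored there.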

In general, we need to know

1. where the array starts (the base address)
2. size of an element in bytes (to get a byte address)
3. what the first element is numbered (most students only have experience with high level languages that start numbering elements at 0)

```
     byte address of element[x] = base + size(x - first index)
```

If indices are always numbered starting from 0, then there is one fewer arithmetic operation needed in computing the address of an element.
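As a quick check of the general formula, here is a minimal Python sketch. The function name `element_address` and its parameter names are made up for illustration; the numbers are the ones used in the examples above.

```python
def element_address(base, size, index, first_index=0):
    """Byte address of element[index]: base + size * (index - first_index)."""
    return base + size * (index - first_index)


# Pascal-style byte array indexed 6..12, base address 25
print(element_address(25, 1, 10, first_index=6))   # 29

# MAL-style byte array indexed from 0, base address 25
print(element_address(25, 1, 4))                   # 29

# Word array indexed from 0, base address 80
print(element_address(80, 4, 3))                   # 92
```

When `first_index` is 0 the subtraction disappears, which is the arithmetic operation saved by numbering from 0.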

Here is an example of a code fragment that deals with an array.

Suppose we had a 50 element array of integers. We want to initialize the array elements such that each element is the additive inverse of its index.

A diagram of what we want the code to do:

```
      0    1    2    3             48    49    <---- index
    --------------------         -------------
    | 0 | -1 | -2 | -3 |  . . . .| -48 | -49 |
    --------------------         -------------
```

```
   # MAL code fragment to initialize elements of the array
   .data
   array:  .word  0:50   # an array of 50 integers
                         #  could have declared this as
                         #    array:  .space 200

   # register usage:
   # $13 -- array index and loop induction variable
   # $14 -- additive inverse of array index
   # $12 -- address of element i; initialized to base address of array
   # $11 -- the constant 50

   .text

           li   $11, 50
           la   $12, array
           li   $13, 0
   for:    beq  $13, $11, end_forloop   # iterate 50 times
           sub  $14, $0, $13
           sw   $14, ($12)              # place value into array
           add  $12, $12, 4             # address changed by 4: 4 bytes per word
           add  $13, $13, 1             # increment loop induction variable
           b    for
   end_forloop:
```

Note: This code fragment only shows the relevant part of the program of the example. There is no `__start` label, because we do not care about the beginning of the program. There is no `done` instruction, because the program is not supposed to exit (complete) due to the operation of this code fragment.
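For comparison, the same loop can be sketched in Python by walking an address through a simulated block of memory. The `memory` byte array, the base address 0, and the little-endian byte order are assumptions for illustration; the comments show which MAL instruction each line stands in for.

```python
import struct

WORD = 4     # bytes per word
COUNT = 50   # number of array elements

memory = bytearray(COUNT * WORD)   # plays the role of `array: .space 200`
array = 0                          # hypothetical base address within memory

address = array                    # $12: address of element i
index = 0                          # $13: loop induction variable
while index != COUNT:              # beq $13, $11, end_forloop
    struct.pack_into("<i", memory, address, 0 - index)   # sub, then sw
    address += WORD                # address changes by 4: 4 bytes per word
    index += 1                     # increment loop induction variable

values = list(struct.unpack_from("<50i", memory, array))
print(values[:4], values[-2:])     # [0, -1, -2, -3] [-48, -49]
```

As in the MAL fragment, the address is advanced by the element size on each iteration instead of being recomputed from the index.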

### 2 Dimensional Arrays

There are more issues for 2 dimensions than for 1-dimensional arrays.

First, how to map a 2-dimensional array onto a 1-dimensional memory?

Terminology:

```
     r x c array -- r rows
                    c columns

     element[y, x] -- y is row number
                      x is column number

     example:     4 x 2 array

                    0     1     <-- (column index)
                 -------------
               0 |     |     |
                 -------------
               1 |     |     |
                 -------------
               2 |     |  X  |               X is element [2,1]
                 -------------
               3 |     |     |
               ^ -------------
               |
          (row index)
```

There are 2 possibilities for mapping this 4 x 2 array into memory.

row major order: rows are all together

```

|     |
-------
| 0,0 |
-------
| 0,1 |
-------
| 1,0 |
-------
| 1,1 |
-------
| 2,0 |
-------
| 2,1 |
-------
| 3,0 |
-------
| 3,1 |
-------
|     |
```

column major order: columns are all together

```
|     |
-------
| 0,0 | --
-------   |
| 1,0 |   |
-------   |--- one column
| 2,0 |   |
-------   |
| 3,0 | --
-------
| 0,1 |
-------
| 1,1 |
-------
| 2,1 |
-------
| 3,1 |
-------
|     |
```

Here are formulas for calculating the address of an element of a 2-D array.

Row Major:

```
 addr. of [y, x] =  base  +  offset to correct row  +  offset within row

   offset to correct row  =  (size)(y - first_row)(# columns)
   offset within row      =  (size)(x - first_col)
```

Column Major:

```
 addr. of [y, x] =  base  +  offset to correct column  +  offset within column

   offset to correct column  =  (size)(x - first_col)(# rows)
   offset within column      =  (size)(y - first_row)
```

Need to know:

1. row/column major (storage order)
2. base address of the array
3. size of elements
4. dimensions of the array

And, like for 1-dimensional array address calculation, if indices always begin their numbering with 0, then the formula is a bit simpler.

HINTS toward getting this correct:
• Draw pictures.
• Do not forget to account for size.
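The two formulas can be written out as a small Python sketch. The function and parameter names are made up for illustration, and the base address 100 is an arbitrary choice; the element checked is [2, 1] of the 4 x 2 example, with byte-sized elements.

```python
def row_major_address(base, size, y, x, n_cols, first_row=0, first_col=0):
    """Address of element [y, x] when whole rows are stored together."""
    return base + size * (y - first_row) * n_cols + size * (x - first_col)


def column_major_address(base, size, y, x, n_rows, first_row=0, first_col=0):
    """Address of element [y, x] when whole columns are stored together."""
    return base + size * (x - first_col) * n_rows + size * (y - first_row)


# Element [2, 1] of a 4 x 2 array of bytes, hypothetical base address 100
print(row_major_address(100, 1, 2, 1, n_cols=2))      # 105
print(column_major_address(100, 1, 2, 1, n_rows=4))   # 106
```

Counting down the two memory pictures above gives the same answers: [2, 1] is the sixth entry in row major order (offset 5) and the seventh in column major order (offset 6).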

### Bounds Checking

Many HLLs (high-level languages) offer some form of bounds checking: your program crashes, or you get an error message, if an array index is out of bounds.

```
       /* Pascal example */
       x:  array[1..6] of integer;

       . . .code. . .

       y := x[8];        /* ERROR! ACCESS OUT OF BOUNDS. */
```

Assembly languages offer no implied bounds checking. After all, if your program calculates an address of an element, and then loads that element (by the use of the address), there is no checking to see that the address calculated was actually within the array!

A short example (to motivate some thought as to how to do bounds checking):

given:
• a 5 x 3 array
• byte size elements
• row major order
• first_row = 0
• first_col = 0

What is the address of element [1, 4]? (Assume row major ordering.)

A program probably just plugs the numbers into the formula:

```
     addr of [1, 4] = base + 1(1)(3) + 1(4)
                    = base + 7
```

This actually gives the address of element [2, 1], still a valid element of the array, but not what was really required. . .

```
          0  1  2 (3  4)
         ----------
       0 |  |  |  |
         ----------
       1 |  |  |  |  |X|
         ----------
       2 |  |  |  |
         ----------
       3 |  |  |  |
         ----------
       4 |  |  |  |
         ----------
```
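One possible way to do the checking, sketched in Python under the assumption that both indices start at 0, is to test each index against its own dimension before the formula is applied. The function name `checked_row_major_address` is hypothetical.

```python
def checked_row_major_address(base, size, y, x, n_rows, n_cols):
    """Row-major address of [y, x] (indices from 0), checking each index."""
    if not 0 <= y < n_rows:
        raise IndexError(f"row index {y} out of range 0..{n_rows - 1}")
    if not 0 <= x < n_cols:
        raise IndexError(f"column index {x} out of range 0..{n_cols - 1}")
    return base + size * (y * n_cols + x)


base = 0
# A valid element: [2, 1] of the 5 x 3 array of bytes
print(checked_row_major_address(base, 1, 2, 1, n_rows=5, n_cols=3))   # 7

# The out-of-bounds element [1, 4] would also land on offset 7, so it
# has to be rejected before the address is computed.
try:
    checked_row_major_address(base, 1, 1, 4, n_rows=5, n_cols=3)
except IndexError as err:
    print("rejected:", err)
```

Checking only that the final address lies inside the array would not be enough here, since base + 7 is inside the array.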

## Stacks

We often need a data structure that gives data back in the reverse of the order in which it was stored. Along with this is the concept that the data is not known until the program is executed (run time). A stack provides both properties.

Abstractly, here is a stack. The usual analogy is a stack of dishes. A stack is also called a Last In, First Out (LIFO) structure.

```
       |       |      |       |      |       |
       |-------|      |-------|      |-------|
       |       |      |       |      |       |
       |-------|      |-------|      |-------|
       |       |      |       |      |       |
       |-------|      |-------|      |-------|
       |       |      |       |      |   Y   |
       |-------|      |-------|      |-------|
       |       |      |   X   |      |   X   |
       |-------|      |-------|      |-------|

                      (after 1       (after 2
                       PUSH)          PUSHES)
```

Data put into the stack is said to be pushed onto the stack. Data taken out of the stack is said to be popped off the stack. These are the 2 operations defined for a stack.

Here is an example, showing the algorithm for printing out a positive integer, character by character (integer to character string conversion).

```
      integer = 1024

      if integer == 0 then
          push '0'
      else
          while integer <> 0
              digit <- integer mod base
              char <- digit + 48
              push char onto stack
              integer <- integer div base

      while stack is not empty
          pop char
          put char
```
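The algorithm above translates almost line for line into Python. This is only a sketch: a Python list stands in for the stack, the function name `integer_to_string` is made up, and the result is returned as a string instead of being printed character by character. It assumes a non-negative integer and a base of 10 or less, so that `digit + 48` is always the ASCII code of a digit.

```python
def integer_to_string(integer, base=10):
    """Convert a non-negative integer to its digit string using a stack."""
    stack = []                              # a Python list used as a stack
    if integer == 0:
        stack.append("0")
    else:
        while integer != 0:
            digit = integer % base          # least significant digit first
            stack.append(chr(digit + 48))   # 48 is the ASCII code for '0'
            integer = integer // base

    chars = []
    while stack:                            # popping reverses the digit order
        chars.append(stack.pop())
    return "".join(chars)


print(integer_to_string(1024))   # 1024
```

The digits are produced in the order 4, 2, 0, 1, and the stack is what turns them back around into 1, 0, 2, 4.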

### Implementation of a stack from an array

We need to know the address of the top of stack (tos). The variable that holds it is often called a stack pointer, or simply sp.

```
  (initial state)
        sp
         |
        \ /
       -----------------------------
       |   |   |   |   |   |   |   |
       -----------------------------
```

`sp` is a variable that contains the address of the empty location (next available) at the top of the stack. In assembly language code, the stack pointer variable is always kept in a register, for efficiency in loads and stores.

Assume that this example stack contains word-sized elements.

For an array declared (in MAL) as

```
       stack:  .word  0:50
```

OR

```
       stack:  .space  200   # there are 4 bytes per word, so 50*4=200
```

The stack pointer is to contain the address of the next available location. Therefore, one logical way to initialize the stack pointer is to load it with the address of the first element of the stack. If the stack pointer is to reside in register \$8,

```
       la  $8, stack       # initialization of stack pointer
```

Here is a PUSH operation:

```
      sw    $21, ($8)    # $21 is used as the data to be pushed
      add   $8, $8, 4    # sp again holds the address of the next empty location
```

Here is an alternative coding of the same PUSH operation. The stack pointer is moved first, conceptually allocating the memory space before using the space for the data pushed.

```
      add   $8, $8, 4
      sw    $21, -4($8)  # $21 is used as the data to be pushed
```

Here is a POP operation:

```
      sub   $8, $8, 4
      lw    $21, ($8)     # the data popped off the stack goes into $21
```

And, here is the alternative coding of the same POP operation, conceptually copying out the data before deallocation of the space.

```
      lw    $21, -4($8)   # the data popped off the stack goes into $21
      sub   $8, $8, 4
```
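A minimal Python sketch of this convention, in which sp holds the address of the next empty location, might look as follows. The class name `ArrayStack`, the base address 0, and the little-endian byte order are assumptions for illustration. Checks for an empty or full stack are deliberately left out.

```python
import struct

WORD = 4  # bytes per word


class ArrayStack:
    """Stack of words in a byte array; sp is the address of the next empty slot."""

    def __init__(self, n_words):
        self.memory = bytearray(n_words * WORD)   # like `stack: .space 200`
        self.sp = 0                               # like `la $8, stack`

    def push(self, value):
        struct.pack_into("<i", self.memory, self.sp, value)   # store at sp
        self.sp += WORD                                       # then move sp up

    def pop(self):
        self.sp -= WORD                                       # move sp down
        return struct.unpack_from("<i", self.memory, self.sp)[0]   # then load


stack = ArrayStack(50)
stack.push(10)
stack.push(20)
print(stack.pop(), stack.pop(), stack.sp)   # 20 10 0
```

Because sp points at an empty slot, a push stores first and then moves sp, while a pop moves sp first and then loads.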

A stack could instead be implemented such that the stack pointer points to a full location at the top of the stack.

```
  (initial state)
    sp
     |
    \ /
       -----------------------------
       |   |   |   |   |   |   |   |
       -----------------------------
```

Here is a PUSH operation:

```
      add   $8, $8, 4   # sp is in $8
      sw    $12, ($8)   # assume data to be pushed is in $12
```

Here is a POP operation:

```
      lw    $12, ($8)   # assume data to be popped goes into $12
      sub   $8, $8, 4
```
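The same two operations can be sketched in Python with sp passed in and returned, much as a register would carry it. The base address `STACK`, the memory size, the little-endian byte order, and in particular the starting value of sp (one word below the first element, so that the first push lands in the first element) are assumptions for illustration.

```python
import struct

WORD = 4                    # bytes per word
memory = bytearray(256)     # hypothetical memory
STACK = 16                  # hypothetical base address of the stack array


def push(memory, sp, value):
    """Move sp up to a new slot, store there, and return the new sp."""
    sp += WORD                                   # add $8, $8, 4
    struct.pack_into("<i", memory, sp, value)    # sw  $12, ($8)
    return sp


def pop(memory, sp):
    """Load from the full slot at sp, then return (value, new sp)."""
    value = struct.unpack_from("<i", memory, sp)[0]   # lw  $12, ($8)
    return value, sp - WORD                           # sub $8, $8, 4


sp = STACK - WORD           # assumed initial state: no full location yet
sp = push(memory, sp, 7)    # stored at address 16
sp = push(memory, sp, 9)    # stored at address 20
first, sp = pop(memory, sp)
second, sp = pop(memory, sp)
print(first, second, sp)    # 9 7 12
```

Compared with the earlier convention, the order of the two steps inside each operation is swapped.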

Another alternative:
The stack could "grow" from the end of the array's memory allocation towards the beginning. (Note that which end of the array the stack grows toward is independent of what sp points to.)

For the student to figure out:
How do you know when the stack is empty?
How do you know when the stack is full?

## Queues

Whereas a stack is LIFO, a queue is FIFO (First In, First Out).

A real life analogy is a line (called a queue in British English). A person gets on the end of the line (the TAIL), waits, and gets off at the front of the line (the HEAD).

Getting into the queue is an operation called enqueue. Taking something off the queue is an operation called dequeue.

It takes 2 pointers to keep track of the data structure, the head and the tail.
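A minimal Python sketch of a queue built from an array and these two pointers is given here. It assumes one particular convention: head indexes the oldest item, tail indexes the next free slot, and head equal to tail means the queue is empty. The class name `ArrayQueue` is made up, indices stand in for addresses, both pointers start at slot 0, and checks for a full queue and for running off the end of the array are left out.

```python
class ArrayQueue:
    """Fixed-size queue; head indexes the oldest item, tail the next free slot."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0
        self.tail = 0

    def is_empty(self):
        return self.head == self.tail

    def enqueue(self, item):
        self.slots[self.tail] = item   # fill the empty slot at the tail
        self.tail += 1                 # tail moves on to the next empty slot

    def dequeue(self):
        item = self.slots[self.head]   # take the oldest item
        self.head += 1                 # the old slot is not erased, just passed
        return item


queue = ArrayQueue(5)
queue.enqueue("x")
queue.enqueue("y")
print(queue.dequeue(), queue.head, queue.tail)   # x 1 2
```

Items leave in the order they arrived, which is the FIFO behavior that distinguishes a queue from a stack.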

Here is an example in which head = tail implies an empty queue, the head points to a full location, and the tail points to an empty location.

```
  initial state:
  --------------------------------------------
      |     |     |     |     |      |
  --------------------------------------------
               ^
               |
               head, tail
```

```
  after 1 enqueue operation:
  --------------------------------------------
      |     |  x  |     |     |      |
  --------------------------------------------
               ^     ^
               |     |
               |     tail
               head
```

```
  after another enqueue operation:
  --------------------------------------------
      |     |  x  |  y  |     |      |
  --------------------------------------------
               ^           ^
               |           |
               |           tail
               head
```
```
  after a dequeue operation:
  --------------------------------------------
      |     |  x  |  y  |     |      |
  --------------------------------------------
                     ^     ^
                     |     |
                     |     tail
                     head