*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Chapter 6
Part 1

1.  There are a few differences between floating point addition/subtraction
and integer addition/subtraction.

The most obvious is that integers are (usually) stored in 2's complement
format.  This allows for efficient signed operations and only one
representation of zero.

IEEE floating point numbers are stored in sign-magnitude format.  The
mantissas might need to be converted to 2's complement notation or run
through a subtracting ALU.

Another big difference is that the radix points must be aligned for
floating point addition and subtraction.  All integers by definition are
aligned.

2.  All normalized floating point numbers can be written as 
[sign]1.[rest of mantissa]  times 2 to the [exponent] power.  The digit to 
the left of the radix point is the hidden bit.  It is "hidden" because a 
normalized binary number will always be in the form 1.[mantissa]. If 
the format kept the same precision (24 bits including the hidden bit) but
stored the hidden bit explicitly, it would have to drop one of the exponent
bits.  Forgetting to restore the hidden bit gives the wrong result most of
the time.

Take the binary number 10 (decimal 2).  The normalized representation in
IEEE floating point is 0100 0000 0000 0000 0000 0000 0000 0000.  Sign is 
positive, exponent is 1 (0x80-0x7f = 0x01), and mantissa is (1).000...
1 times 2 to the 1st power is 2.  Without the hidden bit, the value would
be 0 times 2 to the 1st power which is 0.
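
The effect of dropping the hidden bit can be checked with a short Python
sketch.  The helper names (decode_single, value_of) are made up for this
illustration, and the code assumes a normalized single-precision pattern
(no zeros, denormals, infinities or NaNs).

```python
def decode_single(bits):
    """Split a 32-bit IEEE single-precision pattern into (sign, biased exponent, fraction)."""
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exp, fraction


def value_of(bits, hidden_bit=True):
    """Value of a normalized pattern, with or without the hidden 1 restored."""
    sign, biased_exp, fraction = decode_single(bits)
    mantissa = fraction / 2**23
    if hidden_bit:
        mantissa += 1                      # restore the implicit leading 1
    return (-1) ** sign * mantissa * 2.0 ** (biased_exp - 127)


bits = 0x40000000                          # the pattern for decimal 2
print(decode_single(bits))                 # (0, 128, 0)
print(value_of(bits))                      # 2.0
print(value_of(bits, hidden_bit=False))    # 0.0
```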

After the error has been made then multiplication would yield incorrect
results (0 times any number is 0, but that isn't true for 2), addition
would be wrong too (0 is additive identity: 0 + n = n, but that again
isn't true for 2), division could end up with NaN results (due to
division by 0), and subtraction results suffer the same as addition.

All in all the hidden bit is essential to correct IEEE floating point
math.

3.  When adding two numbers of the same sign, the sign bit will be unchanged.  
When adding two numbers of different signs, the sign bit of the result will be
equal to the sign bit of the number with the larger absolute value.  
Subtraction is the same as addition provided you take the additive inverse of
the subtrahend.

When multiplying two numbers of the same sign, the sign of the product is
positive.  When multiplying two numbers of different signs, the sign of the
product is negative.  The sign of a quotient in division follows the same
rules as products in multiplication.

4.  The 4 steps in multiplication are: 1. do unsigned multiplication on the
mantissas.  2.  add the exponents.  3. normalize the result.  4. set the
sign bit of the result.

Example  10(base 10)*(-100.25)(base 10).  In base 2, this is 1.010 times 2 to 
the third power * -1.10010001 times 2 to the 6th power.

add bias 127 (0x7f) to both exponents

0x7f + 0x03 = 0x82

0x7f + 0x06 = 0x85

Both numbers written in IEEE format are 0100 0001 0010 0000 0000 0000 0000 0000
                                        SEEE EEEE ESSS SSSS SSSS SSSS SSSS SSSS

and 1100 0010 1100 1000 1000 0000 0000 0000
    SEEE EEEE ESSS SSSS SSSS SSSS SSSS SSSS

1st multiply the mantissas


     1.10010001
    *     1.010
    -----------
   110 0100 01
 11001 0001
---------------
1.1111 0101 010

2nd add the exponents

 0x82
+0x85
-----
0x107

Subtract the bias
   0f1
 0x107
- 0x7f
------
  0x88 (decimal 136 which when bias is subtracted is 9 which is 3 + 6)

3rd normalize the result

1.111 1010 1010 times 2 to the 9th power is already normalized

4th set the sign bit

positive times negative is negative

product is 1100 0100 0111 1010 1010 0000 0000 0000
           SEEE EEEE ESSS SSSS SSSS SSSS SSSS SSSS

which in hexadecimal is 0xc47aa000
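
The four steps above can be sketched in Python on the raw bit patterns.
This is only an illustration under stated assumptions: fp_multiply and
fields are hypothetical helper names, extra product bits are truncated
rather than rounded, and zeros, denormals, infinities, NaNs and
overflow are not handled.

```python
BIAS = 127


def fields(bits):
    """Return (sign, biased exponent, 23-bit fraction) of a single-precision pattern."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fp_multiply(a, b):
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    # 1. unsigned multiplication of the 24-bit mantissas (hidden bit restored)
    product = ((1 << 23) | fa) * ((1 << 23) | fb)   # 46 fraction bits
    # 2. add the biased exponents and subtract one bias
    exp = ea + eb - BIAS
    # 3. normalize: the product lies in [1, 4), so at most one right shift
    if product >> 47:
        product >>= 1
        exp += 1
    fraction = (product >> 23) & 0x7FFFFF           # truncate to 23 bits
    # 4. set the sign bit
    sign = sa ^ sb
    return (sign << 31) | (exp << 23) | fraction


print(hex(fp_multiply(0x41200000, 0xC2C88000)))     # 0xc47aa000
```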

5.  Floating point addition requires the lining up of radix points
between the two numbers.  Floating point multiplication doesn't
require this.

6.

The 4 steps in IEEE division are 1. Do unsigned division on the mantissas.
2. Subtract the exponent of the divisor from the exponent of the dividend.
3. Normalize the result
4. Set the sign bit of the result

Example

-4 (base 10) / -16 (base 10)

-4 (base 10) in binary is -1.00 times 2 to the second power
-16 (base 10) in binary is -1.000 times 2 to the fourth power

add bias (0x7f) to both powers

 0x02
+0x7f
-----
 0x81

 0x04
+0x7f
-----
 0x83

-4  in IEEE is 1100 0000 1000 0000 0000 0000 0000 0000
               SEEE EEEE ESSS SSSS SSSS SSSS SSSS SSSS

-16 in IEEE is 1100 0001 1000 0000 0000 0000 0000 0000
               SEEE EEEE ESSS SSSS SSSS SSSS SSSS SSSS

1st unsigned division of the mantissas
      1.000
     ______
1.000)1.000

2nd subtract the exponent of divisor from exponent of dividend

 0x81    0x83
-0x83   -0x81
-----   -----
-0x02    0x02

-0x02
+0x7f (add bias back in)
-----
 0x7d

3rd normalize result

1.000 is already normalized

4th set up sign bit of the result

negative divided by negative is positive

result is 0011 1110 1000 0000 0000 0000 0000 0000
          SEEE EEEE ESSS SSSS SSSS SSSS SSSS SSSS

in hexadecimal 0x3e800000
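
The division steps can be sketched the same way.  As before, fp_divide
and fields are hypothetical names, the quotient is truncated rather than
rounded, and special values (zero, denormals, infinity, NaN) are ignored.

```python
BIAS = 127


def fields(bits):
    """Return (sign, biased exponent, 23-bit fraction) of a single-precision pattern."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fp_divide(a, b):
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    ma = (1 << 23) | fa
    mb = (1 << 23) | fb
    # 1. unsigned division of the mantissas, kept as fixed point with 23 fraction bits
    quotient = (ma << 23) // mb
    # 2. subtract the divisor's exponent from the dividend's, then add the bias back in
    exp = ea - eb + BIAS
    # 3. normalize: the quotient lies in (0.5, 2), so at most one left shift
    if quotient < (1 << 23):
        quotient = (ma << 24) // mb
        exp -= 1
    # 4. set the sign bit
    sign = sa ^ sb
    return (sign << 31) | (exp << 23) | (quotient & 0x7FFFFF)


print(hex(fp_divide(0xC0800000, 0xC1800000)))   # 0x3e800000
```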

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Ch.6 Part2

-------------------------------------------------------------------------------
Question 1:
Give examples illustrating the difference between the four rounding strategies.
-------------------------------------------------------------------------------
   
   The four rounding strategies:
       - Round towards zero             (rtz)
          * effect: truncation, value closer to zero
       - Round towards pos. infinity    (rtpi)
          * effect: never smaller than actual number
       - Round towards neg. infinity    (rtni)
          * effect: never larger than actual number
       - Round towards nearest          (rnear)
          * effect: if val to rt of desired dec. is less than 1/2 - truncate
                    if val to rt of des. dec. is >= 1/2, round to greater mag.
       
       Given number: 5.7814 - round to 3 dec.
        rtz:   5.781
        rtpi:  5.782
        rtni:  5.781
        rnear: 5.781
        
      Given number: -5.7814 - round to 3 dec.
        rtz:   -5.781
        rtpi:  -5.781
        rtni:  -5.782
        rnear: -5.781
        
      Given number: 1.28 - round to 1 dec.
        rtz:   1.2
        rtpi:  1.3
        rtni:  1.2
        rnear: 1.3
        
      Given number: -1.28 - round to 1 dec.
        rtz:   -1.2
        rtpi:  -1.2
        rtni:  -1.3
        rnear: -1.3
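
   The examples above can be reproduced with Python's decimal module.  This
   is a sketch: round_all and MODES are names chosen here, and ROUND_HALF_UP
   is used for rnear because it matches the ">= 1/2 rounds to the greater
   magnitude" rule stated above.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP

MODES = {
    "rtz": ROUND_DOWN,        # toward zero
    "rtpi": ROUND_CEILING,    # toward positive infinity
    "rtni": ROUND_FLOOR,      # toward negative infinity
    "rnear": ROUND_HALF_UP,   # to nearest, halves go to the greater magnitude
}


def round_all(text, places):
    """Round the decimal string `text` to `places` digits under all four strategies."""
    quantum = Decimal(f"1e-{places}")
    number = Decimal(text)
    return {name: str(number.quantize(quantum, rounding=mode))
            for name, mode in MODES.items()}


for text, places in [("5.7814", 3), ("-5.7814", 3), ("1.28", 1), ("-1.28", 1)]:
    print(text, round_all(text, places))
```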
   
-------------------------------------------------------------------------------
Question 2:
Which rounding strategy is the best/worst and why?
-------------------------------------------------------------------------------
   
   That depends on what the numbers mean. If you are dealing with something
   such as how much paint you need to paint a room, the best rounding
   strategy is to round to positive infinity. If you are designing an item
   to fit inside another item, the best option would be rounding to zero.
   In the case of rounding money, customers get upset when they are charged 
   extra, but at the same time the company cannot always take less money.
   Therefore, for money, round to nearest is the most appropriate, because,
   on average, you round up as often as you round down.
   
-------------------------------------------------------------------------------
Question 3:
When could overflow occur when performing additive operations?
-------------------------------------------------------------------------------
   
   Overflow occurs when the exponent of a normalized result is outside the 
   range of values that can be represented. In other words, if you add two
   very large IEEE numbers of the same sign (both exponents maxed out) and the
   sum of the mantissas needs to be renormalized, the exponent would have to
   grow past its maximum, and floating point overflow has occurred.
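
   A minimal Python sketch of that condition, assuming two normalized
   single-precision operands of the same sign.  The name sum_overflows is
   hypothetical and rounding is ignored.

```python
MAX_BIASED_EXP = 254   # biased exponent 255 is reserved for infinity and NaN


def sum_overflows(a_bits, b_bits):
    """True if adding two same-sign normalized singles pushes the exponent out of range."""
    ea, eb = (a_bits >> 23) & 0xFF, (b_bits >> 23) & 0xFF
    ma = (1 << 23) | (a_bits & 0x7FFFFF)
    mb = (1 << 23) | (b_bits & 0x7FFFFF)
    # align the radix points to the larger exponent
    if ea < eb:
        ma >>= eb - ea
        exp = eb
    else:
        mb >>= ea - eb
        exp = ea
    total = ma + mb
    if total >> 24:        # carry past the hidden-bit position: renormalize
        exp += 1
    return exp > MAX_BIASED_EXP


print(sum_overflows(0x7F7FFFFF, 0x7F7FFFFF))   # True: both exponents maxed out
print(sum_overflows(0x41200000, 0x41200000))   # False: 10 + 10
```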
   
-------------------------------------------------------------------------------
Question 4:
When could underflow occur when performing additive operations?
-------------------------------------------------------------------------------
   
   Underflow occurs when a nonzero result is too close to zero to be represented.
   This can be caused by adding a very small positive number (on the verge of
   being zero) to a very small negative number (also on the verge of being zero).
   The result will be zero if they have the same magnitude, but if not, the 
   result can be so close to zero that when rounding is applied, zero is the 
   result, even though the actual value is slightly more or less than zero.
   
-------------------------------------------------------------------------------
Question 5:
Give the bit patterns for two IEEE single-precision floating point numbers
whose addition would result in a denormalized number.
-------------------------------------------------------------------------------
   
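   An example of this is question 2 from HW 6.  The raw sum has two digits to
   the left of the radix point, so it is not in normalized form until it is
   shifted right by one place.
   
   (The numbers with the asterisks after are the IEEE bit patterns)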
   
  0xfe7a0004     1 11111100 11110100000000000000100   ****
+ 0xfefa0007   + 1 11111101 11110100000000000000111   ****
------------   ------------------------------------

1 11111100 1.11110100000000000000100 --> 1 11111101 0.111110100000000000000100
 
  1 11111101  0.111110100000000000000100
+ 1 11111101  1.11110100000000000000111
-----------------------------------------
  1 11111101 10.111011100000000000010010
  
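  Normalize: 1 11111110 1.0111011100000000000010010
  
  Round to 23 fraction bits: the two dropped bits are 10 (exactly 1/2), so
  the round-to-nearest rule from Question 1 rounds up.  (IEEE's default
  round-to-nearest-even would keep ...100 instead, giving 0xff3b8004.)
  
  Final IEEE: 1 11111110 01110111000000000000101
  Final hex:  0xff3b8005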
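
  The alignment, addition, normalization and rounding above can be traced
  with a small Python sketch.  add_same_sign and its `ties` parameter are
  hypothetical; the code assumes two normalized operands with the same sign
  and does not handle overflow or a mantissa that rounds up to 2.0.

```python
GUARD = 3   # extra low-order bits kept so that alignment shifts lose nothing here


def fields(bits):
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def add_same_sign(a, b, ties="half_up"):
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    assert sa == sb, "sketch only covers operands with the same sign"
    ma = ((1 << 23) | fa) << GUARD
    mb = ((1 << 23) | fb) << GUARD
    # line up the radix points on the larger exponent
    if ea < eb:
        ma >>= eb - ea
        exp = eb
    else:
        mb >>= ea - eb
        exp = ea
    total = ma + mb
    shift = GUARD
    if total >> (24 + GUARD):      # sum is 1x.xxx: renormalize
        shift += 1
        exp += 1
    kept = total >> shift
    dropped = total & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # ties="half_up" follows the notes' rule; ties="even" is the IEEE default
    if dropped > half or (dropped == half and (ties == "half_up" or kept & 1)):
        kept += 1
    return (sa << 31) | (exp << 23) | (kept & 0x7FFFFF)


print(hex(add_same_sign(0xFE7A0004, 0xFEFA0007)))               # 0xff3b8005
print(hex(add_same_sign(0xFE7A0004, 0xFEFA0007, ties="even")))  # 0xff3b8004
```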
   
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Chapter 9

Part1:


1. Describe the difference between jal and jr.

 jal procedure_name:  
  Jump and Link. Stores the address of the following instruction into $ra, then jumps to the
  address specified by the label 'procedure_name'.

 jr: Jump Register. This instruction is an unconditional jump to the address contained in the 
  register specified.


2. Give an example using jal.

  jal  proc1 #stores return address into $ra, and jumps to proc1
  ...  #the address of this line is stored into $ra
  ...

 proc1:   #code here
  jr $ra #return to the address previously stored in $ra
 

3. Outline the general format of activation records (stack frames).

 Note: Space for the maximum number of parameters used is allocated.

    Small Address

 |---------------|
 |               | <-- $sp
 |---------------|
 |      $a0      |  --
 |---------------|   |
 |      $a1      |   |
 |---------------|   |
 |      $a2      |   |
 |---------------|   |
 |      $a3      |   |
 |---------------|   |
 |      $sR      |   |
 |---------------|   |
 |      $sR      |   |
 |---------------|   |-- Activation Record allocated & deallocated by procA
 |  more saved   |   |
 |  registers    |   |
 |---------------|   |
 |      $ra      |   |
 |---------------|   |
 |      $tR      |   |
 |---------------|   |
 |      $tR      |   |
 |---------------|   |
 |  more temp    |   |
 |  registers    |   |
 |---------------|   |
 |  local vars.  |  --
 |---------------|
 |      $a0      |  --
 |---------------|   |
 |      $a1      |   |
 |---------------|   |
 |      ...      |   |-- Activation Record allocated & deallocated by procA's parent
 |---------------|   |
 |      ...      |   |
 |---------------|   |
 |               |   |

    Large Address


4. Describe MIPS procedure for passing parameters to/from procedures.

 Parameters to a procedure:
  -The parameters are placed into registers $a0 - $a3 ($4 - $7) before calling the procedure.
   The first parameter is placed into $a0, the second into $a1, and so forth.
  -If more parameters are needed, they are placed onto the stack.

 Parameters from a procedure (Return values):
  -The return values are placed into registers $v0 & $v1 before returning to calling procedure.
   The first return value is placed into $v0, the second into $v1.
  -If space for more return values is needed, they are placed onto the stack.
  

5. What is the purpose for procedures in a high-level language? Is there a point to having a procedure mechanism
   in assembly language?

 Procedures allow the program to be modularized, which means that code can easily be:
  -reused
  -modified
  -and different programmers can work on different parts of the same program.

 For the reasons above, it makes sense to have procedures in assembly language as well.


6. Explain the difference for passing parameters using registers, the stack or some in the register and some in
   the stack. What are the pros and cons?

 -Only Registers:
  Pros: -Easy to implement and fast (because no memory access)
  Cons: -There is a limited number of registers to use
   -Does not work for nested procedure calls and recursion (values would be overwritten)

 -Only Stack:
  Pros: -Easy to implement
  Cons: -Slow, since there are lots of memory accesses

 -Some in Registers & some in the stack:
  Pros: -Since there are 4 or fewer parameters most of the time, we can usually take advantage of
    the first method (registers only) described above.
  Cons: -Lots of data movement between registers and memory is required.


7. Outline what is needed in the activation record.

 An Activation Record consists of all the information that corresponds to the state of a procedure, such
 as:
  -return address
  -parameters
  -return values
  -any live temporary registers
  -all saved registers (if used)
  -see also number 3

 The Activation Record gets allocated before jumping to another procedure and deallocated upon return
 from the called procedure.

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Chapter 9, Part 2

8) The Chapter 9 online notes provide an excellent and detailed example:

# procedure: procA
# function: demonstrate CS354 calling convention
# input parameters: $a0 and $a1
# output (return value): $v0
# saved registers: $s0, $s1
# temporary registers: $t0, $t1
# local variables: 5 integers named R, S, T, U, V
# procA calls procB with 5 parameters (R, S, T, U, V).
#
# Stack frame layout:
#       
#       | in $a1  |  68($sp)
#       | in $a0  |  64($sp)
#       |---------|
#       |    V    |  60($sp)       --|
#       |    U    |  56($sp)         |
#       |    T    |  52($sp)         |
#       |    S    |  48($sp)         |
#       |    R    |  44($sp)         |
#       |   $t1   |  40($sp)         |
#       |   $t0   |  36($sp)         | --  A's activation record
#       |   $ra   |  32($sp)         |
#       |   $s1   |  28($sp)         |
#       |   $s0   |  24($sp)         |
#       | out arg4|  20($sp)         |
#       | out $a3 |  16($sp)         |
#       | out $a2 |  12($sp)         |
#       | out $a1 |   8($sp)         |
#       | out $a0 |   4($sp)       --|
#       |---------|
#       |         | <-- $sp        ---
#       |         |                  |
#       |         |                  |   --  where B's activation record
#       |         |                  |        will be
procA:
        # procedure prologue
        sub $sp, $sp, 60        #allocate activation record, includes
                                # space for maximum outgoing args
        sw $ra, 32($sp)         #save return address
        sw $s0, 24($sp)         # save 'saved' registers to stack
        sw $s1, 28($sp)         # save 'saved' registers to stack
        # end prologue

        ....    # more code

        # call setup for call to procB
        # save current (live) parameters into the space specifically
        # allocated for this purpose within caller's stack frame
        sw $a0, 64($sp)         # only needed if values are 'live'
        sw $a1, 68($sp)         # only needed if values are 'live'
        # save any registers that need to be preserved across the call
        sw $t0, 36($sp)         # only needed if values are 'live'
        sw $t1, 40($sp)         # only needed if values are 'live'
        # put parameters into proper location
        lw $a0, 44($sp)         # load R into $a0
        lw $a1, 48($sp)         # load S into $a1
        lw $a2, 52($sp)         # load T into $a2
        lw $a3, 56($sp)         # load U into $a3
        lw $t0, 60($sp)         # load V into a temp register
        sw $t0, 20($sp)         # outgoing arg4 must go on the stack
        #end call setup

        # procedure call
        jal procB

        # return cleanup for call to procB
        # restore saved registers
        lw $a0, 64($sp)
        lw $a1, 68($sp)
        lw $t0, 36($sp)
        lw $t1, 40($sp)
        # return values are in $v0 and $v1

        ....    # more code

        # procedure epilogue
        # restore return address
        lw $ra, 32($sp)
        # restore $s registers saved in prologue
        lw $s0, 24($sp)         
        lw $s1, 28($sp)        
        # put return values in $v0 and $v1
        move $v0, $t0
        # deallocate stack frame
        add $sp, $sp, 60
        # return
        jr $ra

# end of procA

save parameter, s registers, t registers and anything else

9)    #main

      move   $a0, $s0   #passing three params
      move   $a1, $s1
      move   $a2, $s2
      jal addthree      #calling procedure
      .
      .
      .
     exit:      done


addthree:       sub     $sp, $sp, 32    #create AR of standard size

                move $t0, $a0           #moving params to temp registers
                move $t1, $a1
                move $t2, $a2

                add $t4, $t0, $t1       #adding params
                add $t4, $t4, $t2

                add $sp, $sp, 32        #deallocate AR

                move $v0, $t4           #returning value
                jr   $ra


10)  The V registers $v0-$v1/$2-$3 - expression evaluation/return values
     The A registers $a0-$a3/$4-$7 - passing params
     The T registers $t0-$t9/$8-$15 & $24-$25 - temporary registers
     stack pointer - $sp/$29
     return address - $ra/$31


11)  We are allowed to use T registers in procedures and in the main body of the program.
     We can also use S registers in procedures and in the main program.  We can use V registers
     in procedures.  We can use $sp and $ra in procedures.

12)  A frame pointer is a pointer into a procedure's stack frame that is used to address 
     the frame instead of the stack pointer. Frame pointers exist in MAL and TAL, but not in SAL.

13)  Callee saved: register values are preserved across procedure calls; the called
     procedure saves them on entry and restores them before it returns.

     procA:       sw $t0, ($sp)    #store on frame
                  sw $t1, 4($sp)
                  
                  .
                  .
                  .#use register's t1 and t0
                  .
                  .
                  lw $t0,  ($sp)           #load back values at the end of procA
                  lw $t1, 4($sp)
                  #deallocate stack frame
                  jr $ra

    Caller saved: register values are not preserved across procedure calls.

    #here is the basic concept - the calling procedure saves values it wants to preserve
    
    procA:       sw $t0, ($sp)     #store on frame
                 sw $t1, 4($sp)

                 jal procB

                 lw $t0,  ($sp)    #load back values after procB returns
                 lw $t1, 4($sp)

Q7)  Give examples illustrating the difference between the four rounding 
     strategies.

A7)  1) Round toward ZERO
        -----------------
      
        Decimal Examples:
        0.938927      if 4 decimal places available, 0.9389
                      if 1 decimal place available, 0.9

        Binary Examples:
         1.000010     if 2 binary places available,  1.00
        -0.111100     if 2 binary places available, -0.11
         0.110111     if 3 binary places available,  0.110
        -1.011100     if 3 binary places available, -1.011
 
     2) Round toward positive INFINITY
        ------------------------------
        
        Decimal Examples:
        2.65          if 1 decimal place available,  2.7
       -5.79          if 1 decimal place available, -5.7

        Binary Examples:
        0.0101        if 2 binary places available, 0.10
       -0.0010        if 2 binary places available, 0.00

     3) Round toward negative INFINITY
        ------------------------------

        Decimal Examples:
        2.65          if 1 decimal place available,  2.6
       -5.79          if 1 decimal place available, -5.8

        Binary Examples:
        0.0101        if 2 binary places available,  0.01
       -0.0010        if 2 binary places available, -0.01


     4) Round toward NEAREST
        --------------------
     
        Decimal Examples:
         
        2.55          if 1 decimal place,  2.6
       -1.79          if 1 decimal place, -1.8
       -1.75          if 1 decimal place, -1.8

        Binary Examples:
       -1.0101        if 3 binary places available, -1.010
       -1.1001        if 3 binary places available, -1.100
 
        0.0011        if 2 binary places available,  0.01
        0.1101        if 2 binary places available,  0.11


Q8)  Which rounding strategy is the best/worst and why?

A8)  From the point of view of accuracy, the best rounding strategy is
     rounding to the NEAREST because it usually gives the most accurate
     results.  It might not give the best accuracy in every single case, but
     overall it is the rounding method which gives the best accuracy.
     Because of this, errors are not compounded and they often cancel out.

     The worst rounding strategy would be rounding to ZERO because it is the 
     least accurate most of the time.  But we must note that rounding to ZERO
     does give you really good precision.  For example, if you have the number
     .7789 and 3 decimal places available, then rounding to the NEAREST
     gives .779, which is pretty accurate, but rounding to ZERO gives .778,
     which is not that accurate.

Q9)  When could overflow occur when performing additive operations?

A9)  Overflow could occur when the exponent of the normalized result is outside
     the range of values representable.  So if the true exponent after the
     additive operation is greater than 127 in decimal, for example if 
     the biased exponent E of a single-precision normalized result comes out
     as 1 0000 1110(base 2), which no longer fits in 8 bits, then overflow
     has occurred. 

Q10) When could underflow occur when performing additive operations.

A10) Underflow occurs when a result is too close to zero to be represented.
     For example, repeatedly dividing a number by a positive constant 
     gives an answer that gets closer and closer to zero but never actually
     reaches zero.  But using floating point operations, the value will 
     eventually "underflow" after some iterations, because it will return 0.

Q11) Give the bit patterns for two IEEE single-precision floating point numbers
     whose addition would result in a denormalized number.

A11) To extend the range of representable numbers toward zero, precision has
     to be given up gradually.  The very small numbers then carry fewer
     significant bits than normalized numbers do, and this is accomplished
     by denormalizing a number.

     where E = 0       m = 0 + (F/(2^(23)))
     and the value represented by using denormalization is
      
               (-1)^S * 2^(-126) * (F/(2^(23))) = (-1)^S * 2^(-149) * F
        
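     Two normalized numbers with the smallest exponent (E = 1, hidden bit 1),
     opposite signs, and fractions that differ only in the last bit:

     0 0000 0001 (1)000 0000 0000 0000 0000 1111   
    +1 0000 0001 (1)000 0000 0000 0000 0000 1110
     -------------------------------------------
    
     0 0000 0000 (0)000 0000 0000 0000 0000 0001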
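
     The denormal formula can be checked against the machine's own float
     conversion with Python's struct module.  The operand pair below
     (0x0080000f and 0x8080000e) is one illustrative choice made for this
     sketch, and the helper names are hypothetical.

```python
import struct


def from_bits(bits):
    """Interpret a 32-bit pattern as an IEEE single and return it as a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def to_bits(x):
    """Round a Python float to single precision and return its 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def denormal_value(sign, fraction):
    """(-1)^S * 2^(-149) * F, the value of a pattern whose exponent field is 0."""
    return (-1) ** sign * fraction * 2.0 ** -149


a = from_bits(0x0080000F)    # smallest exponent, fraction ...1111
b = from_bits(0x8080000E)    # same exponent, negative, fraction ...1110
total = a + b                # exact in double precision

print(hex(to_bits(total)))                # 0x1, exponent field 0: a denormal
print(total == denormal_value(0, 1))      # True
```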

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Chapter 10 Part 1

1. C0 is a coprocessor for the kernel.  It provides additional registers, including ones used to 
   handle I/O and interrupts.
   C1 is a coprocessor that handles floating point numbers.  Without it, floating point operations
   would not be available in hardware.

2. The compiler and assembler translate high-level user code into low-level machine code.  They
   also, in theory, streamline the code so that it runs as efficiently as possible.

3. TAL is the instruction set that the machine uses.  MAL is an abstraction of TAL.  It is the 
   instruction set that the assembler recognizes.  MAL is translated into TAL.  SAL is an abstraction
   of MAL and TAL.  It is much easier to program in than MAL or TAL for someone who is used to high-
   level languages.  We learned about these three languages to give us a taste of what lies beneath 
   high-level languages and to learn somewhat about the actual process of the machine.

4. "add" adds two registers and stores the sum into a third.  "addi" adds an immediate value and a register.
   Adding an immediate saves an instruction because the constant is encoded in the instruction itself
   and does not first have to be loaded into a register.

5. "la" - you must know the hex address of the label you wish to load, then lui the top half and ori the 
       bottom into the register.
   "mul" - you must mult the two operands, then mflo the product register.
   "div" - you still use "div", but with only the divisor and dividend.  The quotient is then loaded with
       "mflo" to the desired register.
   "rem" - do "div" as above, only get the remainder with "mfhi" to the desired register.
   "move" - just "add" $0 to the move-from register and store the result in the move-to register.
   "blt","bgt","ble","bge" - subtract the control register (the second one) from the test register
       (the first one) and store the result in a temporary.  Then use "bltz","bgtz","blez", or "bgez" with
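       the temporary as your conditional.
   "li" - just "addi" the immediate to $0 and store the result in the load-to register (this works for
       an immediate that fits in 16 bits; a larger value needs "lui" followed by "ori").
   "not" - use "nor" with $0 as the second operand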
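
   Two of the expansions above can be sketched as a tiny Python translator.
   The function names are made up for this illustration, and $1 is assumed
   to be the temporary reserved for the assembler.

```python
def expand_la(reg, address):
    """la: split a 32-bit address into a lui of the top half and an ori of the bottom half."""
    upper = (address >> 16) & 0xFFFF
    lower = address & 0xFFFF
    return [f"lui ${reg}, 0x{upper:04x}",
            f"ori ${reg}, ${reg}, 0x{lower:04x}"]


def expand_blt(test_reg, control_reg, label):
    """blt: subtract the second register from the first, then branch if the result is < 0."""
    return [f"sub $1, ${test_reg}, ${control_reg}",
            f"bltz $1, {label}"]


print(expand_la(10, 0x00003000))
print(expand_blt(3, 18, "br_label"))
```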

6. sub $8,$3,$18
   bgtz $8,br_label
   ***
   sub $8,$18,$3
   bltz $8,br_label

7. blt R,S,label  -->  subu $1,R,S
                       bltz $1,label
   ***
   bgt R,S,label  -->  subu $1,R,S
                       bgtz $1,label

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Chapter 10, part 2

1. What is the purpose for syscall in TAL? How and when is it used?
----------------------------------------------------------------------------
 
The purpose of syscall in TAL is to make input/output requests to the operating system/kernel,
so that the operating system/kernel can handle those requests accordingly. It is used whenever we
need to interact with the user in any way.  For example, the MAL instructions putc, getc, puts and
done are all translated to code involving syscall when we translate to TAL.
To use it, we first specify what kind of request we want: we load $2 with a
value that indicates which input/output function to perform.

so depending on what we have in $2,

$2 value     Function
--------     --------
1	     put
4	     puts
5	     get
10	     done
11	     putc
12	     getc

Then, depending on what you want to do, the setup for syscall differs.

For example, if you want to putc $18,
you first load $2 with the value from the table above
that selects the function we want.
In our case it is putc, so we load $2 with 11. After that,
we put the value we want to put on the screen into $4, then make a syscall.
So it will be something like this:

putc $18

li $2,11
move $4,$18
syscall

Right after syscall, control goes to the exception handler, and the exception handler
directs it to the code that handles syscall. That code reads $2 to see what kind of
input/output function has been requested. In our case it reads 11 in $2, so it
branches to the code for putc, which uses the value in $4 as the character to put on the screen.
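
The dispatch on $2 can be pictured with a toy Python model.  This is only a
sketch of the idea, not the real kernel code: the register file is a plain
dict, the screen is a list, and only putc (11) and done (10) from the table
above are modelled.

```python
def syscall(regs, screen):
    """Toy model of the syscall handler: look at $2, then act on $4.

    Returns False when the program asked to stop (done), True otherwise.
    """
    service = regs[2]
    if service == 11:                       # putc: print the character in $4
        screen.append(chr(regs[4] & 0xFF))
    elif service == 10:                     # done: end the program
        return False
    else:
        raise NotImplementedError(f"service {service} is not modelled in this sketch")
    return True


screen = []
regs = {2: 11, 4: ord("A")}                 # li $2,11 / move $4,$18 with $18 holding 'A'
syscall(regs, screen)
print("".join(screen))                      # A
```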


2 Using three registers, give a code example that multiplies two numbers and prints out their product, 
  then divides two numbers and prints out their remainder in TAL.
  -----------------------------------------------------------------


Assuming $t2 and $t3 already hold the 2 numbers that we want to multiply and take the remainder of


...................more code.....................

  
	mult $t2,$t3
	mflo $t1
	jal print_int


	div $t2,$t3
	mfhi $t1
	jal print_int

# Then go to a print_int method that will print out the product and the remainder in $t1


print_int:    addi $t0,$0,10		# $t0 = 10, the divisor used to peel off decimal digits
	      addi $t5,$0,0		# $t5 = digit counter ($t3 still holds an operand, so it
					# must not be reused here)

loop:	       div $t1,$t0		# LO = $t1 / 10, HI = $t1 % 10
	      mfhi $t4			# $t4 = least significant remaining digit
	      addi $t4,$t4,48		# add 48 to turn the digit into its ASCII code
	      addi $sp,$sp,-4		# push the character on the stack
	        sw $t4,4($sp)
	      addi $t5,$t5,1		# one more digit pushed
	      mflo $t1			# continue with the quotient
	       bne $t1,$0,loop		# stop looping once the quotient hits zero

printing:	lw $t4,4($sp)		# pop the digits, most significant first
	      addi $sp,$sp,4
	      addi $2,$0,11		# 11 = putc
	       add $4,$0,$t4
	   syscall
	      addi $t5,$t5,-1
	       bne $t5,$0,printing
	        jr $ra


3. Translate the following MAL code segment into TAL (if necessary) and machine code.
   The starting address for the instructions is 0x0008 8800
   --------------------------------------------------------------------------------
 

and $5, $6, $18  ==> 0x0008 8800

beq $5, $0, br1  ==> 0x0008 8804

lui $20, 0x66aa	 ==> 0x0008 8808

br1:      lb $9, -8($20)   ==> 0x0008 880c 


they are already in TAL, so no need to translate.


and $5,$6,$18 ==> 0000 0000 1101 0010 0010 1000 0010 0100
		  0x 00d22824


beq $5,$0,br1: we need to calculate the offset first,
so 0x0008 880c - (0x0008 8804 + 0x0000 0004) = 0x0000 0004
which is 0000 0000 0000 0000 0000 0000 0000 0100
and we drop the last 2 bits since they are always 0,
so the 16-bit constant is 0000 0000 0000 0001.
Then combine it with the other parts of the instruction:
0001 0000 1010 0000 0000 0000 0000 0001
which is 
0x10a00001
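
The offset calculation can be written as a short Python sketch.  The
function names are hypothetical, and the backward-branch addresses in the
last line are made up to show a negative offset.

```python
def branch_offset(branch_addr, target_addr):
    """16-bit branch offset: word distance from the instruction after the branch."""
    byte_distance = target_addr - (branch_addr + 4)   # PC already points past the branch
    assert byte_distance % 4 == 0, "instructions are word aligned"
    return (byte_distance >> 2) & 0xFFFF              # drop the two zero bits, keep 16 bits


def encode_beq(rs, rt, branch_addr, target_addr):
    """beq has opcode 000100 followed by rs, rt and the 16-bit offset."""
    return (0x04 << 26) | (rs << 21) | (rt << 16) | branch_offset(branch_addr, target_addr)


print(hex(encode_beq(5, 0, 0x00088804, 0x0008880C)))   # 0x10a00001
print(hex(branch_offset(0x00001010, 0x00001008)))      # 0xfffd, i.e. -3 words
```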


lui $20,0x66aa
0011 1100 0001 0100 0110 0110 1010 1010
which is
0x3c1466aa


lb $9, -8($20)
1000 0010 1000 1001 1111 1111 1111 1000
which is
0x8289fff8
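
The remaining three encodings can be double-checked with a small Python
sketch of the R-type and I-type field layouts.  encode_r and encode_i are
names chosen for this illustration.

```python
def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    """R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, immediate):
    """I-type: opcode(6) rs(5) rt(5) immediate(16, two's complement)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


print(hex(encode_r(rs=6, rt=18, rd=5, shamt=0, funct=0x24)))    # and $5,$6,$18  -> 0xd22824
print(hex(encode_i(0x0F, rs=0, rt=20, immediate=0x66AA)))       # lui $20,0x66aa -> 0x3c1466aa
print(hex(encode_i(0x20, rs=20, rt=9, immediate=-8)))           # lb $9,-8($20)  -> 0x8289fff8
```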


4.  Add comments to the TAL program on page 271 on page 267.
    --------------------------------------------------------

# TAL program to sum the squares of the first 20 integers.
# register assignment:
#   $0 -- always 0
#   $1 -- used for synthesis of MAL instruction
#   $8 -- squares
#   $9 -- condition
#   $10 -  numintegers
#   $11 -  temp
#   $12 -  count

    .data
numintegers: .word 20

    .text

__start: #------------------------------------------------------------------------------------------------
	 addi $12,$0,1		# $12 = 1, used as counter,initialize to be 1 
	 #----------------------------------------------------------------------------------------------
	 addi $8,$0,0		# $8 = 0, initialize the variable that holds the running sum of all the squares
	 #-------------------------------------------------------------------------------------------
	 lui  $10, 0x0000	# $10 = 0x0000 3000, address of the numintegers
	 ori  $10,$10,0x3000	
	 #----------------------------------------------------------------------------------------------
	 lw   $10, 0($10)	# load word from the address store in $10 which is numintegers in our case
				# and numintegers is initialized as 20,so $10 has value of 20
	 #-----------------------------------------------------------------------------------------------
	 
while:   #------------------------------------------------------------------------------------------
	 sub  $9,$12,$10	# $9 = counter - numintegers; positive once the counter passes 20
	bgtz  $9,end		# leave the loop when the counter is greater than 20
	mult $12,$12		# square the counter; the product goes to HI/LO
	mflo $11		# $11 = the current square (low word of the product)
	 add $8,$8,$11		# add the current square to the running sum in $8
	addi $12,$12,1		# increase the counter by 1
	   j while		# branch back to the loop test
	 #--------------------------------------------------------------------------   

end:	#-------------------------------------------------
	addi $2,$0,10		# done, quit the program
	syscall
	#--------------------------------------------------
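
To check what the commented program computes, the loop can be mirrored in a few lines of Python. This is a sketch only; the variable names follow the register assignment table above and the function name is made up.

```python
def sum_of_squares(numintegers):
    """Mirror of the TAL loop: add up count*count for count = 1..numintegers."""
    count = 1        # $12
    squares = 0      # $8
    # bgtz $9,end leaves the loop once count - numintegers is positive
    while count - numintegers <= 0:
        temp = count * count   # mult / mflo -> $11
        squares += temp        # add $8,$8,$11
        count += 1             # addi $12,$12,1
    return squares


print(sum_of_squares(20))  # 2870
```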


5. Describe the processes for calculating the branch offset illustrated by a small example.
   ------------------------------------------------------------------------------------------
.data

..............some code...................

.text


.............some code.....................


begin:		and $5, $6, $18
		sub $5,$5,$7
		beq $5,$9,begin


next:		add $7,$7,$3


............some code..............


To calculate the branch offset in this small example, we first need the
address of begin. Let's say the instruction at begin has address 0x0040 0008.
Each additional instruction adds 4 to the address, so the address of
beq $5,$9,begin is 0x0040 0010.
The offset is measured from the updated PC: by the time the branch executes,
the PC has already been updated to the next instruction. So we add 4 to
0x0040 0010, which gives 0x0040 0014. Since begin has address 0x0040 0008,
the distance between the two is 0x0040 0014 - 0x0040 0008 = 0x0000 000c.
The sign depends on whether we are going backward or forward. In our case
we are going back to begin, so the difference must be negative, so

0000 0000 0000 0000 0000 0000 0000 1100 
must be negated (2's complement) to make it negative, so it becomes
1111 1111 1111 1111 1111 1111 1111 0100

Then we always drop the 2 least significant bits of this number. Instruction
addresses are always multiples of 4, so the last 2 bits are always 0. After
dropping them, we take the 16 rightmost bits as the branch offset field of
the instruction. In our case, after dropping the last 2 bits, it becomes

1111 1111 1111 1111 1111 1111 1111 01

and taking the 16 rightmost bits, we have

1111 1111 1111 1101 => This is our final answer for the branch offset value.
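
The same steps can be written as a small Python function. This is a sketch, not part of the original answer; the function name is made up, and the addresses are the assumed ones from the example.

```python
def branch_offset_field(branch_addr, target_addr):
    """Return the 16-bit offset field for a branch at branch_addr to target_addr."""
    byte_diff = target_addr - (branch_addr + 4)  # measured from the updated PC
    word_diff = byte_diff >> 2                   # drop the two always-zero bits, keeping the sign
    return word_diff & 0xFFFF                    # keep the 16 rightmost bits


field = branch_offset_field(0x00400010, 0x00400008)
print(format(field, "016b"))  # 1111111111111101
```

The same function gives 1 for the earlier beq at 0x0008 8804 branching to 0x0008 880c.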


6.   What should the assembler do if the calculated branch offset 
     is too large to fit into the offset field of an instruction?
     ---------------------------------------------------------------

     We can use other branch or jump instructions to make it work.

     For example:
     
		bgtz $5,jumping
		
     can be translated as
	 
		blez $5, continue
		   j jumping

      continue: ..............more code........


The jump works because a jump instruction has a 26-bit target field, while
the branch bgtz $5, jumping has only a 16-bit offset field in the instruction
code.
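
As a rough sketch of the test an assembler could make before choosing between the two forms (Python, with a made-up function name, assuming the offset field is a signed 16-bit count of words from the updated PC):

```python
def fits_in_branch_field(branch_addr, target_addr):
    """True if the word offset fits in a signed 16-bit field."""
    word_offset = (target_addr - (branch_addr + 4)) >> 2
    return -(1 << 15) <= word_offset <= (1 << 15) - 1


print(fits_in_branch_field(0x00400010, 0x00400008))  # True
```

When the function returns False, the assembler would fall back to the inverted branch followed by a jump, as shown above.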

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Chapter 11
Part 1:

1. Define vocab terms.


User interface devices
The user interface is the aggregate of means by which people (the users) interact with
a particular machine, device, computer program or other complex tool (the system). 
The user interface provides means of:

Input, allowing the users to control the system 
Output, allowing the system to inform the users (also referred to as feedback)

Mass storage devices
typical examples of mass storage devices are disks and magnetic drives

Gateways and networks
computers are networked together, and they communicate with other computers

mapped
characters are mapped to keys, that is a table can be used to describe for each key 
what character or sequence
of characters gets sent to the computer

control-A
the ASCII character 'soh'

alternate key, meta key, command key
computer keyboards often have special keys that modify the effect of other keys.  each
key effectively doubles the number of possible designated sequences that can be sent to
the computer

events
some keyboards simply notify the computer each time any key is pressed, and each time 
any key is released

baud rate
the rate at which individual bits are sent serially to the computer

start bit
a keyboard's way of indicating to the computer when it has a character to send

stop bit
a bit to indicate the keyboard is done sending a character

glass teletype
a device that imitates a typewriter, or teletypewriter

tab stop
many typewriters are capable of backing up one space or skipping forward to a 
designated column

ASCII terminal
a keyboard and display that transmit ASCII characters

cursor
the position where the next character will appear 

scrolling
action of shifting all the lines up by one line

HOME
a character that moves the cursor to the upper left hand corner of the screen

smart terminals
terminals with many enhancements

controller
responsible for receiving the characters from the computer and updating the screen
appropriately

pixels
individual dots on the screen

bit-mapped displays
display capable of manipulating pixels, rather than simply printing the ASCII characters
at predetermined points on the screen

echoing
the computer sends to the display the same ASCII character received from the keyboard

hard disk
an I/O device whose function is to increase the capacity of the memory system.

block of memory
a large block of memory is written to a hard disk at a time

hard disk surface
the sides of a platter

read/write arm
one for each surface of hard disk

read/write head
each read/write arm has one to write to the surface of a platter

spindle
a cylinder that all the platters of a hard disk are physically connected to

tracks
data on  a platter is organized into tracks, each is a concentric circle

cylinder
all the tracks in the same concentric location on all the platters

sectors
sections of data in each track

seeks
the read/write arm moves to the proper position

memory-mapped I/O
hardware is designed so that a region of memory space is not really memory at all
but rather a collection of communication channels to I/O devices

unavailable
an important way in which I/O devices are different from memory is that they may
be unavailable

ready
an input device has a new character to transmit

not ready
an input device does not have a new character to transmit

busy
an output device is not ready to accept another character

status
information such as ready, not ready, busy

command
a command often takes the form of some additional information to the I/O device,
such as the kind of operation it should perform

read-only 
the processor can read the data, but cannot write the data

write-only
the processor can send data to the device, but it cannot read data

programmed I/O
processor is programmed to handle each byte that arrives from the input device, 
and every byte to be sent to an output device

spin-waiting
processor is tied up for long periods spin-waiting for devices to become ready
when it could be performing other useful operations

controller
a simple computer that executes a very simple program

channel
a simple computer that executes a very simple program

Direct Memory Access
provides the CPU with a way to avoid low-level control of the transfer, initiating
the transfer of the entire block, then testing periodically to see if the 
transfer has completed


2. Why is spin-waiting a bad idea for input/output programming?
the processor is tied up for long periods of time when it could often be performing
other useful operations


3. What would happen on a computer system that used a spin-wait loop to print a 
character if a printer jammed?
it would just continuously wait until the printer was ready and do nothing else


4. Is asynchronous I/O a better solution than spin-waiting for very fast I/O devices?
Why or why not?
This solution allows the programmer to use the CPU while the I/O devices are busy, 
even if a large number of characters must be input or output.  The drawbacks are that
there will be a period after the output device has become ready until the next call to
put a character is made during which the I/O device could be in use but is not.  Also
the program must remember to call putnextchar periodically, or the queue will never 
get printed.  Failure to call putnextchar for an extended period could result in 
degraded performance, but it should be recoverable.
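
The queue-and-poll idea can be sketched in Python. This is only an illustration: the FakeDisplay class is a hypothetical stand-in for a real memory-mapped device, and put and putnextchar here are simplified versions of the routines named in the answer.

```python
from collections import deque


class FakeDisplay:
    """Hypothetical stand-in for a display with a ready/busy status."""

    def __init__(self):
        self.ready = True
        self.printed = []

    def write(self, ch):
        self.printed.append(ch)
        self.ready = False   # busy until the device finishes this character

    def finish(self):
        self.ready = True    # the device signals it can take another character


pending = deque()            # characters waiting to be printed


def put(ch):
    """Queue a character and return at once instead of spin-waiting."""
    pending.append(ch)


def putnextchar(display):
    """Send one queued character if the device is ready; report whether one was sent."""
    if pending and display.ready:
        display.write(pending.popleft())
        return True
    return False
```

The program keeps doing other work and calls putnextchar(display) from time to time. If it never calls it, the queue is never printed, which is the drawback described above.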


5. Does an assembly language programmer have any control over what type of I/O
implementations are offered?
Yes, the programmer can use different methods such as programmed I/O, spin-waiting,
or asynchronous I/O.


6. Sal get instruction...
Problems that might arise are that the computer won't do anything else until a character
is typed.  It could be doing other things.  Possible ways to avoid this problem would be
to have the CPU continue to do other things while it is waiting for this character input,
or perhaps it could give the user a time limit for entering the character.


7.  Mal code to implement put for a variable declared as .byte

WaitLoop:	lb	$14, DisplayStatus	# get display status character
		bgez	$14, WaitLoop		# loop until device is available
		sw	$2, DisplayData		# write character to display

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

      Chapter 11, Part 2

1.   
getByte:	getc  $t0
		sb   $t0, byte		# store only one byte into the byte variable
		jr   $ra


2.
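getWord:	move $t1, $0		# start the accumulated value at 0
nextDigit:	getc $t0
		sub  $t0, $t0, 48	# convert the ASCII digit to its value
		
		bltz $t0, gotten	# below '0', so not a digit
		bgt  $t0, 9, gotten	# above '9', so not a digit
		
		mul  $t1, $t1, 10	# shift the previous digits one decimal place
		add $t1, $t1, $t0	# and add in the new digit

		b   nextDigit

gotten:		jr  $ra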
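
The digit-accumulation loop can be sketched in Python to see what value ends up in $t1. The function name is made up, and the input is given as a string instead of being read with getc.

```python
def get_word(chars):
    """Accumulate leading decimal digits, stopping at the first non-digit."""
    value = 0                        # $t1
    for ch in chars:
        digit = ord(ch) - 48         # sub $t0, $t0, 48
        if digit < 0 or digit > 9:   # bltz / bgt ... gotten
            break
        value = value * 10 + digit   # mul then add
    return value


print(get_word("1234\n"))  # 1234
```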


3.   .data
eMask:		.word 0x7f800000	#mask for exponent
sMask:		.word 0x80000000	#mask for sign
fMask:		.word 0x00000001	#mask for one fraction bit (the low-order bit)
     .text
putFloat:	move $t0, aFloat
		
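		and  $t8, $t0, sMask	#Extract and print
		rol  $t8, $t8, 1	#sign of float: move the sign bit to bit 0
		mul  $t8, $t8, 45	#45 is ASCII '-' when the sign bit is set
		putc $t8		#
		
		and  $t1, $t0, eMask	#retrieve exponent
		srl  $t1, $t1, 23	#move the exponent field down to the low bits
		sub  $t1, $t1, 127	#remove the bias
		blez $t1, frac		#decide whether float is fractional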

		li   $t8, 1
		ror  $t0, $t0, 23
buildInt:	bltz $t1, fIntBuilt
 		      sll  $t2, $t2, 1
		      or   $t2, $t2, $t8
		      rol  $t0, $t0, 1
		      and  $t8, $t0, fMask
		      add  $t7, $t7, 1
		b    buildInt


fIntBuilt:	add  $sp, $sp, -4
		sw   $ra, 4($sp)
		
		li   $a0, 10
		move $a1, $t2
		
		jal  print_integer

		li   $t2, 46
		putc $t2

		move $t2, $0
		li   $t6, 2
		li   $t5, 10
		b    fBuildFrac


frac:		li   $t3, 48
		putc $t3
		li   $t2, 46
		putc $t2

		li   $t6, 1
		li   $t5, 1

		add  $t1, $t1, 1
zeroes:		bgez $t1, zeroesPrinted
		     putc $t3
		     add  $t1, $t1, 1
		     
		     mul  $t5, $t5, 10
		     mul  $t6, $t6, 2
		b    zeroes
zeroesPrinted:	b    fBuildFrac


fBuildFrac:	bgt  $t7, 23, fFracBuilt
		     and  $t8, $t0, fMask
		     mul  $t8, $t8, $t5
		     div  $t8, $t8, $t6
		     mul  $t2, $t2, 10
		     add  $t2, $t2, $t8

		     mul  $t5, $t5, 10
		     mul  $t6, $t6, 2
		     add  $t7, $t7, 1
		b    fBuildFrac


fFracBuilt:	jal  print_integer

		lw   $ra, 4($sp)
		add  $sp, $sp, 4
	jr   $ra
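
The field extraction that putFloat starts with can be checked in Python. This is a sketch under assumptions: the function name is made up, and the fraction mask 0x007FFFFF (all 23 fraction bits) is not the fMask used above, which picks out one bit at a time.

```python
import struct


def float_fields(x):
    """Split a value, stored as an IEEE single, into sign, unbiased exponent, fraction."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = (bits & 0x80000000) >> 31                     # sMask
    exponent = ((bits & 0x7F800000) >> 23) - 127         # eMask, shift down, remove bias
    fraction = bits & 0x007FFFFF                         # assumed 23-bit fraction mask
    return sign, exponent, fraction


print(float_fields(2.0))    # (0, 1, 0)
print(float_fields(-0.75))  # (1, -1, 4194304)
```

The exponent field has to be shifted right by 23 bits before the bias of 127 is subtracted.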


#Procedure print_integer
# Separates the integer by place and prints it.
# $t0 stores the integer
# $t1 stores the base
# $t2 stores the highest power of the base times the base
# $t3 holds the most recently removed place of the integer
# $t4 stores 1 for comparison
# $t5 stores the character value for printing
# $counts the number of times a number is divisible by its base
#Input:
#	$a0 - the base of the integer
#	$a1 - the integer itself
#Output:
#	None
print_integer:		move $t0, $a1
			move $t1, $a0


			move $t3, $t1
countPlaces:			div  $t2, $t0, $t3			#Counts the highest place of the integer
				beqz $t2, countPlacesExit		#
				mul  $t3, $t3, $t1			#
			b    countPlaces

	
countPlacesExit:	move $t2, $t3
			li   $t4, 1
			
printIntWhile:		beq  $t2, $t4, printIntExit			#Prints the integer
				div  $t2, $t2, $t1			#place-by-place
				div  $t3, $t0, $t2			#
				add  $t5, $t3, 48			#
				putc $t5				#
				mul  $t3, $t3, $t2			#
				sub  $t0, $t0, $t3			#
			b    printIntWhile
printIntExit:		   
	jr   $ra
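
The place-by-place method of print_integer can be sketched in Python. The function name is made up, the digits are returned as a string instead of being printed with putc, and the sketch assumes a non-negative integer and a base of at most 10.

```python
def integer_digits(value, base=10):
    """Return the digits of a non-negative integer, highest place first."""
    place = base
    while value // place != 0:   # countPlaces: find a power of the base above the value
        place *= base
    digits = []
    while place != 1:            # printIntWhile
        place //= base
        digit = value // place               # the current highest place
        digits.append(chr(digit + 48))       # ASCII digit, as in add $t5, $t3, 48
        value -= digit * place               # remove that place from the integer
    return "".join(digits)


print(integer_digits(1234))  # 1234
print(integer_digits(5, 2))  # 101
```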


4.   
putString:	move $t0, $a0
		lb   $t1, ($t0)
increStr:	beqz $t1, strPrinted
		     putc $t1
		     add  $t0, $t0, 1
		     lb   $t1, ($t0)
		b    increStr
strPrinted:	jr   $ra


5.   
     .data
prompt:		.asciiz "Enter a character: "
echoMsg:	.asciiz "Character's echo: "
     .text
__start:	puts prompt
		getc $t0
		puts echoMsg

		li   $t1, 92
		beq  $t0, 8, backspace
		beq  $t0, 9, tab
		beq  $t0, 10, newline
		b    putReturn

backspace:	putc $t1
		li   $t0, 98
		b    putReturn

tab:		putc $t1
		li   $t0, 116
		b    putReturn

newline:	putc $t1
		li   $t0, 110
		b    putReturn


putReturn:	putc $t0
	done
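
The echo logic above can be sketched in Python to show what gets printed for each input. The function name and table name are made up; the codes 8, 9, 10 and the letters b, t, n are the ones used in the program.

```python
ESCAPES = {8: "b", 9: "t", 10: "n"}   # backspace, tab, newline


def echo(ch):
    """Return the text echoed for one input character."""
    code = ord(ch)
    if code in ESCAPES:
        return "\\" + ESCAPES[code]   # a backslash (ASCII 92) followed by the letter
    return ch


print(echo("\t"))  # \t
print(echo("a"))   # a
```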


6.   DMA allows the CPU to avoid performing long, tedious data access tasks.  The CPU can instead pass
     the job of performing a data access off to a controller.


7.   
.
.
.
Initialize:
		lw   $15, Count

		la   $16, Begin_file

		add  $15, $15, $16		

WaitLoop:
		lw   $14, DiskStatus
		bgez $14, WaitLoop

		lb   $2, DiskData
		sb   $2, ($16)

		add  $16, $16, 1
		blt  $16, $15, WaitLoop

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Chapter 12 

1. When and why is synchronization important? 
The computer accommodates external events through synchronization.
A program needs synchronization for events resulting from I/O and for
events from another program's execution.

2. For example, with I/O, the wrong character might be read or printed if
synchronization fails.  The solution is:
a communicating program should update a data structure before the other
program observes any changes.


3. Interrupts are initiated asynchronously, outside the instruction stream.
   Traps occur synchronously, due to something in the instruction stream.

4. The hardware temporarily suspends the user program, and instead 
runs code called an EXCEPTION HANDLER.  After the handler
is finished doing whatever it needs to, the hardware returns control 
to the user program.  The hardware:
a. sets state giving the cause of the exception
b. changes to kernel mode, saving the previous mode
c. disables further interrupts
d. saves the current PC
e. jumps to hardwired address 0x8000 0080, where the exception handler code is

5. 
#assume the variables are declared as .word
HandleSys:
        li  $4, 15
        beq $2, $4, Clock

Clock:
        lw      $t0, _k_time #load the saved time
        move    $2, $t0
        sw      $2, _k_save_v0 #save the time
        j       Return

HandleInt:
        jal _k_clock_handler

_k_clock_handler:
        lw      $t0, ClockStatus #load the clock status
        bgez    $t0, _k_clock_handler #wait until clock status is
                                      #available                
        lw      $t5, _k_time #load time and update it
        add     $t5, $t5, 1     
        sw      $t5, _k_time #store the updated time

DP_handler:
                #call clock handler not to miss the second updating
                #in case the interrupt is for the keyboard 
                move $t5, $ra   #save return address
                sw   $t5, _k_save_t5
        
                jal  _k_clock_handler #call clock handler
        
                lw   $t5, _k_save_t5 #restore return address
                move $ra, $t5   
        
KB_handler:
                #call clock handler not to miss the second updating
                #in case the interrupt is for the keyboard 
                move $t5, $ra   #save return address
                sw   $t5, _k_save_t5
        
                jal  _k_clock_handler #call clock handler
        
                lw   $t5, _k_save_t5 #restore return address
                move $ra, $t5   

6.  #assume variable is declared as .word
    mfc0 $t0, $14 # get Exception Program Counter
    sw $t0, save_epc # save EPC before handling another exception

7. the chance is 1 / 1000 = 1 micro / 1 milli
   no, because an interrupt occurs outside of
 the instruction stream.

8. The OS is important since it's a program that allocates
and controls the use of all system resources such as
memory, processor, I/O, etc.

9. kernel handles all kinds of exceptions in MIPS

10. 
   JumpTable:  .word case0
               .word case1
               .word case2
 
    
    sll  $8, $8, 2          # case number shifted left 2 bits
                            # (need a byte offset into table, not a word index)
    lw   $9, JumpTable($8)  # load address into $9
    jr   $9                 # jump to address contained in $9

    .
    .
    .

 case0:   #code for case0 here
    .
    .
    .
 case1:   #code for case1 here
    .

11. To jump to the right handler according to the exception.
A jump table is a clever mechanism for doing something
like a CASE (SWITCH) statement:
a jump to one of many locations. In the above example,
the cases don't have to go in any specific order.
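
The same idea can be sketched in Python, where a list of functions plays the role of the table of addresses. The handler bodies and names are made up for illustration.

```python
def case0():
    return "handled case 0"


def case1():
    return "handled case 1"


def case2():
    return "handled case 2"


JUMP_TABLE = [case0, case1, case2]   # plays the role of the .word entries


def dispatch(case_number):
    handler = JUMP_TABLE[case_number]   # like lw $9, JumpTable($8)
    return handler()                    # like jr $9


print(dispatch(1))  # handled case 1
```

Python indexing does the scaling that the sll by 2 does in the assembly version.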

12. The interrupt mask allows the enabling of individual interrupts.
If interrupts are currently enabled, the 8 mask bits
of the status register can be written to control individually 
the two software and six hardware interrupt requests.

13. no, impossible. the processor won't execute the user mode program.

14. no, the register can't be restored since it would be containing the second
interrupt handler's registers.

15. the queue for the keyboard data in kernel is full and the kernel signals
the user that more character input causes an error.

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Chapter 13 Part 1

1.  
A SLL by one bit can also be performed by multiplying the number by 2. Multiplication by two can be simulated by adding the number to 
itself.

# $t0 contains the value to be shifted
# these two instructions perform this: $t0 += $t0 OR $t0 *= 2 OR $t0 = $t0 - (-$t0)
sub $t1, $0, $t0    # t1 = -t0
sub $t0, $t0, $t1   # t0 = t0 - t1 = t0 - (-t0)
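
A quick Python check that two subtractions really behave like a one-bit left shift on 32-bit registers. This is a sketch; the helper names are made up and the 32-bit wraparound is modeled with a mask.

```python
MASK32 = 0xFFFFFFFF


def sub32(a, b):
    """32-bit subtraction with wraparound, like the sub instruction without overflow traps."""
    return (a - b) & MASK32


def shift_left_one(t0):
    t1 = sub32(0, t0)       # sub $t1, $0, $t0
    return sub32(t0, t1)    # sub $t0, $t0, $t1


print(shift_left_one(5))                # 10
print(hex(shift_left_one(0x80000001)))  # 0x2
```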


2.  The property of subtraction that makes it inherently more powerful than addition is that it takes the additive inverse of one operand. 
Subtraction can be thought of as a combination of an additive inverse operation and an addition operation. 

3. 

Longer But Faster

li $t0, 10000
li $t1, 1
sra $t0, $t0, $t1
sra $t0, $t0, $t1
sra $t0, $t0, $t1
sra $t0, $t0, $t1
sra $t0, $t0, $t1
sra $t0, $t0, $t1
sra $t0, $t0, $t1
sra $t0, $t0, $t1
sra $t0, $t0, $t1
sra $t0, $t0, $t1


Shorter But Slower

li $t0, 10000
li $t1, 10

loop:

blez $t1, done
div $t0, $t0, 2
sub $t1, $t1, 1
b loop

done:

In both cases the number is divided by two 10 times. The first code example will run faster since there are no control dependencies and since sra is a faster instruction than div.
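
A Python sketch showing that both versions arrive at the same value for 10000. The function names are made up, and the comparison only holds as written for non-negative values, since a shift and a divide round negative numbers differently.

```python
def shift_ten_times(value):
    """Ten one-bit right shifts written out, like the unrolled sra version."""
    value >>= 1
    value >>= 1
    value >>= 1
    value >>= 1
    value >>= 1
    value >>= 1
    value >>= 1
    value >>= 1
    value >>= 1
    value >>= 1
    return value


def divide_loop(value, count=10):
    """Divide by two in a counted loop, like the div version."""
    while count > 0:
        value //= 2
        count -= 1
    return value


print(shift_ten_times(10000))  # 9
print(divide_loop(10000))      # 9
```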

4.
A scenario in which the compile time of the program is an important measure of cost is a program that must be re-compiled 
often and quickly. For example, for a large program that is compiled for many different architectures every day for testing purposes, it might be worthwhile to have the program be somewhat larger and somewhat slower if the compile time is significantly reduced. 

5.

An instruction that changes nothing is an ideal no-op instruction: add $0, $0, $0

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Chapter 13, Part B

6)  The data dependency occurs because register $18 is having a value
    stored into it in the first instruction, and the value is then 
    being used in the next instruction as an operand.
    This would cause a bubble in a pipelined machine, as the second
    instruction would have to wait for the result of the first before
    continuing to execute.
    
7)  In a pipeline, bubbles could be needed in cases where there is
    either a data or control dependency, such as when an instruction
    is preceded by an instruction that modifies one of its
    operands.

8)  The number of pipeline stages remains constant, it's an integral
    part of the implementation of the architecture - so the number of
    conditional branch instructions in a program does not influence
    the number of pipeline stages.
     -- 
    If the question asked about the conditional branches versus the 
    pipeline stalls, obviously more conditional branches means there
    are more stalls - the processor has to stall the pipeline until 
    it knows what instruction is next (assuming no prediction).

9)  The absolute minimum number of instructions necessary for a
    computer that has I/O could be two (2).
    Multiplication can be synthesized by the use of addition and
    subtraction, and addition can be performed by a series of
    just subtractions.  A single control instruction would be 
    needed, such as a branch if equal to zero.  All I/O would be
    performed via dedicated registers.  If I/O were to be performed
    memory-mapped, load and store instructions would be needed as
    well, for a total of four (4).  This all assumes a three
    operand instruction (one destination, two sources); additional
    instructions would be needed for two or single operand 
    instruction sets.

10) If the instruction 'addi $14, $14, 1' is added after the
    endofloop label, the code will execute properly for each case of
    the branch.  This code is expected to perform better than before
    when the branch is not taken - when the branch is taken an extra
    instruction is executed (the additional addi)
    

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Chapter 14, Part A

    1.      What defines a computer's architecture?
        Speed, cost/price, usability, and intended market all define how a computer architecture is designed.
 
2.      What architectural features correspond to a RISC architecture? A CISC architecture? 
        RISC (Reduced Instruction Set Computer)         CISC (Complex Instruction Set Computer)
        a. load/store architecture                      a. complex instructions
        b. very few addressing modes                    b. large instruction set
        c. simple instructions                          c. many addressing modes
        d. pipelined implementation
        e. small instruction set, easily decoded
        f. fixed size instructions
 
3.      Give an example of a machine for both a RISC or CISC architecture. 
        RISC - IBM RS/6000 (RS = "RISC System")
        CISC - the VAX, the Motorola 680x0 family, and the Intel 80186 
4.      A computer designed for a special application may have very special instructions 
that are explicitly designed for that application. Carrying this to an extreme, it is 
possible to build a computer that has only a single instruction: do_it. Identify an 
application where such an instruction might be appropriate. 
      The only information I could find on a single instruction computer was in
Patterson and Hennessy, who describe a Single Instruction Computer (SIC). The 
only instruction it can perform is Subtract-And-Branch-If-Negative (sbn), written as:
         sbn a,b,c
        where the contents of memory location b are subtracted from the contents of memory 
location a, the result is put in location a, and then the instruction branches to 
location c if the result is less than 0 (else it continues).
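
The sbn semantics described above can be tried out with a tiny interpreter. This Python sketch makes its own assumptions: memory is a dictionary of named locations, a program is a list of (a, b, c) triples, c is an index into that list, and execution stops when it runs off the end of the list.

```python
def run_sbn(memory, program, max_steps=1000):
    """Run sbn a,b,c triples: memory[a] -= memory[b]; go to c if the result is negative."""
    pc = 0
    steps = 0
    while 0 <= pc < len(program) and steps < max_steps:
        a, b, c = program[pc]
        memory[a] -= memory[b]
        pc = c if memory[a] < 0 else pc + 1
        steps += 1
    return memory


# x = x + y built from two sbn instructions, using a temp that starts at 0
memory = {"x": 7, "y": 5, "temp": 0}
program = [
    ("temp", "y", 1),   # temp = 0 - y; either way continue at instruction 1
    ("x", "temp", 2),   # x = x - (-y); either way continue at instruction 2 (the end)
]
print(run_sbn(memory, program)["x"])  # 12
```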
 
5.      What is the importance of the integrated circuit? What is the difference between 
an integrated circuit and a microprocessor? 
        An integrated circuit is important because an electrical 
signal crosses a single chip faster than it travels between several chips. The more gates that 
fit on an integrated circuit, the greater the speed.  The difference between integrated circuits 
and microprocessors is that a microprocessor contains the entire CPU.
 
6.      What is the use and purpose of the Mode and Register sub-fields? 
        The effective address field is split into two sub-fields, known as Mode and 
Register.  The Mode sub-field indicates how to interpret the Register sub-field in order to 
derive the address.
 
7.      Describe memory addressing in the Intel x86 architecture. 
        The Intel x86 has an unusual addressing scheme, due to its 16 bit limitation of 
pins used for addresses. It divides all of memory into fixed size 64K byte pieces 
called segments. Addresses are 16 bit, and specify a 16 bit offset within one of the segments.
All code has to fit into one segment, which makes it difficult to program.
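
As a sketch of how a segment and a 16 bit offset combine, here is the 8086 real-mode rule in Python. The rule itself (segment register times 16 plus offset, wrapped to 20 bits) is not spelled out in the answer above and is brought in here as an assumption; the function name is made up.

```python
def physical_address(segment, offset):
    """8086 real mode: segment * 16 + offset, wrapped to a 20-bit address."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
```

With a fixed segment value, the 16 bit offset can only reach 64K bytes, which is why code and data are confined to 64K pieces.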

8.      Design a SAL instruction that permits addition between two equal-length vectors 
of arbitrary length. How many operands must be specified? 
        In Pascal for i:= 1 to 64 do
                A[i] := B[i] + C[i];

        while:  beq     counter, 64, endwhile
                        lw      $4, bAddress    # need to adjust offset with i
                        lw      $6, cAddress    # need to adjust offset with i
                        add     $4, $4, $6
                        sw      $4, aAddress
                        add     counter, counter, 1
                        b       while
        endwhile:
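
One possible shape for such an instruction, sketched in Python as a function. The name vector_add and the choice of four operands (destination, two sources, and a length) are assumptions made for illustration, not something stated in the answer above.

```python
def vector_add(dest, src1, src2, length):
    """dest[i] = src1[i] + src2[i] for the first `length` elements."""
    for i in range(length):
        dest[i] = src1[i] + src2[i]
    return dest


a = [0] * 4
print(vector_add(a, [1, 2, 3, 4], [10, 20, 30, 40], 4))  # [11, 22, 33, 44]
```

The length operand is what allows vectors of arbitrary size, in place of the fixed 64 in the loop above.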
 
9.      Why might the B and T registers of the CRAY-1 computer be preferable to a cache 
memory? What is the disadvantage of this 'explicit cache'? 
        The T registers are backup registers for the S registers. This is useful because 
the S registers run out quickly, and the temporary storage in the T registers is faster 
than using main memory. B registers are backups for the A registers and are used in the 
same manner as the T registers.  The disadvantage of this "explicit cache" is that the 
programmer is responsible for the management of the B and T registers, versus cache
memory, where the hardware makes the decisions.
 
10.     What is the purpose of the register windows in the SPARC? 
        The 24 registers are intended to replicate the top of the stack and can be split 
into three equal groups.  These groups correspond to (1) registers containing data shared
with the procedure that called the current procedure, (2) registers holding variables 
private to the currently invoked procedure, and (3) registers containing data shared with a
procedure that was called, or will be called soon.

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-