
Homework 6

CS/ECE 354, Fall 2003


Send e-mail questions to Eric Robinson, erobinso@cs.wisc.edu
Due Wednesday, December 10th, before 5 pm.


Homework (42 points total)

  1. (10 points) Let Bucky[1024, 1024] be a two-dimensional array of IEEE single-precision floating-point numbers stored in row-major order. Consider a simple program fragment that sums and then zeros all the elements. Let all "r" variables be allocated in registers and ignore the effect of instruction fetches.
       rsum = 0.0

       do rj = 1 to 1024 {
           do ri = 1 to 1024 {
               rsum = rsum + Bucky[ri,rj]
       }}

       do rj = 1 to 1024 {
           do ri = 1 to 1024 {
               Bucky[ri,rj] = 0.0
       }}
    
    Assume this program fragment executes on a system with a data cache that is only 512 bytes large and uses 32-byte blocks. State any additional assumptions you need to make.
    1. How many misses will this program suffer?
    2. Write a new program that suffers many fewer misses. (The best answer suffers fewer than one tenth as many misses.)
    3. How many misses does your improved program suffer?
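    A hand count for this kind of problem can be checked with a small simulation. The sketch below is only an illustration under assumptions the problem does not state: a direct-mapped cache, 4-byte elements, an array starting at byte address 0, and write misses that allocate a block. The function names (`count_misses`, `inner_first_index`, `inner_second_index`) are hypothetical, and indices are 0-based rather than 1-based as in the pseudocode.

```python
def count_misses(addresses, cache_bytes=512, block_bytes=32):
    """Count misses in a direct-mapped cache (an assumed organization)."""
    num_lines = cache_bytes // block_bytes
    tags = [None] * num_lines      # block number currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        line = block % num_lines
        if tags[line] != block:    # line holds a different block (or nothing)
            misses += 1
            tags[line] = block
    return misses


def inner_first_index(n, elem_bytes=4):
    """Byte addresses of A[i][j] when the inner loop varies the first index i."""
    for j in range(n):
        for i in range(n):
            yield (i * n + j) * elem_bytes   # row-major layout


def inner_second_index(n, elem_bytes=4):
    """Byte addresses of A[i][j] when the inner loop varies the second index j."""
    for i in range(n):
        for j in range(n):
            yield (i * n + j) * elem_bytes   # row-major layout


n = 64  # small size for a quick check; the homework array is 1024 x 1024
print(count_misses(inner_first_index(n)))    # 4096: every access misses
print(count_misses(inner_second_index(n)))   # 512: one miss per 32-byte block
```

    Passing a different address stream to `count_misses` (for example, one that chains the summing pass and the zeroing pass) lets the same helper evaluate a rewritten program.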

  2. (8 points) Perform the following IEEE single-precision floating-point subtraction. Use standard rounding (to nearest, with even as tie-breaker). Show your work. Express your final answer in hexadecimal.
           0xc67ff800
         - 0x407ff800
         -------------
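    When working such a problem by hand, it can help to split each operand into its sign, exponent, and fraction fields first. The sketch below is a minimal helper for that, not part of the assignment; `decode_single` and `encode_single` are hypothetical names, and the example value is an arbitrary one rather than an operand from this problem. It relies on Python's `struct` module, which packs a float into the 4-byte IEEE single-precision format.

```python
import struct


def decode_single(bits):
    """Split a 32-bit pattern into (sign, biased exponent, fraction, value)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8-bit exponent, bias 127
    fraction = bits & 0x7FFFFF          # 23-bit fraction, hidden 1 not included
    value = struct.unpack(">f", struct.pack(">I", bits))[0]
    return sign, exponent, fraction, value


def encode_single(value):
    """Return the 32-bit single-precision pattern for a Python float."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


print(decode_single(0x3F800000))        # (0, 127, 0, 1.0)
print(hex(encode_single(-2.5)))         # 0xc0200000
```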
    
    
  3. (4 points) A computer system has two levels of cache, called L1 and L2. If an access misses in the L1 cache, the access request is sent to the L2 cache. If the L2 cache misses, the access request is sent to main memory. Using the following cache access parameters, show your work in calculating the average memory access time (AMAT) for this computer system.
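
    As a general illustration of a two-level AMAT calculation, the sketch below evaluates AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × memory access time). It assumes the L2 miss rate is a local rate (misses per access that reaches L2) and that all times are in the same unit. The function name `amat_two_level` and the numbers in the example are hypothetical; they are not the parameters of this problem.

```python
def amat_two_level(l1_hit_time, l1_miss_rate, l2_hit_time, l2_miss_rate, mem_time):
    """Average memory access time for an L1/L2/main-memory hierarchy.

    Miss rates are fractions in [0, 1]; the L2 rate is assumed to be local,
    i.e. measured over accesses that already missed in L1.
    """
    l2_penalty = l2_hit_time + l2_miss_rate * mem_time   # cost of an L1 miss
    return l1_hit_time + l1_miss_rate * l2_penalty


# Hypothetical values: 1-cycle L1, 10% L1 misses, 10-cycle L2,
# 50% local L2 misses, 100-cycle main memory.
print(amat_two_level(1, 0.10, 10, 0.50, 100))
```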

  4. (6 points) Identify every dependency (read-after-write, write-after-read, write-after-write and control) in the following code fragment. Classify each dependency as a data dependency or a control dependency. For each dependency, tell whether it would cause a stall (pipeline hole) for the 5-stage MIPS pipeline presented in class.
    
             addi   $12, $13, $11
             lw     $13, 4($12)
             and    $8, $8, $12
             sub    $8, $10, $11
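
    The three kinds of data dependency can be found mechanically by comparing the destination and source registers of every instruction pair. The sketch below is one simple way to do that, using a hypothetical instruction list rather than the fragment above. The name `find_dependencies` and the `(dest, sources)` tuple format are assumptions made for illustration. It reports every pair, ignores intervening writes to the same register, and does not decide whether a dependency stalls a particular pipeline.

```python
from itertools import combinations


def find_dependencies(instrs):
    """instrs: list of (dest, sources) tuples, in program order.

    Returns (kind, earlier_index, later_index, register) tuples.
    """
    found = []
    for (i, (d1, s1)), (j, (d2, s2)) in combinations(enumerate(instrs), 2):
        if d1 is not None and d1 in s2:
            found.append(("RAW", i, j, d1))   # later reads what earlier wrote
        if d2 is not None and d2 in s1:
            found.append(("WAR", i, j, d2))   # later overwrites what earlier read
        if d1 is not None and d1 == d2:
            found.append(("WAW", i, j, d1))   # both write the same register
    return found


# Hypothetical fragment (not the homework code):
example = [
    ("$1", ("$2", "$3")),   # 0: $1 <- $2 op $3
    ("$4", ("$1", "$5")),   # 1: $4 <- $1 op $5
    ("$2", ("$6", "$7")),   # 2: $2 <- $6 op $7
    ("$4", ("$8", "$9")),   # 3: $4 <- $8 op $9
]
for dep in find_dependencies(example):
    print(dep)
```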
    
  5. (8 points) Assume the MIPS 5-stage pipeline executes the following MIPS RISC R2000 code:
    
    	lw   $t0, 16($sp)
    	add  $t1, $t0, $t3
    	sw   $t1, 4($sp)
    	addi $t4, $t4, 1
    
    
    A. Identify any data dependencies in this code that would cause a stall (hole, bubble) in the pipeline.

    B. Draw a diagram of the MIPS 5-stage pipeline, showing the execution of this MIPS RISC R2000 code fragment.

  6. (6 points) You are a computer designer considering how to make your design run faster. For some important customer programs, your base design currently spends 25% of its time doing floating-point addition and 10% doing floating-point multiplication. You know:
    (a) a good method to make addition twice as fast and
    (b) another method to overlap multiplication with other things (making its contribution to execution time zero).
    1. What is the overall speedup for (a) and for (b), each applied separately?
    2. If you could use only one, which one would you use? Why?
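
    Questions of this kind follow Amdahl's law: if a fraction f of execution time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The sketch below is a minimal illustration with made-up fractions, not the ones in this problem; `overall_speedup` is a hypothetical name. An enhancement that removes a component's time entirely can be modeled with an infinite factor.

```python
import math


def overall_speedup(fraction, factor):
    """Amdahl's law: speedup when `fraction` of the time runs `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # fraction / math.inf evaluates to 0.0, i.e. the enhanced part takes no time
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical values: half the time made 2x faster, and 20% of the time hidden.
print(overall_speedup(0.5, 2))
print(overall_speedup(0.2, math.inf))
```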

Handing In the Homework

Turn in your file containing answers by running the script:


  handinHW6 hw6file

where hw6file stands for the name of the file containing your homework answers. The handinHW6 script submits your homework for you. Do not turn in a printout.