- (10 points)
Let
Bucky[1024, 1024] be a two-dimensional array
of IEEE single-precision floating-point numbers stored in row-major
order. Consider a simple program fragment that sums and then zeros
all the elements.
Let all "r" variables be allocated in registers and ignore the
effect of instruction fetches.
rsum = 0.0
do rj = 1 to 1024 {
    do ri = 1 to 1024 {
        rsum = rsum + Bucky[ri,rj]
    }
}
do rj = 1 to 1024 {
    do ri = 1 to 1024 {
        Bucky[ri,rj] = 0.0
    }
}
Assume this program fragment executes on a system with a data cache that is
only 512 bytes large and uses 32-byte blocks. State
any additional assumptions you need to make.
- How many misses will this program suffer?
- Write a new program that suffers many fewer misses.
(The best answer suffers fewer than one-tenth as many misses.)
- How many misses does your improved program suffer?
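A hand count for this question can be checked with a small cache simulation. The sketch below is only one possible reading of the problem and rests on assumptions the question leaves open: the cache is direct-mapped and write-allocate, each element is 4 bytes, the array starts at a block-aligned address (taken as 0), and indices are 0-based. The names `count_misses`, `original_trace`, and `fused_trace` are illustrative, not part of the question.

```python
BLOCK_BYTES = 32
CACHE_BYTES = 512
NUM_LINES = CACHE_BYTES // BLOCK_BYTES  # 16 lines; direct-mapped is an assumption
ELEM_BYTES = 4                          # IEEE single precision


def count_misses(addresses):
    """Count misses for a stream of byte addresses in a direct-mapped cache."""
    lines = [None] * NUM_LINES          # block number currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // BLOCK_BYTES
        index = block % NUM_LINES
        if lines[index] != block:       # miss: fetch the block (write-allocate too)
            misses += 1
            lines[index] = block
    return misses


def original_trace(n):
    """Addresses touched by the original fragment: row index varies fastest."""
    for _pass in range(2):              # first the summing pass, then the zeroing pass
        for j in range(n):
            for i in range(n):
                yield (i * n + j) * ELEM_BYTES   # row-major address of Bucky[i, j]


def fused_trace(n):
    """One candidate improvement: swap the loops and fuse the two passes."""
    for i in range(n):
        for j in range(n):
            addr = (i * n + j) * ELEM_BYTES
            yield addr                  # read for the sum
            yield addr                  # write of 0.0 (hits the block just loaded)


if __name__ == "__main__":
    n = 64                              # small size so the check runs quickly
    print(count_misses(original_trace(n)), count_misses(fused_trace(n)))
```

Under these assumptions and with n = 1024, consecutive accesses in the original fragment are 4096 bytes apart and all map to the same cache line, so every access misses (2 x 1024 x 1024 = 2,097,152 misses). The fused version misses once per 32-byte block (1024 x 1024 / 8 = 131,072 misses), a 16-fold reduction. Running the simulation at n = 1024 takes a few seconds in Python.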
- (8 points)
Perform the following IEEE Single-Precision Floating-Point subtraction.
Use standard rounding (to nearest, with even as tie-breaker). Show your
work. Put your final answer back into hexadecimal.
  0xc67ff800
- 0x407ff800
-------------
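A hand-worked result can be cross-checked in Python. The sketch below is an illustration rather than a substitute for showing the work: it decodes each operand into sign, unbiased exponent, and fraction, subtracts in double precision (which is exact for these operands), and then rounds once to single precision via `struct`, which uses round-to-nearest-even. The helper names are assumptions made for this example.

```python
import struct


def bits_to_float(bits):
    """Interpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def float_to_bits(value):
    """Round a Python float to single precision and return its bit pattern."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def fields(bits):
    """Split a bit pattern into (sign, unbiased exponent, 23-bit fraction)."""
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


a = bits_to_float(0xC67FF800)   # -16382.0
b = bits_to_float(0x407FF800)   # 3.99951171875
diff_bits = float_to_bits(a - b)

print(fields(0xC67FF800), fields(0x407FF800))
print(hex(diff_bits))           # 0xc6800400, i.e. -16386.0
```

The exact difference is -16385.99951171875, which lies between two representable single-precision values and rounds to -16386.0.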
- (4 points)
A computer system has two levels of cache, called L1 and L2.
If an access misses in the L1 cache, the access request is
sent to the L2 cache.
If the L2 cache misses, then the access request is sent to main
memory.
Using the following cache access parameters, calculate
the average memory access time (AMAT) for this computer system.
Show your work.
- TL1 is 2 nsec
- TL2 is 20 nsec
- Tmain memory is 100 nsec
- Of all memory requests, 98.2% hit in the L1 cache.
- Of all requests sent to the L2 cache, 75% hit in the L2 cache.
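The calculation can be sketched in a few lines of Python. This assumes that a request pays the access time of every level it reaches (so an L1 miss costs TL1 plus the L2 time, and an L2 miss additionally costs the main memory time), and that the 75% figure is the L2 local hit rate, as the wording suggests. The function name `amat` is chosen for illustration.

```python
def amat(t_l1, t_l2, t_mem, l1_hit_rate, l2_hit_rate):
    """Average memory access time for a two-level cache hierarchy.

    Assumes each level's access time is paid by every request that reaches it.
    """
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate      # local miss rate of L2
    return t_l1 + l1_miss * (t_l2 + l2_miss * t_mem)


result = amat(t_l1=2.0, t_l2=20.0, t_mem=100.0, l1_hit_rate=0.982, l2_hit_rate=0.75)
print(f"AMAT = {result:.2f} ns")     # AMAT = 2.81 ns
```

Written out, this is 2 + 0.018 x (20 + 0.25 x 100) = 2 + 0.018 x 45 = 2.81 nsec.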
- (6 points)
Identify every dependency (read-after-write, write-after-read,
write-after-write and control) in the following code fragment.
Classify each dependency as a data dependency or a control dependency.
For each dependency, tell whether it would cause a stall (pipeline
hole) for the 5-stage MIPS pipeline presented in class.
addi $12, $13, $11
lw $13, 4($12)
and $8, $8, $12
sub $8, $10, $11
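The register dependencies in a straight-line fragment like this can be enumerated mechanically. The sketch below is a simplified illustration: it compares every pair of instructions and does not check for intervening redefinitions of a register (none occur in this fragment). It treats `$11` in the first instruction as a register operand, exactly as written. The fragment contains no branches, so the sketch looks only for data dependencies. Whether a given dependency stalls depends on the forwarding in the pipeline presented in class, which the sketch does not model.

```python
# Each entry: (instruction text, destination register, source registers).
FRAGMENT = [
    ("addi $12, $13, $11", "$12", ["$13", "$11"]),
    ("lw   $13, 4($12)",   "$13", ["$12"]),
    ("and  $8, $8, $12",   "$8",  ["$8", "$12"]),
    ("sub  $8, $10, $11",  "$8",  ["$10", "$11"]),
]


def find_dependencies(fragment):
    """Return (kind, earlier instruction, later instruction, register) tuples.

    Instructions are numbered from 1 in program order.
    """
    found = []
    for i, (_, dest_i, srcs_i) in enumerate(fragment):
        for j in range(i + 1, len(fragment)):
            _, dest_j, srcs_j = fragment[j]
            if dest_i in srcs_j:        # later instruction reads what the earlier wrote
                found.append(("RAW", i + 1, j + 1, dest_i))
            if dest_j in srcs_i:        # later instruction overwrites what the earlier read
                found.append(("WAR", i + 1, j + 1, dest_j))
            if dest_i == dest_j:        # both instructions write the same register
                found.append(("WAW", i + 1, j + 1, dest_i))
    return found


for kind, earlier, later, reg in find_dependencies(FRAGMENT):
    print(f"{kind}: instruction {earlier} -> instruction {later} on {reg}")
```

For this fragment the sketch reports two read-after-write dependencies on `$12`, write-after-read dependencies on `$13` and `$8`, and one write-after-write dependency on `$8`.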
- (8 points)
Assume the MIPS 5-stage pipeline is executing
the following MIPS RISC R2000 code:
lw $t0, 16($sp)
add $t1, $t0, $t3
sw $t1, 4($sp)
addi $t4, $t4, 1
A. Identify any data dependencies in this code that would cause
a stall (hole, bubble) in the pipeline.
B. Draw a diagram of the MIPS 5-stage pipeline, showing the
execution of this MIPS RISC R2000 code fragment.
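A pipeline diagram for part B can be generated with a short script. The sketch below assumes a pipeline with full forwarding, in which an ALU result can feed the EX stage of the next instruction and loaded data can feed EX only after one bubble (the classic load-use stall). A pipeline without forwarding would stall more often, so the output is only valid under that assumption. The data layout and function names are illustrative.

```python
# Each entry: (instruction text, is it a load?, destination or None, sources).
PROGRAM = [
    ("lw   $t0, 16($sp)",  True,  "$t0", ["$sp"]),
    ("add  $t1, $t0, $t3", False, "$t1", ["$t0", "$t3"]),
    ("sw   $t1, 4($sp)",   False, None,  ["$t1", "$sp"]),
    ("addi $t4, $t4, 1",   False, "$t4", ["$t4"]),
]


def ex_cycles(program):
    """Cycle in which each instruction occupies EX, assuming full forwarding."""
    ex = []
    ready = {}  # register -> earliest cycle a consumer may be in EX
    for _text, is_load, dest, sources in program:
        earliest = ex[-1] + 1 if ex else 3      # in-order; the first EX is cycle 3
        for reg in sources:
            earliest = max(earliest, ready.get(reg, 0))
        ex.append(earliest)
        if dest is not None:
            # ALU results forward to the next cycle; loaded data one cycle later.
            ready[dest] = earliest + (2 if is_load else 1)
    return ex


def diagram(program):
    """Return (text, cells) rows; cells[c - 1] is the stage held in cycle c."""
    ex = ex_cycles(program)
    total = ex[-1] + 2                          # the last instruction's WB cycle
    rows = []
    if_start, id_start = 1, 2
    for (text, _is_load, _dest, _srcs), ex_c in zip(program, ex):
        cells = []
        for c in range(1, total + 1):
            if if_start <= c < id_start:
                cells.append("IF")
            elif id_start <= c < ex_c:
                cells.append("ID")              # a repeated ID marks a stall cycle
            elif c == ex_c:
                cells.append("EX")
            elif c == ex_c + 1:
                cells.append("MEM")
            elif c == ex_c + 2:
                cells.append("WB")
            else:
                cells.append("")
        rows.append((text, cells))
        # The next instruction is fetched when this one enters ID,
        # and decoded when this one enters EX.
        if_start, id_start = id_start, ex_c
    return rows


for text, cells in diagram(PROGRAM):
    print(f"{text:<20}" + "".join(f"{cell:<4}" for cell in cells))
```

Under these assumptions the only bubble is the load-use stall between `lw` and `add` on `$t0`; the `add` to `sw` dependency on `$t1` is covered by forwarding, and the fragment completes in 9 cycles.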
- (6 points)
You are a computer designer considering how to make your design run
faster. For some important customer programs, your base design
currently spends 25% of its time doing floating-point addition and
10% doing floating-point multiplication. You know:
(a) a good method to make addition twice as fast and
(b) another method to overlap multiplication
with other things (making its contribution to execution time zero).
- What is the overall speedup for (a) and for (b) separately?
- If you could use only one, which one would you use? Why?
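Both speedups follow from Amdahl's law, which can be sketched as below. The function name `overall_speedup` is illustrative. Option (b) is modeled as an infinite speedup of the multiplication portion, which is one way to express "contribution to execution time is zero".

```python
import math


def overall_speedup(fraction, factor):
    """Amdahl's law: speedup when `fraction` of the time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


speedup_a = overall_speedup(0.25, 2.0)        # addition twice as fast: 1 / 0.875
speedup_b = overall_speedup(0.10, math.inf)   # multiplication fully hidden: 1 / 0.9

print(f"(a) {speedup_a:.3f}  (b) {speedup_b:.3f}")   # (a) 1.143  (b) 1.111
```

The comparison shows that halving a 25% component saves more time (12.5% of the total) than eliminating a 10% component.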