Homework 3 // Due at Lecture Wed, Oct 22
Problem 1 (40 points)
Construct and fill out the following tables for each cycle of the program's execution (get good with copy and paste ;-) ):
1) ROB
2) Reservation Stations
3) Map Table
4) Free List
5) CDB
for an R10K processor executing the following code (note: the destination operand is always on the right):
loop: ldf X(r1),f1
      mulf f1,f3,f0
      ldf X(r2),f1
      addf f1,f0,f2
      stf f2,X(r1)
      addi r1,4,r1
      addi r2,4,r2
      slt r1,r3,r4
      BNQZ r4,loop
Assumptions:
1) The processor can dispatch, issue, complete and retire one instruction per cycle.
2) Floating-point, load and store operations take 3 cycles to execute.
3) Integer operations take 1 cycle to execute.
4) Function units are fully pipelined.
5) Branch prediction is perfect and the above loop is repeated twice.
6) There are only 16 physical registers and they can be used to store floating point and integer values.
7) There are only 8 ROB slots available and 5 reservation stations.
The tables' status at the end of cycle 2 (note the column for Retire (R) also):
ROB
ht  #  Inst            T      Told   S   X   C   R
h   1  ldf X(r1),f1    PR#9   PR#2   c2
t   2  mulf f1,f3,f0   PR#10  PR#1
    3
Reservation Stations
#  FU    Busy  op    T      T1    T2
1  ALU   N
2  L/S1  Y     ldf   PR#9   -     PR#5+
3  L/S2  N
4  FP1   Y     mulf  PR#10  PR#9  PR#4+
5  FP2   N
Map Table
f0 PR#10
f1 PR#9
f2 PR#3+
f3 PR#4+
r1 PR#5+
r2 PR#6+
r3 PR#7+
r4 PR#8+
Free List
PR#11,PR#12,PR#13,PR#14,PR#15,PR#16
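The Map Table and Free List entries above can be cross-checked with a small renaming model. The following is a minimal Python sketch, not part of the assignment. It assumes the initial mapping implied by the tables (f0-f3 in PR#1-PR#4, r1-r4 in PR#5-PR#8, PR#9-PR#16 free) and models dispatch-time renaming only. The function name rename and the data layout are hypothetical.

```python
from collections import deque

# Assumed initial state, inferred from the cycle-2 tables:
# f0..f3 -> PR#1..PR#4, r1..r4 -> PR#5..PR#8, PR#9..PR#16 free.
map_table = {"f0": 1, "f1": 2, "f2": 3, "f3": 4,
             "r1": 5, "r2": 6, "r3": 7, "r4": 8}
free_list = deque(range(9, 17))

def rename(dest, srcs):
    """Rename one instruction at dispatch; return (T, Told, source tags)."""
    # Read the source mappings first, so a source that names the same
    # logical register as the destination still sees the old mapping.
    src_tags = [map_table[s] for s in srcs]
    t_old = map_table[dest]      # previous mapping, recorded in the ROB as Told
    t = free_list.popleft()      # an empty free list would mean a dispatch stall
    map_table[dest] = t
    return t, t_old, src_tags

ldf = rename("f1", ["r1"])          # ldf X(r1),f1
mulf = rename("f0", ["f1", "f3"])   # mulf f1,f3,f0

print(ldf, mulf)
print(map_table, list(free_list))
```

After these two dispatches the model agrees with the tables: f1 maps to PR#9, f0 maps to PR#10, and the free list starts at PR#11. Issue, execute, complete and retire timing are not modeled here.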
Reference: "The MIPS R10000 Superscalar Microprocessor" by Kenneth C. Yeager
Problem 2 (20 points)
H&P Case Study 3.1, questions a, b and c.
Problem 3: Program analysis (40 points)
In class we discussed memory disambiguation and techniques to improve the performance of load-store queues. In this problem you
will modify SimpleScalar to determine how frequently loads and stores in a program really refer to the same address.
Modify sim-fast to determine, for every load, whether its address conflicts with that of any of the previous X instructions (we will call X the
distance). If a load causes such a conflict, increment a counter called conflict-counter. Note: of the previous X instructions,
we are only interested in the load and store instructions. Classify these conflicts as load/load (conflict-counter-load-load) and load/store (conflict-counter-load-store).
Perform your analysis for the first 500 million instructions in gcc. Use the information from HW2 for simulating gcc on SimpleScalar. Use cc1_peak.ev6.
sim-fast by itself does not support stopping simulation after 500 million instructions. Use the version found at
http://www.cs.wisc.edu/~cs752-1/Fall2008/downloads, which has this support added. Overwrite the sim-fast.c
you downloaded for HW2 with this file. You can invoke this version of sim-fast as follows:
sh> sim-fast -max:inst 500000000
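The conflict counting described above can be prototyped outside the simulator before touching sim-fast.c. The following is a minimal Python sketch under stated assumptions: the trace format of (kind, addr) tuples and the function name count_conflicts are hypothetical, a conflict means exact address equality (access size is ignored), and a load that matches both an earlier load and an earlier store increments both classification counters but conflict-counter only once. Check these choices against the interpretation you settle on for the assignment.

```python
from collections import deque

def count_conflicts(trace, distance):
    """Count loads whose address matches a load/store among the previous
    `distance` instructions. `trace` yields (kind, addr) tuples with kind
    in {"load", "store", "other"}; addr is None for "other"."""
    window = deque(maxlen=distance)   # previous `distance` instructions of any kind
    counts = {
        "conflict-counter": 0,
        "conflict-counter-load-load": 0,
        "conflict-counter-load-store": 0,
    }
    for kind, addr in trace:
        if kind == "load":
            # Only loads and stores in the window can conflict.
            hit_load = any(k == "load" and a == addr for k, a in window)
            hit_store = any(k == "store" and a == addr for k, a in window)
            if hit_load or hit_store:
                counts["conflict-counter"] += 1
            if hit_load:
                counts["conflict-counter-load-load"] += 1
            if hit_store:
                counts["conflict-counter-load-store"] += 1
        window.append((kind, addr))   # every instruction occupies a window slot
    return counts

# Hypothetical 7-instruction trace, distance X = 2.
trace = [("store", 0x100), ("other", None), ("load", 0x100), ("load", 0x100),
         ("other", None), ("other", None), ("load", 0x100)]
result = count_conflicts(trace, 2)
print(result)
```

Here the first load conflicts with the store two instructions earlier, the second load conflicts with the load just before it, and the last load finds no memory instruction within its window. In the simulator itself, the same idea would presumably be a fixed-size circular buffer of recent instructions updated once per simulated instruction.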
You will be simulating a machine that uses the Alpha ISA. SPEC2000 program binaries for that ISA are easier to use and are already
available in /unsup/spec2000. To ensure that your simulator is configured to simulate the Alpha ISA, make sure you do the
following:
sh> make clean
sh> make config-alpha
sh> make sim-fast
You must submit the following: