UW-Madison
Computer Sciences Dept.

CS/ECE 752 Advanced Computer Architecture I Fall 2008 Section 1
Instructor: David A. Wood; T.A.: Khai Tran
URL: http://www.cs.wisc.edu/~david/courses/cs752/Fall2008/

Homework 3 // Due at Lecture Wed, Oct 22

Problem 1 (40 points)

For an R10K processor executing the code below, construct and fill out the following tables for each cycle of the program's execution (get good with copy and paste ;-) ):
1) ROB
2) Reservation Stations
3) Map Table
4) Free List
5) CDB
Note: in the code, the destination operand is always on the right.

loop: 	ldf X(r1),f1
	mulf f1,f3,f0
	ldf X(r2),f1
	addf f1,f0,f2
	stf f2,X(r1)
	addi r1,4,r1
	addi r2,4,r2
	slt r1,r3,r4
	BNQZ r4,loop

Assumptions:
1) The processor can dispatch, issue, complete and retire one instruction at a time.
2) Floating Point, load and store operations take 3 cycles to execute.
3) Integer operations take 1 cycle to execute.
4) Function units are fully pipelined.
5) Branch prediction is perfect and the above loop is repeated twice.
6) There are only 16 physical registers and they can be used to store floating point and integer values.
7) There are only 8 ROB slots available and 5 reservation stations.

The tables' status at the end of cycle 2 (note the column for Retire (R) also):
ROB 
ht 	#	Inst		T	Told	S	X	C	R
h	1	ldf X(r1),f1	PR#9	PR#2	c2	
t	2	mulf f1,f3,f0	PR#10	PR#1	
	3
	
Reservation Stations
#	FU	Busy	op	T	T1	T2
1	ALU	N	
2	L/S1	Y	ldf	PR#9	-	PR#5+
3	L/S2	N	
4	FP1	Y	mulf	PR#10	PR#9	PR#4+
5	FP2	N

Map Table
f0	PR#10
f1	PR#9
f2	PR#3+
f3	PR#4+
r1	PR#5+
r2	PR#6+
r3	PR#7+
r4	PR#8+

Free List
PR#11,PR#12,PR#13,PR#14,PR#15,PR#16
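
The renaming step behind the Map Table and Free List above can be sketched as a small standalone model. This is only an illustrative sketch, not part of the assignment or of any simulator: the function names (make_state, rename, retire) are hypothetical, and the initial mapping (f0..f3 to PR#1..PR#4, r1..r4 to PR#5..PR#8, free list PR#9..PR#16) is inferred from the Told column and the ready ("+") entries in the tables above.

```python
from collections import deque

def make_state():
    # Initial mapping inferred from the cycle-2 tables (an assumption):
    # f0..f3 -> PR#1..PR#4, r1..r4 -> PR#5..PR#8, PR#9..PR#16 free.
    map_table = {"f0": 1, "f1": 2, "f2": 3, "f3": 4,
                 "r1": 5, "r2": 6, "r3": 7, "r4": 8}
    free_list = deque(range(9, 17))
    return map_table, free_list

def rename(map_table, free_list, srcs, dest):
    """Rename one instruction at dispatch: read the sources, then remap dest."""
    phys_srcs = [map_table[s] for s in srcs]   # read sources before remapping
    if not free_list:
        raise RuntimeError("no free physical register: dispatch must stall")
    t = free_list.popleft()                    # T: new physical register
    t_old = map_table[dest]                    # Told: previous mapping of dest
    map_table[dest] = t
    return {"T": t, "Told": t_old, "srcs": phys_srcs}

def retire(free_list, rob_entry):
    """At retirement, the previous mapping (Told) returns to the free list."""
    free_list.append(rob_entry["Told"])

map_table, free_list = make_state()
first = rename(map_table, free_list, ["r1"], "f1")         # ldf X(r1),f1
second = rename(map_table, free_list, ["f1", "f3"], "f0")  # mulf f1,f3,f0
print(first, second, list(free_list))
```

Under these assumptions, the two dispatches give T/Told values of PR#9/PR#2 and PR#10/PR#1 and leave PR#11 through PR#16 free, which agrees with the cycle-2 tables. Reading the sources before remapping the destination matters for instructions such as addi r1,4,r1, where the same architectural register is both source and destination.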

Reference: The MIPS R10000 SuperScalar MicroProcessor By Kenneth C. Yeager

Problem 2 (20 points)

H&P Case Study 3.1, questions a, b and c.

Problem 3: Program analysis (40 points)

In class we discussed memory disambiguation and techniques to improve the performance of load-store queues. In this problem you will modify SimpleScalar to determine how frequently loads and stores in a program really refer to the same address. Modify sim-fast to determine, for every load, whether the address it generates conflicts with that of any of the previous X instructions (we will call X the distance). If a load causes such a conflict, increment a counter called conflict-counter. Note: of the previous X instructions, we are only interested in the load and store instructions. Classify these conflicts as load/load (conflict-counter-load-load) and load/store (conflict-counter-load-store).

Perform your analysis for the first 500 million instructions of gcc. Use the information from HW2 for simulating gcc on SimpleScalar, and use cc1_peak.ev6. Sim-fast by itself does not support stopping simulation after 500 million instructions. Use the version found at http://www.cs.wisc.edu/~cs752-1/Fall2008/downloads, which has this support added, and overwrite the sim-fast.c you downloaded for HW2 with this file. You can invoke this version of sim-fast as follows:

    sh> sim-fast -max:inst 500000000
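
The conflict-counting logic described above can be prototyped outside the simulator before touching sim-fast.c. The following is a minimal standalone sketch, not SimpleScalar code: the class and method names are hypothetical, and it makes three assumptions that the assignment does not state. A conflict means an exact match of effective addresses (access sizes are ignored), every executed instruction counts toward the distance, and each conflicting load is classified by the most recent earlier access to the same address, so that the total equals load-load plus load-store.

```python
from collections import Counter

DISTANCES = (32, 64, 256, 1024, 16384)

class ConflictTracker:
    """Counts loads whose address matches a recent load or store."""

    def __init__(self, distances=DISTANCES):
        self.distances = tuple(distances)
        self.icount = 0                 # instructions executed so far
        self.last_access = {}           # addr -> (icount, kind) of latest access
        self.load_load = Counter()      # distance -> load/load conflicts
        self.load_store = Counter()     # distance -> load/store conflicts

    def step(self, kind=None, addr=None):
        """Record one instruction; kind is 'load', 'store', or None."""
        self.icount += 1
        if kind is None:
            return                      # non-memory instruction: only advances time
        if kind == "load" and addr in self.last_access:
            prev_icount, prev_kind = self.last_access[addr]
            gap = self.icount - prev_icount
            for d in self.distances:
                if gap <= d:            # earlier access is within the last d instructions
                    if prev_kind == "load":
                        self.load_load[d] += 1
                    else:
                        self.load_store[d] += 1
        self.last_access[addr] = (self.icount, kind)

    def table(self):
        """Rows of (distance, load-load, load-store, total)."""
        return [(d, self.load_load[d], self.load_store[d],
                 self.load_load[d] + self.load_store[d])
                for d in self.distances]

tracker = ConflictTracker(distances=(2, 4))
tracker.step("store", 0x100)   # instruction 1
tracker.step()                 # instruction 2
tracker.step()                 # instruction 3
tracker.step("load", 0x100)    # instruction 4: store is 3 instructions back
tracker.step("load", 0x100)    # instruction 5: load is 1 instruction back
print(tracker.table())         # [(2, 1, 0, 1), (4, 1, 1, 2)]
```

Keeping only the most recent access per address lets all distances be evaluated in a single pass, at the cost of a table that grows with the number of distinct addresses. A different reading of the assignment (for example, counting a load once per conflicting earlier access) would need a bounded history of recent memory operations instead.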

You will be simulating a machine that uses the Alpha ISA. SPEC2000 program binaries for that ISA are easier to use and are already available in /unsup/spec2000. To ensure that your simulator is configured to simulate the Alpha ISA, run the following:

    sh> make clean
    sh> make config-alpha
    sh> make sim-fast

You must submit the following:

  • Submit a table with "distance" and the number of conflicts (conflict-counter). You must have separate columns for load-load, load-store, and total conflicts. Show distances of 32, 64, 256, 1024, and 16384. (30 points) For example:
    Distance    Load-load    Load-store    Total
    32          4            4             8
    64          9            11            20
    
  • Do you see any spikes in the data? If so explain the trend. (10 points)
  • Electronically submit the files you modified in a tgz called hw3-p3.tgz. E-mail it to the TA with the subject line "CS752 HW3".

 