UW-Madison
Computer Sciences Dept.

CS/ECE 552 Introduction to Computer Architecture Spring 2010 Section 1
Instructor: David A. Wood; TA: Tony Nowatzki
URL: http://www.cs.wisc.edu/~david/courses/cs552/S10/

Problem Set #4

Due: March 17th, 2010
Approximate Weight : 15% of homework grade

Assignment is split between project group and individual work


  • Homework is due at the start of class
  • Problems 1 - 3 MUST be done with your project group (all electronic: handin to “hw4”)
  • Problems 4 - 7 MUST be done ALONE (all paper)
  • Problem 8 must also be done ALONE (all electronic: handin to “inst_test”)
  • No exceptions to the above handin rules will be allowed, as this is already unduly complicated (to grade).
  • You must abide by the Verilog file naming conventions
  • All Verilog code must pass Vcheck
  • Each problem must be in its own directory
  • If a problem requires files from a different directory, then create a copy of the file in each directory.

Problem 1 - 10 Points

In Verilog, create a register file that includes internal bypassing so that results written in one cycle can be read during the same cycle. Do this by writing an outer "wrapper" module that instantiates your existing (unchanged) register file module; your new module will just add the bypass logic. The list of inputs and outputs of the outer module should be the same as that of the inner module. Submit your Verilog source and your testing results.

  • Call this module rf_bypass and it should be in a file called rf_bypass.v

  • Modify rf_hier.v from problem3 so that it now instantiates rf_bypass instead of rf.

  • The inputs and output interface for rf_bypass.v should be identical to rf.v

  • Use the rf_bypass_bench.v testbench and follow its usage instructions.
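
The bypass behavior described above can be sketched as a small behavioral model in Python (not Verilog, and not a substitute for your rf_bypass.v). The class names, the port names, and the 8-register, 16-bit sizing are assumptions made for illustration; use the interface of your own rf.v. The wrapper leaves the inner model untouched and only overrides a read output when a write to the same register is enabled in that cycle.

```python
class RegisterFile:
    """Plain register file: a read sees the value stored before this cycle's write."""

    def __init__(self, num_regs=8, width=16):  # sizes are assumptions
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs

    def cycle(self, read1_sel, read2_sel, write_sel, write_data, write_en):
        # Reads sample the old state; the write lands at the end of the cycle.
        out1, out2 = self.regs[read1_sel], self.regs[read2_sel]
        if write_en:
            self.regs[write_sel] = write_data & self.mask
        return out1, out2


class BypassRegisterFile:
    """Wrapper with the same interface that forwards write data to matching reads."""

    def __init__(self, inner):
        self.inner = inner  # the unchanged inner register file

    def cycle(self, read1_sel, read2_sel, write_sel, write_data, write_en):
        out1, out2 = self.inner.cycle(
            read1_sel, read2_sel, write_sel, write_data, write_en
        )
        data = write_data & self.inner.mask
        # Bypass: a read of the register being written returns the new data.
        if write_en and read1_sel == write_sel:
            out1 = data
        if write_en and read2_sel == write_sel:
            out2 = data
        return out1, out2
```

A useful test pattern is to write and read the same register in one cycle with write enable both asserted and deasserted, since the bypass must not fire when the write is disabled.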

What to submit: (Directory name: prob1)

  1. In README.txt, describe precisely how you augmented your hw3 register file

  2. Any modifications to the testbench if required. If you use the testbench provided, electronically submit the text output of the program as rf_bench.out (see 4 below). ModelSim will write the text output to a file called transcript in your project directory.

  3. All your Verilog source code.



Problem 2 – 10 Points

Synthesize your register file from homework 3.

Synthesis will create the synth directory, which will include rf.syn.v, the area report, the timing report, etc.

What to submit: (Directory name: prob2)

  1. Verilog files from hw3's register file

  2. Add the entire synth directory

  3. Make sure rf.syn.v and the 4 report files are present (also make sure that no cell in the area report has an area of zero)

  4. In the readme, fill in this info:

    1. Total area

    2. Worst case slack




Problem 3 – 10 Points

Synthesize your FIFO from homework 3.

Synthesis will create the synth directory, which will include fifo.syn.v, the area report, the timing report, etc.

What to submit: (Directory name: prob3)

  1. Verilog files from hw3's fifo

  2. Add the entire synth directory

  3. Make sure fifo.syn.v and the 4 report files are present (also make sure that no cell in the area report has an area of zero)

  4. In the readme, fill in this info:

    1. Total area

    2. Worst case slack



#end group work#


Problem 4 – 15 Points

Consider the following code sequence and the datapath in figure 4.51 on page 362 of COD4e. Assuming the first instruction is fetched in cycle 1 and the branch is not taken, in which cycle does the 'add' instruction write its value to the register file? What if the branch IS taken? (Assume no branch prediction). Show pipeline diagrams.

          beq    $2, $1, loc
          xor    $1, $4, $3
          and    $3, $6, $7
          sub    $7, $5, $8
    loc:  add    $3, $6, $7 
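
When drawing pipeline diagrams by hand, a small script can help check the cycle arithmetic. The sketch below assumes a five-stage in-order pipeline with stage letters F, D, X, M, W (only F and W appear in this assignment; the others are assumed names) and takes the number of bubble cycles ahead of each instruction as an input that you decide yourself. It models a bubble as delaying the fetch of that instruction, which gives the same writeback cycle as holding the instruction in an earlier stage. The instruction names in any call are placeholders, not the code above.

```python
STAGES = "FDXMW"  # assumed stage letters for a five-stage pipeline


def writeback_cycles(stalls):
    """stalls[i] is the number of bubble cycles inserted ahead of instruction i.

    Returns the cycle in which each instruction is in W, with the first
    instruction fetched in cycle 1.
    """
    cycles = []
    fetch = 0
    for s in stalls:
        fetch += 1 + s  # next fetch slot, pushed back by any bubbles
        cycles.append(fetch + len(STAGES) - 1)
    return cycles


def diagram(names, stalls):
    """Render one text row per instruction; '.' marks cycles before fetch."""
    rows = []
    fetch = 0
    for name, s in zip(names, stalls):
        fetch += 1 + s
        cells = " ." * (fetch - 1) + "".join(f" {st}" for st in STAGES)
        rows.append(f"{name:<6}" + cells)
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical three-instruction example with one bubble before i2.
    print(diagram(["i1", "i2", "i3"], [0, 1, 0]))
    print(writeback_cycles([0, 1, 0]))
```

The script does not decide where stalls or flushes occur; that reasoning is the point of the problem.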

Problem 5 – 15 Points

Indicate all of the true, anti-, and output-dependencies in the following segment of MIPS assembly code:

    sub    $2, $7, $3
    add    $4, $5, $6
    or     $1, $4, $5
    add    $5, $2, $5
    sw     $4, 20($1)
    xor    $4, $1, $4 

For the code above, which of the dependencies will manifest themselves as hazards in the pipeline in Figure 4.41 on page 355 of COD4e? How are these hazards resolved in this pipeline? Assuming the 'sub' instruction enters fetch (F) in cycle 1, in what cycle does the 'xor' instruction enter writeback (W)? Show your work in a pipeline diagram. (Assume that the register file cannot read and write the same register in the same cycle and get the new data.)

How does your answer change if you consider the pipeline in figure 4.60, on page 375 of COD4e? (Assume that the register file contains internal bypassing and can read and write the same register in the same cycle and get the new data.)
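
As a reminder of the three dependence types, here is a minimal Python sketch that classifies them for a straight-line sequence. The function name, the (dest, sources) tuple encoding, and the four-instruction example are all hypothetical and are not the code from this problem. A true dependence is reported against the most recent writer of a source register, an anti-dependence against every earlier reader of the destination since its last write, and an output dependence against the most recent writer of the destination.

```python
def find_dependencies(instrs):
    """instrs: list of (dest, sources); dest is None for instructions with no result.

    Returns a list of (kind, earlier_index, later_index, register).
    """
    last_writer = {}  # register -> index of its most recent writer
    readers = {}      # register -> indices that read it since its last write
    deps = []
    for j, (dest, srcs) in enumerate(instrs):
        for reg in sorted(set(srcs)):
            if reg in last_writer:
                deps.append(("true", last_writer[reg], j, reg))    # read after write
        if dest is not None:
            for i in readers.get(dest, []):
                deps.append(("anti", i, j, dest))                  # write after read
            if dest in last_writer:
                deps.append(("output", last_writer[dest], j, dest))  # write after write
            last_writer[dest] = j
            readers[dest] = []
        # Record this instruction's reads after handling its own write,
        # so an instruction never depends on itself.
        for reg in set(srcs):
            readers.setdefault(reg, []).append(j)
    return deps


if __name__ == "__main__":
    example = [
        ("$1", ["$2", "$3"]),  # 0: add $1, $2, $3
        ("$4", ["$1", "$5"]),  # 1: sub $4, $1, $5
        ("$2", ["$6", "$7"]),  # 2: or  $2, $6, $7
        ("$4", ["$8", "$9"]),  # 3: and $4, $8, $9
    ]
    for dep in find_dependencies(example):
        print(dep)
```

Which of these dependences become hazards depends on the pipeline, so the list is only a starting point for the questions above.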


Problem 6 – 10 Points

Consider the pipeline in Figure 4.51 on page 362; assume predict-not-taken for branches and assume a "Hazard detection unit" in the ID stage as shown on page 379. Can an attempt to flush and an attempt to stall occur simultaneously? If so, do they result in conflicting actions and/or cooperating actions? If there are any cooperating actions, how do they work together? If there are any conflicting actions, which should take priority? What would you do in the design to make sure this works correctly? You may want to consider the following code sequence to help you answer this question:

        beq $5, $2, loc  #assume that the branch is taken
        lw  $3, 40($4)
        add $2, $3, $4
        sw  $2, 40($4)
loc:    or  $5, $5, $2

Problem 7 – 15 Points

Consider a pipeline where branches are predicted not-taken, and a taken branch introduces a three-cycle penalty. Suppose you are considering adding a delayed branch slot to your instruction set architecture, so that taken branches would only have a two-cycle penalty. Consider the following three fragments of code:

Fragment 1:

        add $5, $5, $2
        beq $5, $6, Target
        lw $4, 0($2)
        .
        .
        .
Target: lw $1, 0($7)
        ...


Fragment 2:

        add $5, $5, $2
        beq $5, $6, Target
        lw $4, 0($7)
        .
        .
        .
Target: sub $4, $8, $3
        ...


Fragment 3:

        movei $2, 21  // End-of-loop count
        .
        .
        .
        addi $4, $4, 1
        beq $4, $2, Target
        .
        .
        .
Target: ...

Re-arrange or re-write each of the fragments so that it will work correctly with a branch delay slot and maximize performance. (The dots represent an unknown amount of other code that you can't change.) What is the average number of cycles that were saved or lost in each case if you used the delayed branch architecture? (Assume branches are taken 60% of the time.)
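
One way to organize the averaging is as an expected value per branch. The Python sketch below is an assumed cost model, not the required method: it charges the taken-branch penalty with probability p_taken and charges one extra cycle whenever the delay-slot instruction does no useful work. The function name and the slot_wasted_prob parameter are hypothetical; how often the slot is wasted depends on how you fill it in each fragment.

```python
from fractions import Fraction


def expected_branch_cost(p_taken, taken_penalty, slot_wasted_prob=Fraction(0)):
    """Average extra cycles per branch under the assumed cost model.

    p_taken          -- probability that the branch is taken
    taken_penalty    -- penalty cycles paid when the branch is taken
    slot_wasted_prob -- probability that the delay-slot instruction is wasted
                        (0 when there is no slot or the slot is always useful)
    """
    return p_taken * taken_penalty + slot_wasted_prob * 1


if __name__ == "__main__":
    p = Fraction(3, 5)  # branches taken 60% of the time
    print(expected_branch_cost(p, 3))               # no delay slot
    print(expected_branch_cost(p, 2))               # slot always useful
    print(expected_branch_cost(p, 2, Fraction(1)))  # slot always wasted (nop)
```

Exact fractions are used so that the comparison between the two architectures is not blurred by floating-point rounding.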

While a good idea at the time, branch delay slots are discouraged in modern processors with deep pipelines in favor of dynamic branch predictors. Why do you think this is so? Why would a branch delay instruction perform poorly in a long pipeline?


Problem 8 – 15 Points

(submit this problem under inst_test, instead of hw4)


Develop instruction-level tests for your processor. In this problem each of you will develop a set of small programs that are meant to test whether your processor implements these instructions correctly. You will write these programs in assembly and run them on an instruction emulator to make sure that what you wrote is indeed testing the right thing. The eventual goal is to run these programs on your processor's Verilog implementation and use them to test your implementation.

Each of you will be responsible for one instruction and must develop a set of simple programs for that instruction. The table below gives the assignment of instructions to students.

    Student     Instruction
    aarti       addi
    abrown      subi
    ammar       ori
    asplund     andi
    atishay     roli
    ayoung      slli
    bechard     rori
    brant-ho    srai
    brinsko     st
    capel       ld
    chanson     stu
    cofell      add
    diedrich    sub
    emiller     or
    frederic    and
    frericks    rol
    grigoriy    sll
    halbach     ror
    hang        sra
    hanly       seq
    hao         slt
    hoese       sle
    in          sco
    jalal       beqz
    jastrows    bnez
    jatin       lbi
    jimmy       slbi
    jmartine    j
    joel        jr
    kjell       jal
    klingens    jalr
    langenfe    sll
    markus      slt
    marsh       slli
    martell     bnez
    michlig     ori
    millican    sle
    morrell     bgez
    ndimick     srai
    nystrom     st
    ott         ror
    passofar    roli
    pdickey     jalr
    rezny       addi
    samanas     subi
    schanke     add
    sefiddas    sub
    shourjo     jalr
    soumphol    beqz
    spallett    bnez
    swati       sco
    varun       sle
    vaughn      seq
    weisman     slt
    wilcox      sll
    wyler       sra
    xiaofeng    rori
    Zignego     j

To get you started, below are two example tests for the add instruction.

add_0.asm

lbi r1, 255
lbi r2, 255
add r3, r1, r2
halt

add_1.asm

lbi r1, 255
lbi r2, 0
add r3, r1, r2
halt

You will notice one thing. The add test uses the lbi instruction also! Your goal while writing these tests is to isolate your instruction as much as possible and minimize the use of the other instructions. Identify different corner cases and the common case for your instruction and develop a set of simple test programs.

The work flow we will follow is:

  1. Write test in WISC-SP10 assembly language.

  2. Assemble using the assembler, assemble.sh

  3. Simulate the test in the simulator and make sure your test is doing what you thought it was doing. Use the simulator, wiscalculator


Below is a short demo:

prompt% assemble.sh add_0.asm
Created the following files
loadfile_0.img  loadfile_1.img  loadfile_2.img  loadfile_3.img  loadfile_all.img  loadfile.lst

prompt% wiscalculator loadfile_all.img

WISCalculator v1.0
Author Derek Hower (drh5@cs.wisc.edu)
Type "help" for more information

Loading program...
Executing...
lbi r1, -1
PC: 0x0002 EPC 0x0000R0 0x0000 R1 0xffff R2 0x0000 R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
lbi r2, -1
PC: 0x0004 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
add r3, r1, r2
PC: 0x0006 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
program halted
PC: 0x0008 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
Program Finished

prompt%

The simulator will print a trace of each instruction along with the state of the registers. You should examine these to make sure that your test is indeed doing what is expected. For the st instruction you will need to examine memory also.
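
Before reading the trace, it helps to work out the expected register values yourself. The Python sketch below does this for the two add examples; the helper names are hypothetical. It assumes 16-bit registers and that lbi sign-extends its 8-bit immediate, which is consistent with the trace above, where lbi r1, 255 is shown as lbi r1, -1 and leaves R1 at 0xffff.

```python
MASK16 = 0xFFFF  # registers are 16 bits wide in the trace above


def sign_extend8(imm):
    """Sign-extend an 8-bit immediate to 16 bits (assumed lbi behavior)."""
    imm &= 0xFF
    return (imm | 0xFF00) if imm & 0x80 else imm


def add16(a, b):
    """16-bit add; any carry out of bit 15 is discarded."""
    return (a + b) & MASK16


if __name__ == "__main__":
    # add_0.asm: lbi r1, 255 ; lbi r2, 255 ; add r3, r1, r2
    r1 = sign_extend8(255)
    r2 = sign_extend8(255)
    print(hex(add16(r1, r2)))
    # add_1.asm: lbi r1, 255 ; lbi r2, 0 ; add r3, r1, r2
    print(hex(add16(sign_extend8(255), sign_extend8(0))))
```

For add_0 the computed value matches R3 in the trace (0xfffe). Writing down the expected value first makes it easier to notice when a test is not exercising the case you intended.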

What you need to do:

  • Write a set of tests for your instruction. Name them <opcode>_[0,1,2,3,4].asm

  • Use your discretion to decide how many tests you need

  • Identify corner cases. Think about possible bugs in the hardware.

  • Write comments in your assembly code explaining what the test is doing

  • The goal of this problem is to make sure you understand the ISA and develop targeted tests for the hardware. Understanding the ISA is required before building hardware for it!

I will make all tests available to everyone, so you can use these to debug and test your Verilog implementation. One of the first things you must do after putting together your full processor is run each of these tests to check each individual instruction.


Submit under “inst_test”:

  • Save all your assembly files in this directory

  • A written explanation of what your tests do and a justification of why your set of tests is comprehensive


 