UW-Madison
Computer Sciences Dept.

CS/ECE 552 Introduction to Computer Architecture


Spring 2012 Section 1
Instructor David A. Wood and T. A. Ramkumar Ravikumar
URL: http://www.cs.wisc.edu/~david/courses/cs552/S12/

Problem Set #4

Due: March 19th, 2012
Approximate Weight : 15% of homework grade

Assignment is split between project group and individual work

You can find the PDF copy of Discussion Session 7 here

You can find Problem 6 of this HW here


  • Homework is due at start of class
  • Problems 1 - 3 MUST be done with your project group    (all electronic:  handin to “hw4_1, hw4_2 and hw4_3”)
  • Problems 4 - 7 MUST be done ALONE                                  (all paper)
  • Problem 8 must also be done ALONE                                  (all electronic: handin to “TBA”)
  • No exceptions to the above handin rules will be allowed, as this is already unduly complicated (to grade).
  • You must abide by the Verilog file naming conventions
  • All Verilog code must pass Vcheck
  • Each problem must be in its own directory
  • If a problem requires files from a different directory, then create a copy of the file in each directory.

Problem 1 - 10 Points

In Verilog, create a register file that includes internal bypassing so that results written in one cycle can be read during the same cycle. Do this by writing an outer "wrapper" module that instantiates your existing (unchanged) register file module; your new module will just add the bypass logic. The list of inputs and outputs of the outer module should be the same as that of the inner module. Submit your Verilog source code and your testing results.

  • Call this module rf_bypass and it should be in a file called rf_bypass.v

  • Modify rf_hier.v from Problem 3 of HW2 so that it now instantiates rf_bypass instead of rf.

  • The input and output interface for rf_bypass.v should be identical to that of rf.v

  • Use the rf_bypass_bench.v testbench. Here are some usage instructions: Usage instructions.
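
As an illustration of the intended read-during-write behavior only (it is Python, not Verilog, and not a substitute for your own design), the sketch below models a register file with and without internal bypassing. It assumes 8 registers of 16 bits and a single read port; the names RegFile, RegFileBypass and cycle are hypothetical and do not correspond to the signals in rf.v.

```python
class RegFile:
    """Stand-in for the existing register file (hypothetical model, one read port)."""

    def __init__(self, num_regs=8, width=16):
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs

    def cycle(self, read_sel, write_en, write_sel, write_data):
        """Return this cycle's read value, then commit the write at the clock edge."""
        value = self.regs[read_sel]  # the read sees the pre-edge contents
        if write_en:
            self.regs[write_sel] = write_data & self.mask
        return value


class RegFileBypass:
    """Wrapper with the same interface; the inner register file is left unchanged."""

    def __init__(self, num_regs=8, width=16):
        self.inner = RegFile(num_regs, width)

    def cycle(self, read_sel, write_en, write_sel, write_data):
        stale = self.inner.cycle(read_sel, write_en, write_sel, write_data)
        # Bypass: a read of the register being written returns the incoming data.
        if write_en and read_sel == write_sel:
            return write_data & self.inner.mask
        return stale


plain = RegFile()
bypass = RegFileBypass()
print(hex(plain.cycle(3, True, 3, 0xBEEF)))   # 0x0: write is not visible until the next cycle
print(hex(bypass.cycle(3, True, 3, 0xBEEF)))  # 0xbeef: visible in the same cycle
print(hex(bypass.cycle(3, False, 0, 0)))      # 0xbeef: the value was also stored
```

A model like this can serve as a quick cross-check when reading the testbench output: any cycle in which the write is enabled and a read selects the same register should show the new data, and all other reads should match the unmodified register file.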

What to submit: (Directory name: hw4_1)

  1. Describe precisely how you augmented your hw3 register file in README.txt

  2. Any modifications to the testbench if required. If you use the testbench provided, electronically submit the text output of the program as rf_bench.out (see 4 below). Modelsim will write the text output to a file called transcript in your project directory.

  3. All your Verilog source code.



Problem 2 – 10 Points

Read the Synthesis Tutorial

Synthesize your register file from homework 3

Synthesis will create the synth directory which will include rf.syn.v, area report, timing report, etc.

What to submit: (Directory name: hw4_2)

  1. Verilog files from hw3's register file

  2. Add the entire synth directory

  3. Make sure rf.syn.v, and the 4 report files are present (Make sure that in the area report no cell has an area of zero)

  4. In the readme, fill in this info:

    1. Total area

    2. Worst case slack




Problem 3 – 10 Points

Read the Synthesis Tutorial

Synthesize your FIFO from homework 3.

Synthesis will create the synth directory which will include fifo.syn.v, area report, timing report, etc.

What to submit: (Directory name: hw4_3)

  1. Verilog files from hw3's fifo

  2. Add the entire synth directory

  3. Make sure fifo.syn.v, and the 4 report files are present (Make sure that in the area report no cell has an area of zero)

  4. In the readme, fill in this info:

    1. Total area

    2. Worst case slack



#end group work#


Problem 4 – 15 Points

Consider the following code sequence and the datapath in figure 4.51 on page 362 of COD4e. Assuming the first instruction is fetched in cycle 1 and the branch is not taken, in which cycle does the 'and' instruction write its value to the register file? What if the branch IS taken? (Assume no branch prediction). Show pipeline diagrams.

          beq    $2, $3, foo
          add    $3, $4, $5
          sub    $5, $6, $7
          or     $7, $8, $9
    foo:  and    $5, $6, $7

Problem 5 – 15 Points

For each of the three MIPS assembly code segments (a)-(c) below: (1) indicate the dependences and their types; (2) assuming that there is NO forwarding in the pipelined processor, indicate the hazards and add NOP instructions to eliminate them; (3) assuming that there is FULL forwarding in the pipelined processor, indicate the hazards and add NOP instructions to eliminate them.

(a)          add     $4, $4, $2
             sub     $5, $3, $1
             lw      $6, 200($3)
             add     $7, $3, $6
(b)          lw      $1, 40($6)
             add     $6, $2, $2
             sw      $6, 50($1)
(c)          lw      $5, -16($5)
             sw      $5, -16($5)
             add     $5, $5, $5

Problem 6 – 10 Points

COD4e - EXERCISE 4.24.1 - 4.24.3 (PAGE 432) with the changes below

(a) Use the pattern T, T, NT, T. (b) Use the pattern T, T, T, NT, NT.

Problem 7 – 15 Points

Consider a pipeline where branches are predicted not-taken, and a taken branch introduces a three-cycle penalty. Suppose you are considering adding a delayed branch slot to your instruction set architecture, so that taken branches would only have a two-cycle penalty. Consider the following three fragments of code:

Fragment 1:

        add $5, $5, $2
        beq $5, $6, Target
        lw $4, 0($2)
        .
        .
        .
Target: lw $1, 0($7)
        ...


Fragment 2:

        add $5, $5, $2
        beq $5, $6, Target
        lw $4, 0($7)
        .
        .
        .
Target: sub $4, $8, $3
        ...


Fragment 3:

        movei $2, 21  // End-of-loop count
        .
        .
        .
        addi $4, $4, 1
        beq $4, $2, Target
        .
        .
        .
Target: ...

Re-arrange or re-write each of the fragments so that it will work correctly with a branch delay slot and maximize performance. (The dots represent an unknown amount of other code that you can't change.) What is the average number of cycles that were saved or lost in each case if you used the delayed branch architecture? (Assume branches are taken 60% of the time.)

While a good idea at the time, branch delay slots are discouraged in modern processors with deep pipelines in favor of dynamic branch predictors. Why do you think this is so? Why would a branch delay instruction perform poorly in a long pipeline?


Problem 8 – 15 Points


Develop instruction-level tests for your processor. In this problem each of you will develop a set of small programs meant to test whether your processor implements a given instruction correctly. You will write these programs in assembly and run them on an instruction emulator to make sure that what you wrote is indeed testing the right thing. The eventual goal is to run these programs on your processor's Verilog implementation and use them to test it.

Each of you will be responsible for one instruction and must develop a set of simple programs for that instruction. The table below gives the assignment of instructions to students.

Student     Instruction
ahmad       addi
alexm       subi
ampomah     ori
andracek    andi
ativut      roli
bayer       bltz
chao        rori
chunw       srai
danielr     st
davidm      ld
deblon      stu
dexter      add
dimitrio    sub
dmiller     or
dragga      and
eichers     rol
fessler     sll
fisher      ror
foss        sra
gola        seq
harter      slt
harwell     sle
hittson     sco
hongzhuo    beqz
huanchen    bnez
ishani      lbi
jaffke      slbi
jalal       j
jiaduo      jr
jitrapon    jal
jliu        jalr
jui-chie    sll
justmann    slt
katelyn     slli
kolp        bnez
kpark38     ori
kulcyk      sle
lars        bgez
little      srai
mcc         st
mgm         ror
mschmid     roli
nwilliam    jalr
parvi       addi
pjohnson    subi
redderse    add
roberts     sub
sato        jalr
schleife    beqz
shanpeng    bnez
shubham     sco
skobov      sle
sok         seq
starr       slt
suli        sll
swiercze    sra
theodor     st
tong        btr
vander-p    srai
van-maas    roli
weisnich    and
wysocki     rori
yaman       sll
yashashr    bltz
zxie        btr

To get you started, below are two example tests for the add instruction.

add_0.asm

lbi r1, 255
lbi r2, 255
add r3, r1, r2
halt

add_1.asm

lbi r1, 255
lbi r2, 0
add r3, r1, r2
halt

You will notice one thing. The add test uses the lbi instruction also! Your goal while writing these tests is to isolate your instruction as much as possible and minimize the use of the other instructions. Identify different corner cases and the common case for your instruction and develop a set of simple test programs.

The work flow we will follow is:

  1. Write test in WISC-SP12 assembly language.

  2. Assemble using assembler assemble.sh

  3. Simulate the test in the simulator and make sure your test is doing what you thought it was doing. Use the simulator: wiscalculator


Below is a short demo:

prompt% assemble.sh add_0.asm
Created the following files
loadfile_0.img  loadfile_1.img  loadfile_2.img  loadfile_3.img  loadfile_all.img  loadfile.lst

prompt% wiscalculator loadfile_all.img

WISCalculator v1.0
Author Derek Hower (drh5@cs.wisc.edu)
Type "help" for more information

Loading program...
Executing...
lbi r1, -1
PC: 0x0002 EPC 0x0000R0 0x0000 R1 0xffff R2 0x0000 R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
lbi r2, -1
PC: 0x0004 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
add r3, r1, r2
PC: 0x0006 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
program halted
PC: 0x0008 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
Program Finished

prompt%

The simulator will print a trace of each instruction along with the state of the registers. You should examine these to make sure that your test is indeed doing what is expected. For the st instruction you will need to examine memory also.
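
Note that the trace prints `lbi r1, -1` and R1 = 0xffff for the source line `lbi r1, 255`, and R3 = 0xfffe after the add. The short Python sketch below reproduces these values by hand, which is one way to work out the expected result of a test before running it. It assumes that lbi sign-extends its 8-bit immediate into a 16-bit register and that add discards the carry out (both are consistent with the trace above); the helper names are hypothetical and not part of the course tools.

```python
MASK16 = 0xFFFF  # registers are shown as 16-bit values in the trace


def sign_extend8(imm):
    """Sign-extend the low 8 bits of imm to a 16-bit register value."""
    imm &= 0xFF
    return (imm | 0xFF00) if imm & 0x80 else imm


def add16(a, b):
    """16-bit add with wraparound (the carry out is discarded)."""
    return (a + b) & MASK16


def run_add_test(imm1, imm2):
    """Model 'lbi r1, imm1; lbi r2, imm2; add r3, r1, r2' and return (r1, r2, r3)."""
    r1 = sign_extend8(imm1)
    r2 = sign_extend8(imm2)
    return r1, r2, add16(r1, r2)


for name, (a, b) in {"add_0": (255, 255), "add_1": (255, 0)}.items():
    r1, r2, r3 = run_add_test(a, b)
    print(f"{name}: R1 0x{r1:04x} R2 0x{r2:04x} R3 0x{r3:04x}")
# add_0: R1 0xffff R2 0xffff R3 0xfffe
# add_1: R1 0xffff R2 0x0000 R3 0xffff
```

The add_0 line matches the final register state in the demo trace; the add_1 line is what this model predicts, and should be confirmed against the simulator.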

What you need to do:

  • Write a set of tests for your instruction. Name them <opcode>_[0,1,2,3,4].asm

  • Use your discretion to decide how many tests you need

  • Identify corner cases. Think about possible bugs in the hardware.

  • Write comments in your assembly code explaining what the test is doing

  • The goal of this problem is to make sure you understand the ISA and develop targeted tests for the hardware. Understanding the ISA is required before building hardware for it!

I will make all tests available to everyone, so you can use them to debug and test your Verilog implementation. One of the first things you must do after putting together your full processor is run each of these tests to check each individual instruction.


What to submit

  1. Physical copy
    • Written explanation of what your tests do and justification of why your set of tests is comprehensive
  2. Electronic submission instructions

    Create a folder named “prob_inst”:

    • All your assembly files must be in this directory

    • Write a set of tests for your instruction. Name them <opcode>_[0,1,2,3,4].asm

    • Use your discretion to decide how many tests you need

    • Identify corner cases. Think about possible bugs in the hardware.

    • Written explanation of what your tests do and justification why your set of tests is comprehensive (Copy paste the contents you prepare manually into a README.txt)

    Handin Instructions

    Hand in your homework using the CS handin program.

    • Make a folder for each problem (hw4_1, hw4_2, hw4_3 and prob_inst)
    • Each folder should contain all the Verilog files for that problem. prob_inst should contain your .asm files
    • The name and signals of the top-level module should be as indicated for each problem
    • tar these 4 folders to 'cs username'.tar [example : tar cvf ram.tar hw4_1 hw4_2 hw4_3 prob_inst]
    • Copy the tar file over to an empty folder and submit it using handin (see the handin documentation) [example: mkdir ram; mv ram.tar ram]
      * <class_name> cs552-1
      * <assignment_name> HW4
      * <directory_path> 'location_of_the folder_you_created'
