
Homework 4




Due 03/15
Weight: 15%

Problems 1-3 must be done with your project partner. Both names must be listed in the partner.txt file included in the supplied tar file.


  • Problems 1-3
    • Electronic submission of files: submit to learn@UW, one submission per pair, titled hw4.tar

Not Submitted:

  • Problems 4 - 11:
    • These problems are optional and will not be graded but are recommended for a better understanding of the course material.


  • Problem 1 (a) involves modifying your register file to support internal bypassing
  • Problem 1 (b) and problem 2 involve synthesizing your new register file and also your FIFO from HW3
  • Problem 3 involves developing instruction level tests for your processor
  • Homework is due at start of class on 03/15

Provided Files

  • A tarball is provided that includes testbenches and top-level module definitions for all Verilog problems: hw4.tar
  • Do not edit the provided *_hier.v files

Handin Instructions

  • You must maintain the directory structure that exists in the provided tar file, i.e. each problem has its own subdirectory titled hw4_[1,2,3]
  • All Verilog files required to run your design must be in each problem's respective subdirectory. This may mean keeping copies of some files in more than one directory.
  • For problem 2, be sure to submit all the files in the synth directory generated by synthesis
  • For problem 3, be sure to submit all your .asm files
  • Once you are done with a Verilog problem, run the command in the problem directory to make sure that you adhere to the Verilog Rules. See Verilog rules check for more details.
  • A legible schematic.pdf file must be in the problem 1 subdirectory with the schematics you drew.
    • Any solution without a corresponding schematic drawing will NOT be graded
    • A scanner is available for general use in Wendt Library
  • The partner.txt file at the top level of the tarball must contain the names of yourself and your partner.
  • Submit only this tar file named hw4.tar - only one partner needs to submit the file

Problem 1 Part (a)

In Verilog, create a register file that includes internal bypassing so that results written in one cycle can be read during the same cycle. Do this by writing an outer "wrapper" module that instantiates your existing (unchanged) register file module; your new module will just add the bypass logic. The list of inputs and outputs of the outer module should be the same as that of the inner module.
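
As a behavioral reference for what the wrapper is expected to do (not a Verilog solution), the bypass can be modeled in a few lines of Python. This is a minimal sketch: the class and port names (RegisterFile, BypassRegisterFile, write_en, write_reg, write_data) are hypothetical, and the 8-register, 16-bit organization is an assumption rather than something taken from the provided files.

```python
class RegisterFile:
    """Inner register file: a read sees only values written in earlier cycles."""

    def __init__(self, num_regs=8):  # 8 registers is an assumption
        self.regs = [0] * num_regs

    def read(self, read_reg):
        return self.regs[read_reg]

    def clock(self, write_en, write_reg, write_data):
        # State changes only at the clock edge.
        if write_en:
            self.regs[write_reg] = write_data & 0xFFFF  # 16-bit width assumed


class BypassRegisterFile:
    """Wrapper: same ports as the inner module, plus bypass on the read path."""

    def __init__(self, inner):
        self.inner = inner  # the inner register file is left unchanged

    def read(self, read_reg, write_en, write_reg, write_data):
        # If this cycle writes the register being read, forward the new data.
        if write_en and read_reg == write_reg:
            return write_data & 0xFFFF
        return self.inner.read(read_reg)

    def clock(self, write_en, write_reg, write_data):
        self.inner.clock(write_en, write_reg, write_data)


inner = RegisterFile()
rf = BypassRegisterFile(inner)
same_cycle = rf.read(3, True, 3, 0xBEEF)   # bypassed: new value is visible
stale = inner.read(3)                      # inner file still holds the old value
rf.clock(True, 3, 0xBEEF)
next_cycle = rf.read(3, False, 3, 0x1234)  # write disabled: stored value is read
```

The comparison of the read address against the write address, qualified by the write enable, is the only logic the wrapper adds in this model; the inner storage is untouched.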

Draw a paper and pencil schematic for your new wrapper module and include a scanned copy in the hw4_1 directory named schematic.pdf.

Do not make any changes to the provided rf_hier.v file.

Testbench instructions

You must verify your design using the testbench in the supplied tar file. Run the testbench in your hw4_1 directory using the command rf_bypass_hier_bench *.v

The testbench for this problem (rf_bypass_hier_bench.v) generates a random set of input signals to your module in each cycle, and compares outputs from your module with outputs that are expected from a perfect register file bypass implementation.

If there are no errors in your design you will see a "TEST PASSED" message. If the testbench failed with a "TEST FAILED WITH xx ERRORS" message, look for error messages like "ERRORCHECK: Read data incorrect in cycle <cycle_number>" in the testbench output. Above each of these error messages you will see the inputs to your module, your outputs and the expected outputs for that cycle which can help you debug.

Problem 1 Part (b)

Read the synthesis tutorial on the Synthesis page.

Synthesize your new register file (in the same hw4_1 directory).

Synthesis will create the synth directory, which will include rf_bypass.syn.v, the area report, the timing report, etc. Do not delete this directory; it must be included in your submission. Make sure that no cell in the area report has an area of zero.

Problem 2

Read the synthesis tutorial on the Synthesis page.

Copy all files necessary to compile your FIFO from homework 3 to the hw4_2 directory. Synthesize your FIFO.

Synthesis will create the synth directory, which will include fifo.syn.v, the area report, the timing report, etc. Do not delete this directory; it must be included in your submission. Make sure that no cell in the area report has an area of zero.

Problem 3

Develop instruction-level tests for your processor. In this problem each team will develop a set of small programs that test whether your processor implements the assigned instructions correctly. You will write these programs in assembly and run them on an instruction emulator to make sure each one really tests what you intended. The eventual goal is to run these programs on your processor's Verilog implementation and use them to test it.

Information about how to write assembly code and how to use the assembler can be found on the Using the assembler page. Details about what each instruction means are available on the ISA specification page.

Each team is responsible for one randomly assigned instruction and must develop a set of simple programs for that instruction. Each team must also write programs for the common instructions jal and jalr. The table below gives the assignment of instructions to each team.

Team Name                                        Partner 1             Partner 2               Instruction
TDHCAE                                           Maxwell Strange       Devin "HC" Hartleben    sco
xX_MaYMaY_Xx                                     William Jen           Stephen Eick            j
We Don't Byte                                    Michael Gilsdorf      Sydney Lybert           sle
Brogrammers                                      William Pasteris      Justin Yarrington       bgez
BitSmashers                                      Josh Graf             Thomas Kuhn             beqz
Macrohard                                        Yunhe Liu             Wenxuan Mao             rol
B555EVERYNIGHT                                   Kalvin Moschkau       Jon Butler              seq
(none)                                           Ruolin Ping           Yipeng Zhang            xori
(none)                                           Alexander Curtis      Heng Zhuo               add
(none)                                           Tanner Breisch        Ogden Greene            stu
Team NULL                                        Zachary Kocken        Michael Marquardt       xor
3                                                Viswesh Periyasamy    Nick Metzger            btr
How Many Ways                                    Cory Jonet            Shu Wen Loo             subi
TR-8R                                            Marshall Mueller      Riccardo Mutschlechner  andni
ModelSimCity                                     Matthew Kovars        Eric Stellpflug         srli
BaBa                                             Linzuo Li             Sida Li                 rori
(none)                                           Tristan Abbott        Maxwell Jin             slt
2b | !2b                                         Jacob Oney            Clark Zinzow            sll
MIPS: Meaningless Indication of Processor Speed  Mitchell Eberle       Ryan Bambrough          ror
NotoriousBIT                                     Alec Anderson         Francis Hertel          andn
Too much $$$$                                    Hunter Koeshall       Lei(Vincent) Xia        st
Positive Slackers                                Nathan Jay            John Detter             bnez
TripleSix                                        Xuyi Ruan             Yudong Sun              sub
Yeye                                             Ji Yunsheng           Mu Yao                  srl
Gotl                                             Marlene Gotlieb       n/a n/a                 bltz
(none)                                           Michael Simon         John Staker             lbi
Cache Money                                      Andrew Tautges        Charles Johnson         ld
Bitweiser                                        Jack Boehrer          Borkenhagen John        jr
Straight Cache Homie                             Jack McGinty          Joey Bauer              slbi
NOPe                                             Jonathan Brand        Adam Shufelt            slli
Avengers                                         Riley Morrison        Yefeng Yuan             addi
Badawi                                           Richard Lee           Abdullah Akhtar         lbi
Knules Rules                                     Jared Miles           Craig Knuth             sll
"Fetch Decode and XORcute"                       Eric Sullivan         Kai Zhao                roli
0xDEAD + 0xBEEF                                  Eric Koth             Nikhil Jain             jr
Karu Mad Bro                                     Neff Connor           Kocher Alex             bnez
PatNat                                           Patrick Yang          Nathaniel Perdue        j
Team Golirev                                     John Becker           Daniel Lerner           andni
(none)                                           Connor Rehbein        Amr Hassaballah         bgez
The Krachey Krabs                                Tyler Roberts         Matthew Allen           sub
Thundercatz                                      Sam Lines             David Mott              addi
TradeMark                                        Tara Abernathy        Matt Dallman            slbi
A Catchy Name                                    Keifei Fu             Gan Sheng               rori
(none)                                           Scott Catlin          Andrew Burdick          andn
Cache This                                       Brandon Quach         James Liu               stu
(none)                                           Bradley Miller        Hunter Koeshall         bltz
Cold Hard Cache                                  Forest Olsen          Jacob Schieber          ror
(none)                                           Matthew Beard         Elliot Frie             seq
Team Interrobang                                 Justin Eddy           Alec Pierce             rol
Flying Buttresses                                Francis Barry-Lenoch  Samuel Calmes           add
(none)                                           Nick Koxlien          Josh Kasuboski          slli
(none)                                           Hikaru Watanabe       Donghyun Ryu            xor
DL                                               Tianshuo Su           Peng Cheng              slt

To get you started, below are two example tests for the add instruction.


lbi r1, 255
lbi r2, 255
add r3, r1, r2


lbi r1, 255
lbi r2, 0
add r3, r1, r2

You will notice one thing. The add test uses the lbi instruction also! Your goal while writing these tests is to isolate your instruction as much as possible and minimize the use of the other instructions. Identify different corner cases and the common case for your instruction and develop a set of simple test programs.

The work flow we will follow is:

  1. Write the test in WISC-SP13 assembly language.
  2. Assemble it using the assembler.
  3. Simulate the test in the simulator and make sure your test is doing what you thought it was doing. Use the simulator: wiscalculator

Read the following two documents on how to use the assembler and simulator:

Below is a short demo:

prompt% add_0.asm
Created the following files
loadfile_0.img  loadfile_1.img  loadfile_2.img  loadfile_3.img  loadfile_all.img  loadfile.lst

prompt% wiscalculator loadfile_all.img

WISCalculator v1.0
Author Derek Hower (
Type "help" for more information

Loading program...
lbi r1, -1
INUM:        0 PC: 0x0000 REG: 1 VALUE: 0xffff
lbi r2, -1
INUM:        1 PC: 0x0002 REG: 2 VALUE: 0xffff
add r3, r1, r2
INUM:        2 PC: 0x0004 REG: 3 VALUE: 0xfffe
program halted
INUM:        3 PC: 0x0006
Program Finished


The simulator will print a trace of each instruction along with the state of the relevant registers. You should examine these to make sure that your test is indeed doing what is expected.
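
When examining a trace, it helps to predict the register values before running the simulator. The sketch below is a small Python model of the two instructions used in the example tests. It assumes 16-bit registers and that lbi sign-extends its 8-bit immediate, which is consistent with the trace above (255 is displayed as -1 and loads 0xffff); the function names are hypothetical and the ISA specification remains the authority.

```python
def sign_extend8(imm):
    """Interpret the low 8 bits of imm as a signed two's-complement value."""
    imm &= 0xFF
    return imm - 0x100 if imm & 0x80 else imm


def lbi(imm):
    """Model of lbi: load a sign-extended 8-bit immediate into a 16-bit register."""
    return sign_extend8(imm) & 0xFFFF


def add16(a, b):
    """Model of add: 16-bit addition, carry out of bit 15 is discarded."""
    return (a + b) & 0xFFFF


# First example test: lbi r1, 255 / lbi r2, 255 / add r3, r1, r2
r1 = lbi(255)
r2 = lbi(255)
r3_first = add16(r1, r2)

# Second example test: lbi r1, 255 / lbi r2, 0 / add r3, r1, r2
r3_second = add16(lbi(255), lbi(0))

print(hex(r1), hex(r3_first), hex(r3_second))
```

Under these assumptions the first test exercises a carry out of the top bit (0xffff + 0xffff), while the second adds zero to an all-ones operand.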

What you need to do:

  • Write a set of tests for your instruction. Name them <instruction>_[0,1,2,3,...].asm
  • Use your discretion to decide how many tests you need
  • Identify corner cases. Think about possible bugs in the hardware.
  • In addition to your assigned instruction, everyone must write tests for the jal and jalr instructions
  • Write comments in your assembly code explaining what each test is doing
  • The goal of this problem is to make sure you understand the ISA and develop targeted tests for the hardware. Understanding the ISA is required before building hardware for it!

The remaining problems will not be graded but are recommended for better understanding of the course material.

Problem 4

Indicate all of the true, anti-, and output-dependences in the following segment of MIPS assembly code:

    xor    $1, $2, $3
    and    $4, $5, $6
    sub    $7, $4, $5
    add    $5, $1, $5
    sw     $4, 100($7)
    or     $4, $7, $4 

For the code above, which of the dependences will manifest themselves as hazards in the pipeline in Figure 4.41 on page 355 of COD4e? How are these hazards resolved in this pipeline? Assuming the 'xor' instruction enters fetch (F) in cycle 1, in what cycle does the 'or' instruction enter writeback (W)? Show your work in a pipeline diagram. (Assume that the register file cannot read and write the same register in the same cycle and get the new data.)

How does your answer change if you consider the pipeline in 4.60, on page 375 of COD4e? (Assume that the register file contains internal bypassing and can read and write the same register in the same cycle and get the new data.)
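
To check a hand-derived dependence list, a small script can enumerate the dependences mechanically. The following Python sketch uses a hypothetical four-instruction sequence, not the code from this problem. Each instruction is given as a (destination, sources) pair, and a dependence between instructions i and j is reported only if no instruction between them overwrites the register involved; that filtering rule is an assumption about how the dependences should be counted.

```python
def find_dependences(instrs):
    """Return (kind, i, j, reg) tuples for a list of (dest, sources) pairs.

    dest is a register name, or None for instructions that write no
    register (such as a store). i and j are indices in program order.
    """
    found = []
    for i, (dest_i, srcs_i) in enumerate(instrs):
        for j in range(i + 1, len(instrs)):
            dest_j, srcs_j = instrs[j]
            # Registers written by the instructions strictly between i and j.
            between = [d for d, _ in instrs[i + 1:j]]
            if dest_i is not None and dest_i not in between:
                if dest_i in srcs_j:
                    found.append(("RAW", i, j, dest_i))  # true dependence
                if dest_i == dest_j:
                    found.append(("WAW", i, j, dest_i))  # output dependence
            if dest_j is not None and dest_j in srcs_i and dest_j not in between:
                found.append(("WAR", i, j, dest_j))      # anti-dependence
    return found


# Hypothetical example sequence:
#   0: add $1, $2, $3
#   1: sub $4, $1, $5
#   2: or  $2, $4, $1
#   3: add $4, $2, $6
example = [
    ("$1", ("$2", "$3")),
    ("$4", ("$1", "$5")),
    ("$2", ("$4", "$1")),
    ("$4", ("$2", "$6")),
]
deps = find_dependences(example)
for kind, i, j, reg in deps:
    print(f"{kind}: instruction {i} -> instruction {j} on {reg}")
```

Which of the reported dependences become hazards still depends on the pipeline: in an in-order five-stage pipeline, only the true (RAW) dependences between nearby instructions can require forwarding or stalls.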

Problem 5

Consider the following assembly program to be executed in a MIPS ISA 5-stage (F, D, X, M, W) pipelined datapath given in figure 4.51 on page 362 of COD4e:

    I1: add $3,$4,$6
    I2: sub $5,$3,$2
    I3: lw $6,100($5)
    I4: add $5,$6,$3

a) Identify every occurrence and every type of data dependence, True (RAW), Anti (WAR), and Output (WAW), in the above program. Also indicate which register is involved in each data dependence.

b) If this program is to be executed in a pipelined datapath, create a pipeline timing diagram table (clock cycle numbers as columns and instructions as rows), assuming NO forwarding except that register forwarding is available.

c) Identify all the data hazards that may occur. For each hazard, indicate whether data forwarding (including register forwarding) may be applied to eliminate that hazard. For each hazard, give the two instructions involved, the register involved, and the pipeline register (IF/ID, ID/EX, EX/MEM, MEM/WB) whose output will be used for data forwarding.

Problem 6

Consider the following program code:

    lw  $s1, 8($s0)
    sub $s0,$s1,$s2
    add $s0,$s0,$s1

If the above program is to be executed in the pipelined datapath given in figure 4.51 on page 362 of COD4e, equipped with full data forwarding (as well as register forwarding), complete the timing diagram table (clock cycle numbers as columns and instructions as rows). Also mark each clock cycle in which data forwarding (F) takes place or a pipeline stall (S) is inserted.
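
A timing table of this kind can be cross-checked with a short script. The Python sketch below models a generic five-stage in-order pipeline with full forwarding, in which the only data stall is the one-cycle load-use stall (an instruction that immediately follows a lw and reads the loaded register). It is a simplified model under that assumption, it ignores branches, and the instruction sequence is hypothetical rather than the one from this problem.

```python
def schedule(instrs):
    """Return one row per instruction with the cycle in which it enters each stage.

    instrs is a list of (op, dest, sources) tuples in program order.
    Assumes full forwarding, so only a load-use pair stalls (by one cycle).
    """
    rows = []
    for op, dest, srcs in instrs:
        if not rows:
            fetch, decode, stall = 1, 2, 0
        else:
            prev = rows[-1]
            fetch = prev["D"]    # F frees up when the previous instruction enters D
            decode = prev["X"]   # D frees up when the previous instruction enters X
            stall = 1 if prev["op"] == "lw" and prev["dest"] in srcs else 0
        execute = decode + 1 + stall  # a stalled instruction waits in D
        rows.append({"op": op, "dest": dest, "stall": stall,
                     "F": fetch, "D": decode, "X": execute,
                     "M": execute + 1, "W": execute + 2})
    return rows


# Hypothetical example sequence:
#   lw  $1, 0($2)
#   add $3, $1, $4   (load-use: needs $1 right after the lw)
#   sub $5, $3, $1
program = [
    ("lw", "$1", ("$2",)),
    ("add", "$3", ("$1", "$4")),
    ("sub", "$5", ("$3", "$1")),
]
table = schedule(program)
for row in table:
    stages = " ".join(f"{s}={row[s]}" for s in "FDXMW")
    print(f"{row['op']:4} {stages} stall={row['stall']}")
```

In this model the stalled instruction stays in D for two cycles and the instruction behind it stays in F for two cycles, which is how the stall shows up in a timing table.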

Problem 7

Consider the following code sequence and the datapath in figure 4.51 on page 362 of COD4e. Assuming the first instruction is fetched in cycle 1 and the branch is not taken, in which cycle does the 'and' instruction write its value to the register file? What if the branch IS taken? (Assume no branch prediction). Show pipeline diagrams.

            beq    $2, $3, foo
            add    $3, $4, $5
            sub    $5, $6, $7
            or     $7, $8, $9
    foo:    and    $5, $6, $7 

Problem 8

Consider the pipeline in Figure 4.51 on page 362; assume predict-not-taken for branches and assume a "Hazard detection unit" in the ID stage as shown on page 379. Can an attempt to flush and an attempt to stall occur simultaneously? If so, do they result in conflicting actions and/or cooperating actions? If there are any cooperating actions, how do they work together? If there are any conflicting actions, which should take priority? What would you do in the design to make sure this works correctly? You may want to consider the following code sequence to help you answer this question:

        beq $1, $2, TARGET  #assume that the branch is taken
        lw  $3, 40($4)
        add $2, $3, $4
        sw  $2, 40($4)
TARGET: or  $1, $1, $2

Problem 9

Consider the following MIPS assembly code segment:

         bne $s1,$s2,LABEL  // $s1 != $s2
         add $t2,$t1,$s1
         sw $t2,4($s1)
         j EXIT
  LABEL: lw $s1,4($s6)
  EXIT:  addi $s1,$s1,4

Assume this code segment runs on the pipelined datapath with data forwarding depicted in figure 4.65 on page 384 of COD4e, where the branch decision is made in the ID stage.

Assuming $s1 != $s2, a control hazard will occur. Provide a timing diagram table (clock cycle numbers as columns and instructions as rows) showing which instruction is in which phase (F, D, X, M, W) in each clock cycle. If an instruction is flushed from the pipeline, its remaining phases should not appear. If an instruction is stalled for one cycle, its remaining phases are pushed back by one cycle. Mark the clock cycle and corresponding instruction for every flush or stall action. (No branch predictors are used in this problem.)

Problem 10

During the execution of a program, conditional branches have been executed 15 times. The traces of TAKEN(T) and NOT-TAKEN(N) of each branch instruction are listed below:


a) Prediction accuracy for "always NOT TAKEN" =

b) Prediction accuracy for "1 - bit predictor" =

   Indicate output of predictor for each instruction traced. Outcome = 1 if correct, and 0 if incorrect.

c) Prediction accuracy for "2 - bit predictor" =

   Indicate output of predictor for each instruction traced. Outcome = 1 if correct, and 0 if incorrect.

Note: For dynamic predictors (1 bit and 2 bit), assume the first predicted entry as TAKEN (T) and then proceed.
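
The three prediction schemes can be simulated in a few lines of Python to check hand-computed accuracies. This is a sketch under stated assumptions: the trace is hypothetical (it is not the trace from this problem), all branches share a single predictor entry, the 2-bit predictor is modeled as a saturating counter, and it starts in the strongly-taken state. The state machine in the textbook may differ in its transitions, so adjust the model to match the scheme used in class.

```python
def always_not_taken(trace):
    """Outcome per branch: 1 if predicting NOT TAKEN was correct, else 0."""
    return [1 if actual == "N" else 0 for actual in trace]


def one_bit(trace, start="T"):
    """1-bit predictor: predict whatever the previous branch outcome was."""
    prediction, outcomes = start, []
    for actual in trace:
        outcomes.append(1 if prediction == actual else 0)
        prediction = actual
    return outcomes


def two_bit(trace, state=3):
    """2-bit saturating counter: states 0-1 predict N, states 2-3 predict T."""
    outcomes = []
    for actual in trace:
        prediction = "T" if state >= 2 else "N"
        outcomes.append(1 if prediction == actual else 0)
        state = min(3, state + 1) if actual == "T" else max(0, state - 1)
    return outcomes


def accuracy(outcomes):
    return sum(outcomes) / len(outcomes)


trace = "TTTNTTTN"  # hypothetical trace, for illustration only
results = {
    "always not taken": always_not_taken(trace),
    "1-bit": one_bit(trace),
    "2-bit": two_bit(trace),
}
for name, outcomes in results.items():
    print(f"{name}: outcomes={outcomes} accuracy={accuracy(outcomes)}")
```

On this trace the 1-bit predictor mispredicts twice around each isolated NOT TAKEN branch, while the 2-bit counter mispredicts only once, which is the usual motivation for the extra bit.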

Problem 11

High performance datapaths use bypass paths (also known as data forwarding logic) to reduce pipeline stalls. However, bypass paths are relatively expensive, especially in some wire constrained technologies. To reduce the cost (and potential cycle time impact), some architects have explored omitting some of the possible bypass paths. Consider the datapath illustrated below (note that the PC update logic and all control logic is intentionally omitted). This pipelined datapath is similar to the one in the book, but only has bypass paths on one side of the ALU. Assume that the register file intentionally bypasses the value, so that if register Si is read and written in the same cycle, then the read returns the new value. Assume that the control logic bypasses the data as soon as possible using the given forwarding data paths, and stalls in decode otherwise. You may NOT add additional data paths.

In this problem, you will look at how a program snippet performs on this pipeline. Recall that R-format instructions have the form: opcode rd, rs, rt

and I-format instructions have the form: opcode rt, imm(rs) or opcode rt, rs, imm

Use the table given below to show how the given instruction sequence flows through the pipeline and where stalls are necessary to resolve hazards.

Timing Table

Consider the code and pipeline above. Show the execution of this code on the pipeline above. Use the letters, F, D, X, M, and W.

For each cycle where a stall occurs, explain why.

Page last modified on March 11, 2016