Homework is due at the start of class
Problems 1 - 3 MUST be done with your project group (all electronic: handin to “hw4”)
Problems 4 - 7 MUST be done ALONE (all paper)
Problem 8 must also be done ALONE (all electronic: handin to “inst_test”)
No exceptions to the above handin rules will be allowed, as this is already unduly complicated (to grade).
You must abide by the Verilog file naming conventions
All Verilog code must pass Vcheck
Each problem must be in its own directory
If a problem requires files from a different directory, then create a copy of the file in each directory.
Problem 1 – 10 Points
In Verilog, create a register file that includes internal bypassing so
that results written in one cycle can be read during the same cycle.
Do this by writing an outer "wrapper" module that instantiates your
existing (unchanged) register file module; your new module will just
add the bypass logic. The list of inputs and outputs of the outer
module should be the same as that of the inner module. Submit your
Verilog source and your testing results.
Call this module rf_bypass and put it in a file called rf_bypass.v.
Modify rf_hier.v from problem3 so that it now instantiates rf_bypass instead of rf.
The input and output interface for rf_bypass.v should be identical to that of rf.v.
Use the rf_bypass_bench.v testbench.
Usage instructions for the testbench are available.
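To make the intended read-during-write behavior concrete, here is a minimal behavioral sketch in Python (not Verilog, and not a substitute for your rf_bypass.v). The class names, the port names (read1_sel, write_en, and so on), and the choice of 8 registers of 16 bits are assumptions made for illustration; use the interface of your own rf.v.

```python
class RegFile:
    """Plain register file: reads return the value stored before this cycle's write."""

    def __init__(self, num_regs=8, width=16):
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs

    def cycle(self, read1_sel, read2_sel, write_sel, write_data, write_en):
        # Reads see the old state; the write takes effect at the end of the cycle.
        out1, out2 = self.regs[read1_sel], self.regs[read2_sel]
        if write_en:
            self.regs[write_sel] = write_data & self.mask
        return out1, out2


class RegFileBypass:
    """Wrapper with the same interface that forwards write data to matching reads."""

    def __init__(self, num_regs=8, width=16):
        self.inner = RegFile(num_regs, width)  # inner module is left unchanged

    def cycle(self, read1_sel, read2_sel, write_sel, write_data, write_en):
        out1, out2 = self.inner.cycle(read1_sel, read2_sel, write_sel, write_data, write_en)
        data = write_data & self.inner.mask
        # Bypass: if a read port selects the register being written, return the new data.
        if write_en and read1_sel == write_sel:
            out1 = data
        if write_en and read2_sel == write_sel:
            out2 = data
        return out1, out2


plain = RegFile()
bypass = RegFileBypass()
print(plain.cycle(3, 0, 3, 0xBEEF, True))   # old value of register 3 on port 1
print(bypass.cycle(3, 0, 3, 0xBEEF, True))  # new value of register 3 on port 1
```

The wrapper only adds a comparison and a selection on each read port, which is the same shape the Verilog wrapper is expected to have.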
What to submit: (Directory name: prob1)
In README.txt, describe precisely how you augmented your hw3 register file.
Any modifications to the testbench, if required. If you use the testbench provided, electronically submit the text output of the program as rf_bench.out (see 4 below). Modelsim will write the text output to a file called transcript in your project directory.
All your Verilog source code.
Problem 2 – 10 Points
Synthesize your register file from homework 3.
Synthesis will create the synth directory, which will include rf.syn.v, the area report, the timing report, etc.
What to submit: (Directory name: prob2)
Verilog files from hw3's register file
Add the entire synth directory
Make sure rf.syn.v and the 4 report files are present (make sure that no cell in the area report has an area of zero)
In the readme, fill in this info:
  Total area
  Worst case slack
Problem 3 – 10 Points
Synthesize your FIFO from homework 3.
Synthesis will create the synth directory, which will include fifo.syn.v, the area report, the timing report, etc.
What to submit: (Directory name: prob3)
Verilog files from hw3's fifo
Add the entire synth directory
Make sure fifo.syn.v and the 4 report files are present (make sure that no cell in the area report has an area of zero)
In the readme, fill in this info:
  Total area
  Worst case slack
#end group work#
Problem 4 – 15 Points
Consider the following code sequence and the datapath in Figure 4.51
on page 362 of COD4e. Assuming the first instruction is fetched in
cycle 1 and the branch is not taken, in which cycle does the 'add'
instruction write its value to the register file? What if the branch
IS taken? (Assume no branch prediction.) Show pipeline diagrams.
beq $2, $1, loc
xor $1, $4, $3
and $3, $6, $7
sub $7, $5, $8
loc: add $3, $6, $7
Problem 5 – 15 Points
Indicate all of the true, anti-, and output-dependencies in the
following segment of MIPS assembly code:
sub $2, $7, $3
add $4, $5, $6
or $1, $4, $5
add $5, $2, $5
sw $4, 20($1)
xor $4, $1, $4
For the code above, which of the dependencies will manifest themselves
as hazards in the pipeline in Figure 4.41 on page 355 of COD4e? How
are these hazards resolved in this pipeline? Assuming the 'sub'
instruction enters fetch (F) in cycle 1, in what cycle does the
'xor' instruction enter writeback (W)? Show your work in a pipeline
diagram. (Assume that the register file cannot read and write the
same register in the same cycle and get the new data.)
How does your answer change if you consider the pipeline in Figure
4.60, on page 375 of COD4e? (Assume that the register file contains
internal bypassing and can read and write the same register in the
same cycle and get the new data.)
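As a reminder of the three definitions, the sketch below classifies register dependencies in a short, made-up three-instruction sequence (not the homework code). The function name and the tuple encoding of instructions are assumptions for illustration. It reports every ordered pair of instructions that share a register and does not filter out pairs where an intervening instruction redefines that register. A store such as sw would be encoded with a destination of None and both of its registers listed as sources.

```python
def find_dependencies(instrs):
    """instrs: list of (name, dest, sources); dest is None when nothing is written.

    Returns a sorted list of (kind, earlier_index, later_index, register) tuples.
    """
    deps = set()
    for j, (_, dest_j, srcs_j) in enumerate(instrs):
        for i in range(j):
            _, dest_i, srcs_i = instrs[i]
            # True (read-after-write): later instruction reads what the earlier one writes.
            if dest_i is not None and dest_i in srcs_j:
                deps.add(("true", i, j, dest_i))
            # Anti (write-after-read): later instruction overwrites what the earlier one reads.
            if dest_j is not None and dest_j in srcs_i:
                deps.add(("anti", i, j, dest_j))
            # Output (write-after-write): both instructions write the same register.
            if dest_j is not None and dest_j == dest_i:
                deps.add(("output", i, j, dest_j))
    return sorted(deps)


# Hypothetical example sequence, chosen only to show one dependency of each kind.
program = [
    ("add", "$1", ["$2", "$3"]),  # add $1, $2, $3
    ("sub", "$2", ["$1", "$4"]),  # sub $2, $1, $4
    ("or", "$1", ["$2", "$5"]),   # or  $1, $2, $5
]

for dep in find_dependencies(program):
    print(dep)
```

Whether a given dependency becomes a hazard depends on the pipeline, which is the part of the question this sketch does not address.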
Problem 6 – 10 Points
Consider the pipeline in Figure 4.51 on page 362; assume
predict-not-taken for branches and assume a "Hazard detection unit"
in the ID stage as shown on page 379. Can an attempt to flush and an
attempt to stall occur simultaneously? If so, do they result in
conflicting actions and/or cooperating actions? If there are any
cooperating actions, how do they work together? If there are any
conflicting actions, which should take priority? What would you do
in the design to make sure this works correctly? You may want to
consider the following code sequence to help you answer this
question:
beq $5, $2, loc #assume that the branch is taken
lw $3, 40($4)
add $2, $3, $4
sw $2, 40($4)
loc: or $5, $5, $2
Problem 7 – 15 Points
Consider a pipeline where branches are predicted not-taken, and a
taken branch introduces a three-cycle penalty. Suppose you are
considering adding a delayed branch slot to your instruction set
architecture, so that taken branches would only have a two-cycle
penalty. Consider the following three fragments of code:
Fragment 1:
add $5, $5, $2
beq $5, $6, Target
lw $4, 0($2)
.
.
.
Target: lw $1, 0($7)
...
Fragment 2:
add $5, $5, $2
beq $5, $6, Target
lw $4, 0($7)
.
.
.
Target: sub $4, $8, $3
...
Fragment 3:
movei $2, 21 // End-of-loop count
.
.
.
addi $4, $4, 1
beq $4, $2, Target
.
.
.
Target: ...
Re-arrange or re-write each of the fragments so that it will work
correctly with a branch delay slot and maximize performance. (The dots
represent an unknown amount of other code that you can't change.)
What is the average number of cycles that were saved or lost in each
case if you used the delayed branch architecture? (Assume branches
are taken 60% of the time.)
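The averaging step can be sketched as a weighted sum. The helper below is a hypothetical illustration, not a prescribed method: it computes the expected penalty per branch from the taken probability and the per-outcome penalties, using exact fractions to avoid floating-point rounding. How many cycles each fragment actually pays on each path after your rewrite is for you to determine; the not_taken_penalty parameter is there in case a wasted slot costs a cycle on the fall-through path.

```python
from fractions import Fraction


def expected_branch_penalty(p_taken, taken_penalty, not_taken_penalty=0):
    """Average penalty cycles per branch, as an exact fraction."""
    p = Fraction(p_taken)
    return p * taken_penalty + (1 - p) * not_taken_penalty


p = Fraction(3, 5)  # branches taken 60% of the time

# Original architecture: three-cycle penalty on taken branches only.
baseline = expected_branch_penalty(p, 3)

# Delayed-branch architecture with the slot doing useful work on both paths.
delayed = expected_branch_penalty(p, 2)

print(baseline, delayed, baseline - delayed)
```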
While a good idea at the time, branch delay slots are discouraged in
modern processors with deep pipelines in favor of dynamic branch
predictors. Why do you think this is so? Why would a branch delay
instruction perform poorly in a long pipeline?
Problem 8 – 15 Points
(submit this problem under inst_test, instead of hw4)
Develop instruction-level tests for your processor. In this problem
each of you will develop a set of small programs that are meant to
test whether your processor implements these instructions correctly.
You will write these programs in assembly and run them on an
instruction emulator to make sure what you wrote is indeed testing
the right thing. The eventual goal is to run these programs on your
processor's Verilog implementation and use them to test your
implementation.
Each of you will be responsible for one instruction and must develop
a set of simple programs for that instruction. The table below gives
the assignment of instructions to students.
Student | Instruction
aarti | addi
abrown | subi
ammar | ori
asplund | andi
atishay | roli
ayoung | slli
bechard | rori
brant-ho | srai
brinsko | st
capel | ld
chanson | stu
cofell | add
diedrich | sub
emiller | or
frederic | and
frericks | rol
grigoriy | sll
halbach | ror
hang | sra
hanly | seq
hao | slt
hoese | sle
in | sco
jalal | beqz
jastrows | bnez
jatin | lbi
jimmy | slbi
jmartine | j
joel | jr
kjell | jal
klingens | jalr
langenfe | sll
markus | slt
marsh | slli
martell | bnez
michlig | ori
millican | sle
morrell | bgez
ndimick | srai
nystrom | st
ott | ror
passofar | roli
pdickey | jalr
rezny | addi
samanas | subi
schanke | add
sefiddas | sub
shourjo | jalr
soumphol | beqz
spallett | bnez
swati | sco
varun | sle
vaughn | seq
weisman | slt
wilcox | sll
wyler | sra
xiaofeng | rori
To get you started, below are two example tests for the add instruction.
add_0.asm
lbi r1, 255
lbi r2, 255
add r3, r1, r2
halt
add_1.asm
lbi r1, 255
lbi r2, 0
add r3, r1, r2
halt
You will notice one thing: the add test uses the lbi instruction
also! Your goal while writing these tests is to isolate your
instruction as much as possible and minimize the use of the other
instructions. Identify different corner cases and the common case
for your instruction and develop a set of simple test programs.
The work flow we will follow is:
Write the test in WISC-SP10 assembly language.
Assemble using the assembler, assemble.sh.
Simulate the test in the simulator and make sure your test is doing what you thought it was doing. Use the simulator: wiscalculator
Below is a short demo:
prompt% assemble.sh add_0.asm
Created the following files
loadfile_0.img loadfile_1.img loadfile_2.img loadfile_3.img loadfile_all.img loadfile.lst
prompt% wiscalculator loadfile_all.img
WISCalculator v1.0
Author Derek Hower (drh5@cs.wisc.edu)
Type "help" for more information
Loading program...
Executing...
lbi r1, -1
PC: 0x0002 EPC 0x0000R0 0x0000 R1 0xffff R2 0x0000 R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
lbi r2, -1
PC: 0x0004 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
add r3, r1, r2
PC: 0x0006 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
program halted
PC: 0x0008 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
Program Finished
prompt%
The simulator will print a trace of each instruction along with the
state of the registers. You should examine these to make sure that
your test is indeed doing what is expected. For the st instruction
you will need to examine memory also.
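Checking a trace by eye gets tedious once you have several tests. The sketch below is a hypothetical Python helper (not part of assemble.sh or the simulator) that pulls the register values out of one "PC: ..." trace line and compares R3 against the value you expect. It assumes registers are 16 bits wide and that lbi sign-extends its 8-bit immediate, which is consistent with the demo trace above, where lbi r1, 255 is shown as lbi r1, -1 and leaves 0xffff in R1.

```python
import re


def sign_extend(value, bits=8, width=16):
    """Sign-extend a `bits`-wide immediate to `width` bits."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & ((1 << width) - 1)


def parse_registers(trace_line):
    """Return {register_number: value} from one 'PC: ...' trace line."""
    pairs = re.findall(r"R(\d)\s+0x([0-9a-fA-F]{4})", trace_line)
    return {int(num): int(val, 16) for num, val in pairs}


# Register state printed after 'add r3, r1, r2' in the add_0.asm demo.
line = ("PC: 0x0006 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe "
        "R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000")

regs = parse_registers(line)
expected = (sign_extend(255) + sign_extend(255)) & 0xFFFF  # 16-bit wraparound
print(hex(regs[3]), hex(expected), regs[3] == expected)
```

The same comparison can be repeated for each test once you have worked out the expected register values by hand.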
What you need to do:
Write a set of tests for your instruction. Name them <opcode>_[0,1,2,3,4].asm
Use your discretion to decide how many tests you need.
Identify corner cases. Think about possible bugs in the hardware.
Write comments in your assembly code explaining what the test is doing.
The goal of this problem is to make sure you understand the ISA and
develop targeted tests for the hardware. Understanding the ISA is
required before building hardware for it!
I will make all tests available to everyone, so you can use these to
debug and test your Verilog implementation. One of the first things
you must do after putting together your full processor is run each
of these tests and test each individual instruction.
Submit under “inst_test”: