Homework is due at the start of class.
Problems 1 - 3 MUST be done with your project group (all electronic: handin to “hw4_1, hw4_2 and hw4_3”)
Problems 4 - 7 MUST be done ALONE (all paper)
Problem 8 must also be done ALONE (all electronic: handin to “TBA”)
No exceptions to the above handin rules will be allowed, as this is already unduly complicated (to grade).
You must abide by the Verilog file naming conventions
All Verilog code must pass Vcheck
Each problem must be in its own directory
If a problem requires files from a different directory, then create a copy of the file in each directory.
Problem 1 - 10 Points
In Verilog, create a register file that includes internal bypassing so that results written in one cycle can be read during the same cycle. Do this by writing an outer "wrapper" module that instantiates your existing (unchanged) register file module; your new module will just add the bypass logic. The list of inputs and outputs of the outer module should be the same as that of the inner module. Submit your Verilog source code and your testing results.
- Call this module rf_bypass and put it in a file called rf_bypass.v.
- Modify rf_hier.v from problem 3 of HW2 so that it now instantiates rf_bypass instead of rf.
- The input and output interface of rf_bypass.v should be identical to that of rf.v.
- Use the rf_bypass_bench.v testbench.
- See the usage instructions for the testbench.
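The intended bypass behavior can be sketched as a small behavioral model in Python. This is only an illustration of the idea, not the required Verilog: the class names, the port names (write_en, write_sel, write_data), and the size of 8 registers of 16 bits are assumptions made for the example, and your rf.v interface may differ.

```python
# Behavioral sketch of write-to-read bypassing (illustrative, not Verilog).
# Assumed for the example: 8 registers, 16 bits wide, one write port.


class RegFile:
    """Inner register file: a read returns the value stored before the clock edge."""

    def __init__(self, num_regs=8, width=16):
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs

    def read(self, sel):
        return self.regs[sel]

    def clock(self, write_en, write_sel, write_data):
        # Stored state changes only at the clock edge.
        if write_en:
            self.regs[write_sel] = write_data & self.mask


class RegFileBypass:
    """Wrapper around an unchanged RegFile that forwards write data to matching reads."""

    def __init__(self):
        self.inner = RegFile()

    def read(self, sel, write_en, write_sel, write_data):
        # Bypass: if this register is being written in the current cycle,
        # return the incoming data instead of the stale stored value.
        if write_en and sel == write_sel:
            return write_data & self.inner.mask
        return self.inner.read(sel)

    def clock(self, write_en, write_sel, write_data):
        self.inner.clock(write_en, write_sel, write_data)


rf = RegFileBypass()
same_cycle = rf.read(3, write_en=True, write_sel=3, write_data=0xBEEF)
other_reg = rf.read(4, write_en=True, write_sel=3, write_data=0xBEEF)
rf.clock(True, 3, 0xBEEF)
next_cycle = rf.read(3, write_en=False, write_sel=0, write_data=0)
print(hex(same_cycle), hex(other_reg), hex(next_cycle))
```

In hardware the write signals are inputs of the same module, so the wrapper can compare them against each read select without changing the port list; the Python read method takes them as arguments only because the model has no wires.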
What to submit: (Directory name: hw4_1)
- Describe precisely how you augmented your hw3 register file in README.txt.
- Any modifications to the testbench, if required. If you use the testbench provided, electronically submit the text output of the program as rf_bench.out (see 4 below). Modelsim will write the text output to a file called transcript in your project directory.
- All your Verilog source code.
Problem 2 – 10 Points
- Read the Synthesis Tutorial.
- Synthesize your register file from homework 3.
- Synthesis will create the synth directory, which will include rf.syn.v, the area report, the timing report, etc.
What to submit: (Directory name: hw4_2)
- Verilog files from hw3's register file
- Add the entire synth directory
- Make sure rf.syn.v and the 4 report files are present (make sure that in the area report no cell has an area of zero)
- In the readme, fill in this info:
  - Total area
  - Worst case slack
Problem 3 – 10 Points
- Read the Synthesis Tutorial.
- Synthesize your FIFO from homework 3.
- Synthesis will create the synth directory, which will include fifo.syn.v, the area report, the timing report, etc.
What to submit: (Directory name: hw4_3)
- Verilog files from hw3's fifo
- Add the entire synth directory
- Make sure fifo.syn.v and the 4 report files are present (make sure that in the area report no cell has an area of zero)
- In the readme, fill in this info:
  - Total area
  - Worst case slack
#end group work#
Problem 4 – 15 Points
Consider the following code sequence and the datapath in figure 4.51 on page 362 of COD4e. Assuming the first instruction is fetched in cycle 1 and the branch is not taken, in which cycle does the 'and' instruction write its value to the register file? What if the branch IS taken? (Assume no branch prediction.) Show pipeline diagrams.
beq $2, $3, foo
add $3, $4, $5
sub $5, $6, $7
or $7, $8, $9
foo: and $5, $6, $7
Problem 5 – 15 Points
For each of the three MIPS assembly code segments below: (a) indicate the dependences and their types; (b) assuming that there is NO forwarding in the pipelined processor, indicate hazards and add NOP instructions to eliminate them; (c) assuming that there is FULL forwarding in the pipelined processor, indicate hazards and add NOP instructions to eliminate them.
(a) add $4, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6
(b) lw $1, 40($6)
add $6, $2, $2
sw $6, 50($1)
(c) lw $5, -16($5)
sw $5, -16($5)
add $5, $5, $5
Problem 6 – 10 Points
COD4e, Exercises 4.24.1 - 4.24.3 (page 432), with the changes below:
(a) Take the pattern as T, T, NT, T and (b) take the pattern as T, T, T, NT, NT
Problem 7 – 15 Points
Consider a pipeline where branches are predicted not-taken, and a taken branch introduces a three-cycle penalty. Suppose you are considering adding a delayed branch slot to your instruction set architecture, so that taken branches would only have a two-cycle penalty. Consider the following three fragments of code:
Fragment 1:
add $5, $5, $2
beq $5, $6, Target
lw $4, 0($2)
.
.
.
Target: lw $1, 0($7)
...
Fragment 2:
add $5, $5, $2
beq $5, $6, Target
lw $4, 0($7)
.
.
.
Target: sub $4, $8, $3
...
Fragment 3:
movei $2, 21 // End-of-loop count
.
.
.
addi $4, $4, 1
beq $4, $2, Target
.
.
.
Target: ...
Re-arrange or re-write each of the fragments so that it will work correctly with a branch delay slot and maximize performance. (The dots represent an unknown amount of other code that you can't change.) What is the average number of cycles that were saved or lost in each case if you used the delayed branch architecture? (Assume branches are taken 60% of the time.)
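The averaging step can be sketched in Python as an expected value over taken and not-taken branches. The function name and the nop accounting are assumptions made for illustration: the sketch charges a delay slot filled with a nop as one wasted cycle on both paths, and whether a given fragment's slot does useful work is for you to determine.

```python
from fractions import Fraction


def avg_branch_cost(taken_fraction, taken_cost, not_taken_cost=0):
    """Expected extra cycles per branch for a given taken/not-taken mix."""
    return taken_fraction * taken_cost + (1 - taken_fraction) * not_taken_cost


taken = Fraction(60, 100)  # branches are taken 60% of the time

# Original architecture: 3-cycle penalty when taken, none when not taken.
baseline = avg_branch_cost(taken, 3)

# Delayed branch, slot filled with an instruction that is useful on both paths.
slot_useful = avg_branch_cost(taken, 2)

# Delayed branch, slot filled with a nop (assumed to cost one cycle either way).
slot_nop = avg_branch_cost(taken, 2 + 1, 1)

print("baseline cost per branch:", float(baseline))
print("change with a useful slot:", float(slot_useful - baseline))
print("change with a nop slot:", float(slot_nop - baseline))
```

A negative change means cycles saved per branch and a positive change means cycles lost. Fractions are used so the 60% weighting stays exact.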
While a good idea at the time, branch delay slots are discouraged in modern processors with deep pipelines in favor of dynamic branch predictors. Why do you think this is so? Why would a branch delay instruction perform poorly in a long pipeline?
Problem 8 – 15 Points
Develop instruction-level tests for your processor. In this problem each of you will develop a set of small programs that are meant to test whether your processor implements these instructions correctly. You will write these programs in assembly and run them on an instruction emulator to make sure what you wrote is indeed testing the right thing. The eventual goal is to run these programs on your processor's Verilog implementation and use them to test your implementation.
Each of you will be responsible for one instruction and must develop a set of simple programs for that instruction. The table below gives the assignment of instructions to students.
Student | Instruction
ahmad | addi
alexm | subi
ampomah | ori
andracek | andi
ativut | roli
bayer | bltz
chao | rori
chunw | srai
danielr | st
davidm | ld
deblon | stu
dexter | add
dimitrio | sub
dmiller | or
dragga | and
eichers | rol
fessler | sll
fisher | ror
foss | sra
gola | seq
harter | slt
harwell | sle
hittson | sco
hongzhuo | beqz
huanchen | bnez
ishani | lbi
jaffke | slbi
jalal | j
jiaduo | jr
jitrapon | jal
jliu | jalr
jui-chie | sll
justmann | slt
katelyn | slli
kolp | bnez
kpark38 | ori
kulcyk | sle
lars | bgez
little | srai
mcc | st
mgm | ror
mschmid | roli
nwilliam | jalr
parvi | addi
pjohnson | subi
redderse | add
roberts | sub
sato | jalr
schleife | beqz
shanpeng | bnez
shubham | sco
skobov | sle
sok | seq
starr | slt
suli | sll
swiercze | sra
theodor | st
tong | btr
vander-p | srai
van-maas | roli
weisnich | and
wysocki | rori
yaman | sll
yashashr | bltz
zxie | btr
To get you started, below are two example tests for the add instruction.
add_0.asm
lbi r1, 255
lbi r2, 255
add r3, r1, r2
halt
add_1.asm
lbi r1, 255
lbi r2, 0
add r3, r1, r2
halt
You will notice one thing: the add test uses the lbi instruction also! Your goal while writing these tests is to isolate your instruction as much as possible and minimize the use of other instructions. Identify the different corner cases and the common case for your instruction and develop a set of simple test programs.
The workflow we will follow is:
- Write the test in WISC-SP12 assembly language.
- Assemble it using the assembler, assemble.sh.
- Simulate the test in the simulator and make sure your test is doing what you thought it was doing. Use the simulator: wiscalculator
Below is a short demo:
prompt% assemble.sh add_0.asm
Created the following files
loadfile_0.img loadfile_1.img loadfile_2.img loadfile_3.img loadfile_all.img loadfile.lst
prompt% wiscalculator loadfile_all.img
WISCalculator v1.0
Author Derek Hower (drh5@cs.wisc.edu)
Type "help" for more information
Loading program...
Executing...
lbi r1, -1
PC: 0x0002 EPC 0x0000 R0 0x0000 R1 0xffff R2 0x0000 R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
lbi r2, -1
PC: 0x0004 EPC 0x0000 R0 0x0000 R1 0xffff R2 0xffff R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
add r3, r1, r2
PC: 0x0006 EPC 0x0000 R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
program halted
PC: 0x0008 EPC 0x0000 R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
Program Finished
prompt%
The simulator will print a trace of each instruction along with the state of the registers. You should examine these to make sure that your test is indeed doing what is expected. For the st instruction you will need to examine memory also.
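One way to predict the register values before reading the trace is to compute them by hand or with a few lines of Python. The sketch below reproduces the add_0.asm values from the demo, assuming (as the trace suggests, where lbi with 255 leaves 0xffff in the register) that lbi sign-extends its 8-bit immediate to 16 bits and that add wraps around at 16 bits. The helper names are hypothetical; check the ISA specification for the exact semantics of your own instruction.

```python
def sign_extend_8_to_16(imm8):
    """Sign-extend an 8-bit immediate to a 16-bit register value."""
    imm8 &= 0xFF
    return (imm8 | 0xFF00) if (imm8 & 0x80) else imm8


def add16(a, b):
    """16-bit addition; any carry out of bit 15 is discarded."""
    return (a + b) & 0xFFFF


# add_0.asm: lbi r1, 255 / lbi r2, 255 / add r3, r1, r2
r1 = sign_extend_8_to_16(255)
r2 = sign_extend_8_to_16(255)
r3 = add16(r1, r2)
print(f"R1 {r1:#06x} R2 {r2:#06x} R3 {r3:#06x}")
```

The printed values can be compared against the R1, R2, and R3 columns of the last trace line in the demo.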
What you need to do:
- Write a set of tests for your instruction. Name them <opcode>_[0,1,2,3,4].asm
- Use your discretion to decide how many tests you need
- Identify corner cases. Think about possible bugs in the hardware.
- Write comments in your assembly code explaining what the test is doing
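Before handing in, the naming convention above can be checked with a short script. This is a sketch only: the pattern assumes lowercase opcodes and a decimal test index, and the function name and sample file names are hypothetical.

```python
import re

# <opcode>_<index>.asm, e.g. add_0.asm (lowercase opcode assumed)
TEST_NAME = re.compile(r"^([a-z]+)_([0-9]+)\.asm$")


def misnamed_tests(opcode, filenames):
    """Return the file names that do not follow <opcode>_<index>.asm."""
    bad = []
    for name in filenames:
        match = TEST_NAME.match(name)
        if match is None or match.group(1) != opcode:
            bad.append(name)
    return bad


sample = ["add_0.asm", "add_1.asm", "Add_2.asm", "add-3.asm", "sub_0.asm"]
print(misnamed_tests("add", sample))
```

In practice the list of names would come from the directory you are about to hand in.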
The goal of this problem is to make sure you understand the ISA and develop targeted tests for the hardware. Understanding the ISA is required before building hardware for it!
I will make all tests available to everyone, so you can use these to debug and test your Verilog implementation. One of the first things you must do after putting together your full processor is run each of these tests and test each individual instruction.
What to submit
- Physical copy
- Written explanation of what your tests do and justification of why your set of tests is comprehensive
Electronic submission instructions
Create a folder named “prob_inst”:
- All your assembly files must be in this directory
- Write a set of tests for your instruction. Name them <opcode>_[0,1,2,3,4].asm
- Use your discretion to decide how many tests you need
- Identify corner cases. Think about possible bugs in the hardware.
- Written explanation of what your tests do and justification of why your set of tests is comprehensive (copy the contents of your paper write-up into a README.txt)
Handin Instructions
Hand in your homework using the CS handin program.
- Make a folder for each problem (hw4_1, hw4_2, hw4_3 and prob_inst)
- Each folder should contain all the Verilog files for that problem. prob_inst should contain your .asm files
- The name and signals of the top-level module should be as indicated for each problem
- tar these 4 folders to 'cs username'.tar [example: tar cvf ram.tar hw4_1 hw4_2 hw4_3 prob_inst]
- Move the tar file into an empty folder and submit it using handin (see the handin documentation) [example: mkdir ram; mv ram.tar ram]
* <class_name> cs552-1
* <assignment_name> HW4
* <directory_path> 'location_of_the folder_you_created'