CS/ECE 552: Introduction to Computer Architecture
Spring 2010, Prof. Wood
Problem Set #5
Due: April 7th, 2010
The assignment is split between project group and individual work.
Homework is due at the start of class.
Problems 1-2 MUST be done with your project group (hand in Verilog to “hw5”, report portions on paper)
Problems 3-5 MUST be done ALONE (all paper)
No exceptions to the above handin rules will be allowed.
You must abide by the Verilog file naming conventions
All verilog code must pass Vcheck
Each problem must be in its own directory
If a problem requires files from a different directory, then create a copy of the file in each directory.
1. Problem 1 - 35 Points
(Download this tarball for easy access to all required files for problems 1 and 2.)
You will implement a hierarchical memory system in Verilog that consists of a level-1 write-back cache with a write-allocate policy and a stalling memory. The system should use a direct-mapped cache and a four-banked, four-cycle memory. See the project modules provided page for the Cache module and the four-banked memory module. Blocks are 4 words wide, and the system is byte-addressable and word-aligned.
The top-level module that you will develop is as follows (Verilog template source for mem_system.v):
module mem_system(/*AUTOARG*/
// Outputs
DataOut, Done, Stall, CacheHit, err,
// Inputs
Addr, DataIn, Rd, Wr, createdump, clk, rst
);
input [15:0] Addr;
input [15:0] DataIn;
input Rd;
input Wr;
input createdump;
input clk;
input rst;
output [15:0] DataOut;
output Done;
output Stall;
output CacheHit;
output err;
/* data_mem = 1, inst_mem = 0 *
* needed for cache parameter */
parameter mem_type = 0;
// <<<your code here>>>
endmodule // mem_system
A top-level module called mem_system_hier.v is also provided, which instantiates the clock generator and mem_system inside it (Verilog source for mem_system_hier.v).
The tarball listed above contains two testbenches, with a reference memory module, and loadfiles to initialize your memory.
Important Notes:
This module will also handle internal errors.
The system should ignore new inputs when it is stalled (not an error).
Timing will be an important part of this problem; designs that stall longer than necessary will be docked points.
The Done signal should be asserted for exactly one cycle. If the request can be satisfied in the same cycle that data should be presented, Done should be asserted in that same cycle.
The CacheHit signal denotes whether the request was a hit in the cache.
The memtype parameter (declared as mem_type in the template above) decides whether this is an instruction or data cache; it is used to generate the names of the dump files.
To complete this problem, you will need to determine how the internal components are arranged and create a cache controller FSM. See the description of the cache module for hints on how this should be done. You can choose to implement either a Mealy or a Moore machine, although I recommend a Moore machine as it will likely be easier to create. Be forewarned that the resulting state machine will be relatively large, so get started early.
The testing for this module should be extensive. You will need to verify that the design works correctly during hits, misses, writebacks, and refills. Also be sure to check the design under various main memory stall conditions.
For extra credit, you can improve your performance by adding a two-entry store buffer so that it is possible for writes to complete in one cycle. The extra credit will not count if the standard system is not working, so be sure to thoroughly test your design before thinking about moving on.
Instantiating the cache modules:
This is the methodology I suggest for instantiating your cache modules, so the naming conventions are the same for everyone.
Somewhere in mem_system.v:
cache #(0 + memtype) c0(....)
This will guarantee that when this module is finally connected to your processor, your instruction and data memory will create separate dump files. See the Cache module section for details.
Verification
Verification is an important part of this problem and a significant challenge. You are provided with two testbenches:
mem_system_perfbench.v: This is a more carefully constructed testbench that is meant to check the performance of your design. For example, is your cache reporting hits and misses correctly? Are the requests being serviced in the correct amount of time? This testbench takes as input a memory address trace in a file called mem.addr. This file must be in the same location as your Verilog files. An example mem.addr file is provided. The format of the file is the following:
Each line represents a new request
Each line has 4 numbers separated by a space
The numbers are: Wr Rd Addr Value
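Writing traces by hand is error-prone, so a small generator can help. The following is a minimal Python sketch, not part of the provided tarball. It assumes the four fields are plain decimal integers (check the provided example mem.addr to confirm the number base), and the helper names `format_trace` and `write_then_read_back` are hypothetical.

```python
# Sketch: build perfbench trace text, one request per line as "Wr Rd Addr Value".
# ASSUMPTION: fields are decimal integers; confirm against the example mem.addr.

def format_trace(requests):
    """Return trace text for a list of (wr, rd, addr, value) tuples."""
    lines = []
    for wr, rd, addr, value in requests:
        if wr not in (0, 1) or rd not in (0, 1):
            raise ValueError("Wr and Rd must each be 0 or 1")
        if not 0 <= addr <= 0xFFFF or not 0 <= value <= 0xFFFF:
            raise ValueError("Addr and Value must fit in 16 bits")
        lines.append(f"{wr} {rd} {addr} {value}")
    return "\n".join(lines) + "\n"


def write_then_read_back(addr, value):
    """Hypothetical pattern: write a word, then read the same word back."""
    return [(1, 0, addr, value), (0, 1, addr, 0)]


if __name__ == "__main__":
    # Print the trace; redirect it into a file ending in .addr to use it.
    print(format_trace(write_then_read_back(8, 1234)), end="")
```

Each pattern function returns a list of requests, so several patterns can be concatenated into one trace before formatting.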
You must write different address traces to test your module and prove that it implements the cache correctly. Determining what to test and show is an important part of this problem. Carefully document and show in your homework which cases you are testing. Pick representative inputs from this testbench by examining the waveforms. You must hand in annotated waveforms to prove that your design works correctly during hits, misses, writebacks, and refills.
To run this testbench:
wsrun.pl -addr mem.addr mem_system_perfbench *.v
At least 5 traces are required.
Do not start randomized testing until you have proven to yourself that your cache shows some basic functionality.
mem_system_randbench.v: This is a completely randomized testbench that stresses the functional correctness of your design. You must run this testbench, which will print a log of addresses and statistics on the number of hits to the cache. Contact the TA with questions about the testbench. Your module must pass this testbench. The output log is saved in a file called transcript. Copy this file as randbench.log and include it in your handin. Specifically, this testbench does the following:
full random: 1000 memory requests, completely random
small random: 1000 memory requests restricted to addresses 0 to 2046
sequential address: 1000 memory requests restricted to addresses 0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x40
two sets address: 1000 memory requests to test a two-way set-associative cache. You should get predominantly hits to the cache.
After every set of 1000 requests, you will see a message like the following:
LOG: Done two_sets_addr Requests: 4001, Cycles: 79688 Hits: 562
To run this testbench:
wsrun.pl mem_system_randbench *.v
Pay careful attention to the sequential address and the two sets address traces. You can look at the testbench to see what sequence of addresses each one uses. You should be able to estimate how many hits your cache should report on them.
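One way to estimate an expected hit count is to model only the tag store in software. The sketch below is a hypothetical Python helper, not part of the testbench. It treats reads and writes alike, which matches a write-allocate cache where both allocate a block on a miss. The 8-byte block size follows from 4-word blocks only if a word is 16 bits, and the number of lines is an assumption you must take from the Cache module description.

```python
# Sketch: count hits for a direct-mapped, write-allocate cache (tags only).
# ASSUMPTION: block_bytes and num_lines are supplied by the caller; the values
# in the example are a toy configuration, not the course cache geometry.

def count_hits_direct_mapped(addresses, block_bytes, num_lines):
    """Return the number of hits for a sequence of byte addresses."""
    tags = [None] * num_lines          # None marks an invalid line
    hits = 0
    for addr in addresses:
        block = addr // block_bytes    # strip the block offset
        index = block % num_lines      # line selected by the index bits
        tag = block // num_lines       # remaining upper bits
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: allocate, replacing the old block
    return hits


if __name__ == "__main__":
    # Toy run: 8-byte blocks, 4 lines. Addresses 0 and 32 map to the same line.
    print(count_hits_direct_mapped([0, 4, 32, 0, 8], block_bytes=8, num_lines=4))
```

Feeding the address sequence from the testbench into such a model gives a number to compare against the Hits field of the LOG message.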
Synthesis
As with other homeworks, you must synthesize your design. You must turn in the entire synth directory and make sure your total combinational area is non-zero.
(If you implemented the extra credit, please make a note.)
Submit on Paper:
Turn in neatly and legibly drawn schematics of your design. Your design hierarchy should be clear in this schematic.
A state diagram of your cache controller.
State transition table.
Fill in the table for the 8 regression address traces: HW5 Problem1 address trace table
On the handwritten homework you turn in, fill in the following synthesis info:
Total area:
Worst case slack:
Submit Electronically: (directory prob1)
All your Verilog source code.
For the mem_system_randbench testbench, the log output in a file called randbench.log.
For the mem_system_perfbench testbench, all additional trace files you wrote. Each such file should end with the extension .addr
The entire synth directory.
Make sure to use mem_system.syn.v, and that the 3 report files (area_report, timing_report, cell_report) are present.
Make sure that in the area report no cell has an area of zero.
2. Problem 2 - 20 Points
Implement your 2-way set-associative cache, which is required for the project. See the Cache module page. Replace your direct-mapped cache in the above problem with this 2-way set-associative cache.
Instantiating the cache modules:
This is the methodology I suggest for instantiating your cache modules, so the naming conventions are the same for everyone.
Somewhere in mem_system.v:
cache #(0 + memtype) c0(....)
cache #(2 + memtype) c1(....)
This will guarantee that when this module is finally connected to your processor, your instruction and data memory will create separate dump files. See the Cache module section for details.
Parameter Value File Names
--------------- ----------
0 Icache_0_data_0, Icache_0_data_1, Icache_0_tags, ...
1 Dcache_0_data_0, Dcache_0_data_1, Dcache_0_tags, ...
2 Icache_1_data_0, Icache_1_data_1, Icache_1_tags, ...
3 Dcache_1_data_0, Dcache_1_data_1, Dcache_1_tags, ...
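The table can be read as follows: the low bit of the parameter selects instruction versus data, and the remaining bit selects the way. The Python sketch below only illustrates that reading of the table; the function name `dump_file_prefix` is hypothetical and the actual naming is done inside the provided cache module.

```python
# Sketch: derive the dump-file name prefix from the cache parameter value,
# following the table above (0 -> Icache_0, 1 -> Dcache_0, 2 -> Icache_1, ...).

def dump_file_prefix(param):
    """Return the file-name prefix for a parameter value in 0..3."""
    if param not in (0, 1, 2, 3):
        raise ValueError("parameter value must be 0, 1, 2, or 3")
    kind = "D" if param % 2 == 1 else "I"   # low bit: memtype (1 = data)
    way = param // 2                        # 0 for c0, 1 for c1
    return f"{kind}cache_{way}"


if __name__ == "__main__":
    for p in range(4):
        print(p, dump_file_prefix(p))
```

This is why the two instances use `0 + memtype` and `2 + memtype`: the same memtype yields distinct file names for the two ways.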
Pay careful attention to the sequential address and the two sets address traces from mem_system_randbench. You can look at the testbench to see what sequence of addresses each one uses. You should be able to estimate how many hits your cache should report on them.
Submit:
Follow the directions for Problem 1.
Put files in a folder called prob2.
(on paper) Fill in the values for the 6 address traces for the 2-way set-associative cache: HW5 Problem2 address trace table
For mem_2way7.addr, print the text output of simulation with all the LOG messages. You should use the simulator described here. These messages should denote the hit/miss for each request. You should explain in handwritten notes the reason for each hit/miss.
#END GROUP WORK#
3. Problem 3 - 15 Points
Consider a direct-mapped cache with 32-byte blocks and a total capacity of 512 bytes in a system with a 32-bit address space. Assume this is a byte-addressable cache.
Indicate which bits of an address in this machine correspond to the tag, index, and offset, respectively.
For the sequence of addresses below, indicate which references will result in cache hits and which will result in cache misses. For each miss, mark whether it is a compulsory, capacity, or conflict miss. Assume the cache is initially empty (all valid bits are set to 0).
Show the final contents of the address tags at the end of execution.
Explain what can be done to reduce each type of miss.
0x0001b596
0x00000ee8
0x00000ef4
0x00004182
0x0000780a
0x0000a690
0x0000408e
0x0001b598
0x00007800
0x00000efc
0x00027c02
0x0000408a
0x00004198
0x00006710
0x0000670c
0x00027c04
0x0001b590
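To check hand-worked field splits, the general technique can be sketched in Python. The example deliberately uses a different configuration (16-byte blocks, 256-byte capacity) from the one in this problem, and the function name `split_address` is hypothetical. It assumes the block size and the number of sets are powers of two.

```python
# Sketch: split a byte address into (tag, index, offset) for a cache whose
# block size and set count are powers of two. Not specific to this problem.

def split_address(addr, block_bytes, capacity_bytes, ways=1):
    """Return (tag, index, offset) for the given cache geometry."""
    num_sets = capacity_bytes // (block_bytes * ways)
    for n in (block_bytes, num_sets):
        if n <= 0 or n & (n - 1) != 0:
            raise ValueError("block size and set count must be powers of two")
    offset_bits = block_bytes.bit_length() - 1   # log2(block_bytes)
    index_bits = num_sets.bit_length() - 1       # log2(num_sets)
    offset = addr & (block_bytes - 1)            # lowest bits
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)     # everything above the index
    return tag, index, offset


if __name__ == "__main__":
    tag, index, offset = split_address(0x1234, block_bytes=16, capacity_bytes=256)
    print(hex(tag), hex(index), hex(offset))
```

Two references hit the same block exactly when both their tag and index match; they conflict when the index matches but the tag differs.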
4. Problem 4 - 15 Points
Re-do problem 3, but using a two-way set-associative cache. When replacing a block, the least-recently-used block is chosen to be replaced. Everything else (block size and total capacity) remains the same.
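The LRU rule can be modeled by keeping each set's tags ordered from least to most recently used. The following Python sketch is one possible model under that assumption; the function name `count_hits_two_way_lru` and the toy geometry in the example are hypothetical and differ from the configuration in this problem.

```python
# Sketch: count hits for a two-way set-associative cache with LRU replacement.
# Each set is a list of at most two tags, least recently used first.

def count_hits_two_way_lru(addresses, block_bytes, num_sets):
    """Return the number of hits for a sequence of byte addresses."""
    sets = [[] for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_sets
        tag = block // num_sets
        ways = sets[index]
        if tag in ways:
            hits += 1
            ways.remove(tag)       # will be re-appended as most recent
        elif len(ways) == 2:
            ways.pop(0)            # set full: evict the least recently used tag
        ways.append(tag)           # most recently used goes last
    return hits


if __name__ == "__main__":
    # Toy run: 8-byte blocks, 2 sets. Addresses 0 and 32 share a set but
    # occupy different ways, so they stop evicting each other.
    print(count_hits_two_way_lru([0, 32, 0, 32], block_bytes=8, num_sets=2))
```

In this toy run the two alternating blocks coexist in one set, whereas a direct-mapped cache of the same capacity would miss on every reference.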
Determine the speedup over the direct-mapped cache in problem 3. Assume both caches can be accessed in 1 cycle, that the CPI without misses is 1.0, and that the miss penalty is 25 cycles.
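A common way to set up this comparison is CPI = base CPI + miss rate × miss penalty, with speedup taken as the ratio of the two CPIs. The Python sketch below uses that formula under the assumption that every instruction makes exactly one memory reference, which the problem does not state explicitly. The miss counts in the example are made up for illustration and are not the answer to this problem.

```python
# Sketch: effective CPI and speedup from miss counts.
# ASSUMPTION: one memory reference per instruction, so the miss rate per
# instruction equals misses / references.

def cpi_with_misses(base_cpi, misses, references, miss_penalty):
    """Return base CPI plus the average stall cycles per reference."""
    if references <= 0:
        raise ValueError("references must be positive")
    return base_cpi + (misses / references) * miss_penalty


def speedup(cpi_old, cpi_new):
    """Return how many times faster the new design is (same clock assumed)."""
    return cpi_old / cpi_new


if __name__ == "__main__":
    # Hypothetical counts: 6 misses vs 4 misses out of 10 references.
    old = cpi_with_misses(1.0, misses=6, references=10, miss_penalty=25)
    new = cpi_with_misses(1.0, misses=4, references=10, miss_penalty=25)
    print(old, new, speedup(old, new))
```

Because both caches have the same 1-cycle access time, the clock period cancels and the CPI ratio alone gives the speedup.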
5. Problem 5 - 15 Points
Consider a cache with the following characteristics:
16-byte blocks
4-way set associative
512 sets
42-bit addresses
writeback
LRU replacement policy
How many bytes of data storage are there?
What is the total number of bits needed to implement the cache?
What percentage of cache storage is overhead?
Make a picture similar to the one on page 486 of the COD4e. (As with the picture in the text, include the hit and data logic.)
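The storage accounting can be sketched generically in Python. This is only one way to count: it assumes one valid bit per block, one dirty bit per block for a writeback cache, and a caller-chosen number of LRU bits per set, since the LRU encoding is a design decision. The example uses a different configuration from the one in this problem, and `cache_storage_bits` is a hypothetical name.

```python
# Sketch: data bits and overhead bits for a set-associative cache.
# ASSUMPTIONS: 1 valid bit per block, 1 dirty bit per block if writeback,
# and lru_bits_per_set chosen by the caller (depends on the LRU encoding).

def cache_storage_bits(block_bytes, ways, num_sets, addr_bits,
                       writeback=True, lru_bits_per_set=0):
    """Return (data_bits, overhead_bits) for the given geometry."""
    offset_bits = block_bytes.bit_length() - 1   # log2, power of two assumed
    index_bits = num_sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    data_bits = num_sets * ways * block_bytes * 8
    per_block = tag_bits + 1 + (1 if writeback else 0)   # tag + valid + dirty
    overhead_bits = num_sets * (ways * per_block + lru_bits_per_set)
    return data_bits, overhead_bits


if __name__ == "__main__":
    # Different example: 32-byte blocks, 2-way, 64 sets, 32-bit addresses,
    # writeback, 1 LRU bit per set.
    data, overhead = cache_storage_bits(32, 2, 64, 32, lru_bits_per_set=1)
    print(data, overhead, 100.0 * overhead / (data + overhead))
```

The percentage printed here is overhead divided by total storage; if overhead relative to data storage is wanted instead, divide by the data bits only and state which definition you used.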