CS552 Course Wiki: Spring 2017
The final processor you design for this course will use both instruction and data caches. In this stage of the project you will design and test a cache that will ultimately be used in your final design. You must first design and verify a direct-mapped cache before modifying it to create a two-way set-associative cache.
The cache's storage and the memory have already been designed for you. You will implement the memory system controller that manages the cache.
All needed files are included in the original project tar file.
The following files are included in the cache directories (cache_*/). 'mem_system.v' should be the only file you need to edit.
The four-banked memory is a better representation of a modern memory system: it breaks the memory into multiple banks. The four-cycle, four-banked memory consists of two Verilog modules, the top-level four_bank_mem.v and the single-bank final_memory.v. All needed files are included in the project tar file.
final_memory.syn.v must be in the same directory as final_memory.v.
                      +-------------------+
                      |                   |
      Addr[15:0] >------|   four_bank_mem   |
    DataIn[15:0] >------|                   |
              wr >------|       64KB        |-----> DataOut[15:0]
              rd >------|                   |-----> stall
                      |                   |-----> Busy[3:0]
             clk >------|                   |-----> err
             rst >------|                   |
      createdump >------|                   |
                      +-------------------+
    |            |            |            |            |            |
    | addr       | addr etc   | read data  |            | new addr   |
    | data_in    | OK to any  | available  |            | etc. is    |
    | wr, rd     |*different* |            |            | OK to      |
    | enable     | bank       |            |            | *same*     |
    |            |            |            |            | bank       |
     <----bank busy; any new request to--->
          the *same* bank will stall
This figure shows the external interface to the module. Each signal is described in the table.
This is a byte-aligned, word-addressable 16-bit wide 64K-byte memory.
Requests may be presented every cycle. They will be directed to one of the four banks depending on the least significant 2 bits of the address.
If a request is made to a bank in cycle N, a second request to the same bank before cycle N+4 will not be performed, and the "stall" output will be asserted.
Busy output reflects the current status of each individual bank.
Concurrent read and write are not allowed.
On reset, memory loads from the files "loadfile_0.img", "loadfile_1.img", "loadfile_2.img", and "loadfile_3.img". Each file supplies every fourth word. (The latest version of the assembler generates these four files.)
Format of each file:
    @0
    <hex data 0>
    <hex data 1>
    ...etc
If input "createdump" is true on a rising clock edge, the contents of memory will be dumped to the files "dumpfile_0", "dumpfile_1", etc. Each file will be a dump from location 0 up through the highest location modified by a write in that bank.
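The bank-busy rule above can be checked on paper with a small software model before you design your controller states. The following Python sketch is only an illustration, not the behavior of four_bank_mem.v itself: the names `bank_of` and `simulate` are hypothetical, and it assumes the bank is chosen by the two least significant bits of a word index. How that word index maps onto the bits of Addr[15:0] is not settled by the text above, so check four_bank_mem.v before relying on it.

```python
NUM_BANKS = 4
BANK_BUSY_CYCLES = 4  # a request in cycle N keeps its bank busy until cycle N+4


def bank_of(word_index):
    """Bank chosen by the two least significant bits (assumed: of a word index)."""
    return word_index & 0x3


def simulate(requests):
    """requests: list of (cycle, word_index) pairs in increasing cycle order.

    Returns a list of (cycle, word_index, bank, stalled) tuples.
    A stalled request is not performed, so it does not extend the busy window.
    """
    free_at = [0] * NUM_BANKS  # first cycle in which each bank accepts a request
    results = []
    for cycle, word_index in requests:
        bank = bank_of(word_index)
        stalled = cycle < free_at[bank]
        if not stalled:
            free_at[bank] = cycle + BANK_BUSY_CYCLES
        results.append((cycle, word_index, bank, stalled))
    return results


if __name__ == "__main__":
    # Hypothetical request stream: word 4 reuses bank 0 too early, word 8 does not.
    for row in simulate([(0, 0), (1, 1), (2, 4), (3, 2), (4, 8)]):
        print(row)
```

In this made-up stream, only the request in cycle 2 stalls, because it targets bank 0 two cycles after the request in cycle 0. Requests to other banks proceed every cycle.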
This figure shows the external interface to the module. Each signal is described in the table below.
                      +-------------------+
                      |                   |
          enable >------|                   |
      index[7:0] >------|       cache       |
     offset[2:0] >------|                   |
            comp >------|     256 lines     |-----> hit
           write >------|    by 4 words     |-----> dirty
     tag_in[4:0] >------|                   |-----> tag_out[4:0]
   data_in[15:0] >------|                   |-----> data_out[15:0]
        valid_in >------|                   |-----> valid
                      |                   |
             clk >------|                   |
             rst >------|                   |-----> err
      createdump >------|                   |
                      +-------------------+
The cache contains 256 lines. Each line contains one valid bit, one dirty bit, a 5-bit tag, and four 16-bit words:
                V   D    Tag        Word 0           Word 1           Word 2           Word 3
               ___________________________________________________________________________________
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
Index-------->|___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
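When writing address traces it helps to know which line and tag a given address lands on. The Python sketch below splits a 16-bit address using the port widths shown in the figure (5-bit tag, 8-bit index, 3-bit offset). The ordering {tag, index, offset} from most to least significant bit is the conventional layout and is assumed here, not stated on this page; the function names are hypothetical.

```python
OFFSET_BITS = 3  # offset[2:0]: 4 words x 2 bytes = 8 bytes per line
INDEX_BITS = 8   # index[7:0]: 256 lines
TAG_BITS = 5     # tag_in[4:0]


def split_address(addr):
    """Return (tag, index, offset) for a 16-bit address.

    Assumes the layout {tag, index, offset}, most significant bits first.
    """
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def capacity_bytes(lines=256, words_per_line=4, bytes_per_word=2):
    """Data capacity of one cache module in bytes."""
    return lines * words_per_line * bytes_per_word


if __name__ == "__main__":
    print(split_address(0x015C))  # (0, 43, 4)
    print(capacity_bytes())       # 2048
```

Two addresses conflict in a direct-mapped cache when they share an index but differ in tag, which is a useful property to target in a trace.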
You will need to determine how your cache is arranged and how it functions before starting implementation. Draw the state machine for your cache controller, as this will be required. You may implement either a Mealy or a Moore machine, though a Moore machine is recommended because it will likely be easier. Be forewarned that the resulting state machine will be relatively large, so it is best to start early.
The state machine diagram is due on learn@UW a week before the cache demo. If we have concerns about your design, we will ask you to set up an appointment to talk about your FSM design before the due date.
You will initially need to implement your cache as a direct-mapped cache. Make your changes for this problem in the "cache_direct" directory.
Although the cache has many signals, its operation is fairly simple. When "enable" is high, the two main control lines are "comp" and "write". Here are the four cases for the behavior of the direct-mapped cache:
On a miss, the "valid" output will indicate whether the block occupying that line of the cache is valid. The dirty bit will be read, and the "dirty" output will indicate whether or not the block occupying that line is dirty. On the other hand, if "hit" is true while "write" and "comp" are true, the "dirty" output is not meaningful and will remain zero (because the dirty bit of the cache is being written rather than read).
To begin testing, you will create address traces that target the different aspects of cache behavior. Once those are fully working, you can move on to a fully random test set.
The perfbench testbench uses address trace files that describe a sequence of reads and writes. You will need to write several (at least 5) address traces to test your cache and the various behavior cases that might occur. Design your traces so that each one exercises a specific case your cache might encounter, so you can be sure that every case works.
An example address trace file (mem.addr) is provided. The format of the file is the following:
Once you have created your address traces, this testbench can be run as follows:
If it runs correctly, you will get output that looks like the following:
# Using trace file mem.addr
# LOG: ReqNum 1 Cycle 12 ReqCycle 3 Wr Addr 0x015c Value 0x0018 ValueRef 0x0018 HIT 0
#
# LOG: ReqNum 2 Cycle 14 ReqCycle 12 Rd Addr 0x015c Value 0x0018 ValueRef 0x0018 HIT 1
#
# LOG: Done all Requests: 2 Replies: 2 Cycles: 14 Hits: 1
# Test status: SUCCESS
# Break at mem_system_perfbench.v line 200
# Stopped at mem_system_perfbench.v line 200
Be aware that a SUCCESS message does not guarantee that your cache is working correctly. You should use the cache simulator to verify that the correct behavior is happening. The cache simulator can be run as follows:
cachesim <associativity> <size_bytes> <block_size_bytes> <trace_file>
So for this problem you would use:
cachesim 1 2048 8 mem.addr
This will generate output like the following:
Store Miss for Address 348
Load Hit for Address 348
You should then compare this to the perfbench output to make sure they both exhibit the same behavior.
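Comparing the two outputs by eye gets tedious for longer traces, so you may want to script it. The Python sketch below is one possible approach, not a provided course tool: the function names are hypothetical, and the line formats are taken only from the sample outputs shown above. It also assumes that cachesim prints addresses in decimal, which is inferred from 348 being equal to 0x015c.

```python
import re

# Matches per-request perfbench lines such as:
#   # LOG: ReqNum 1 Cycle 12 ReqCycle 3 Wr Addr 0x015c Value ... HIT 0
PERF_RE = re.compile(
    r"LOG:\s+ReqNum\s+\d+.*?\b(Wr|Rd)\s+Addr\s+0x([0-9a-f]+).*?HIT\s+([01])",
    re.IGNORECASE,
)
# Matches cachesim lines such as: Store Miss for Address 348
SIM_RE = re.compile(r"(Store|Load)\s+(Hit|Miss)\s+for\s+Address\s+(\d+)")


def perfbench_events(text):
    """Return [(op, address, hit)] from perfbench output."""
    events = []
    for m in PERF_RE.finditer(text):
        op = "Store" if m.group(1).lower() == "wr" else "Load"
        events.append((op, int(m.group(2), 16), m.group(3) == "1"))
    return events


def cachesim_events(text):
    """Return [(op, address, hit)] from cachesim output (decimal addresses assumed)."""
    return [(m.group(1), int(m.group(3)), m.group(2) == "Hit")
            for m in SIM_RE.finditer(text)]


def mismatches(perf_text, sim_text):
    """Return a list of differences; an empty list means the outputs agree."""
    perf = perfbench_events(perf_text)
    sim = cachesim_events(sim_text)
    diffs = [(i + 1, p, s) for i, (p, s) in enumerate(zip(perf, sim)) if p != s]
    if len(perf) != len(sim):
        diffs.append(("length", len(perf), len(sim)))
    return diffs
```

To use it, save the simulator transcript and the cachesim output to files, read both into strings, and pass them to `mismatches`. Each entry in the result gives the request number and the two events that disagree.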
The address traces you created should be put in the 'cache_direct/verification' directory and have the '.addr' extension.
Once you are confident that your design is working you should test it using the random testbench. The random bench does the following:
At the end of each section you will see a message showing the performance like the following:
LOG: Done two_sets_addr Requests: 4001, Cycles: 79688 Hits: 562
You can run the random testbench like this:
wsrun.pl mem_system_randbench *.v
This will ultimately print a message saying either:
# Test status: SUCCESS
# Test status: FAIL
Keep in mind that the test is considered a success if the correct data is returned every time, but that does not necessarily mean your cache is working. If you have no hits, or only a very small number of them, something is still wrong. If you are seeing failures, try to isolate the case that is causing the issue and create a trace that generates the same behavior to make debugging easier.
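A quick way to spot a suspiciously low hit count is to compute the hit rate from each summary line. The Python sketch below parses lines in the format of the sample summary shown above; the function name and returned fields are hypothetical, and what counts as "too few" hits is left for you to judge per section.

```python
import re

# Matches summary lines such as:
#   LOG: Done two_sets_addr Requests: 4001, Cycles: 79688 Hits: 562
# An optional "Replies:" field, as printed by the perfbench, is skipped.
DONE_RE = re.compile(
    r"LOG:\s+Done\s+(\S+)\s+Requests:\s*(\d+),?\s+"
    r"(?:Replies:\s*\d+,?\s+)?Cycles:\s*(\d+),?\s+Hits:\s*(\d+)"
)


def summarize(text):
    """Return one dict per summary line with the hit rate filled in."""
    rows = []
    for m in DONE_RE.finditer(text):
        requests = int(m.group(2))
        hits = int(m.group(4))
        rows.append({
            "name": m.group(1),
            "requests": requests,
            "cycles": int(m.group(3)),
            "hits": hits,
            "hit_rate": hits / requests if requests else 0.0,
        })
    return rows


if __name__ == "__main__":
    sample = "LOG: Done two_sets_addr Requests: 4001, Cycles: 79688 Hits: 562"
    for row in summarize(sample):
        print(f"{row['name']}: {row['hit_rate']:.1%} hits")
```

A section that reports zero hits while still printing SUCCESS is the case described above: the data is correct, but the cache is probably never being used.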
You will need to run synthesis on your direct mapped cache and verify that it does not produce any errors. You should turn in all of the reports generated by synthesis.
Your synthesis results should be placed in the 'cache_direct/synthesis' directory.
You should not start on this until you have implemented and fully verified your direct-mapped cache.
Remember to change to the cache_assoc directory before you start making changes to your design, as you will need to submit both designs. Before copying your mem_system file over the provided one, be aware that the second cache module is instantiated slightly differently in the provided file.
After you have a working design using a direct-mapped cache, you will add a second cache module to make your design two-way set-associative. Here are the four cases again:
In order to make the designs more deterministic and easier to grade, all set-associative caches must implement the following pseudo-random replacement algorithm:
Example, using two sets:
start with victimway = 0
load 0x1000    victimway=1; install 0x1000 in way 0 because both free
load 0x1010    victimway=0; install 0x1010 in way 0 because both free
load 0x1000    victimway=1; hit
load 0x2010    victimway=0; install 0x2010 in way 1 because it's free
load 0x2000    victimway=1; install 0x2000 in way 1 because it's free
load 0x3000    victimway=0; install 0x3000 in way 0 (=victimway)
load 0x3010    victimway=1; install 0x3010 in way 1 (=victimway)
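The example can be replayed with a small software model to predict which way each block of a trace should land in. The Python sketch below is inferred only from the example above, not from the official rule list: it assumes victimway toggles on every access (hit or miss), that a miss fills way 0 if it is free, otherwise way 1 if it is free, and otherwise the way named by the already-toggled victimway. It also assumes an address layout of {tag, index[7:0], offset[2:0]}. The class name is hypothetical.

```python
OFFSET_BITS, INDEX_BITS = 3, 8  # assumed layout: {tag, index[7:0], offset[2:0]}


class TwoWayModel:
    """Tag-only model of a two-way cache with the victimway policy."""

    def __init__(self):
        self.victimway = 0
        self.tags = {}  # index -> [tag in way 0, tag in way 1]; None means free

    def access(self, addr):
        """Return 'hit', or the way (0 or 1) in which the block is installed."""
        self.victimway ^= 1  # toggles on every access, as in the example
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        ways = self.tags.setdefault(index, [None, None])
        if tag in ways:
            return "hit"
        if ways[0] is None:      # covers "both free" and "only way 0 free"
            way = 0
        elif ways[1] is None:
            way = 1
        else:                    # both valid: replace the victim way
            way = self.victimway
        ways[way] = tag
        return way


if __name__ == "__main__":
    model = TwoWayModel()
    for addr in [0x1000, 0x1010, 0x1000, 0x2010, 0x2000, 0x3000, 0x3010]:
        result = model.access(addr)
        print(f"load 0x{addr:04x} victimway={model.victimway}; {result}")
```

Replaying the seven loads gives ways 0, 0, hit, 1, 1, 0, 1, which matches the example. The model tracks tags only, so it says nothing about dirty bits or write-backs.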
Your testing for the set-associative cache should be done in much the same way. You can either create more address traces or update your previous ones to reflect the differences in behavior the new design would have. Remember to get your perfbench tests working before attempting to debug the randbench.
The cache simulator would now be run with slightly different arguments to reflect your changes:
cachesim 2 4096 8 mem.addr pseudoRandom
If you do not specify the pseudoRandom argument it will use an LRU replacement policy instead of the pseudo-random policy you have implemented.
The address traces you used should be put in the 'cache_assoc/verification' directory and have the '.addr' extension.
You will also need to synthesize your set-associative cache. You should turn in all of the reports generated by synthesis.
Your synthesis results should be placed in the 'cache_assoc/synthesis' directory.
Instantiating cache modules
When instantiating the cache module, you set a parameter for each instance. When you dump the contents of the cache to a set of files (e.g. for debugging), this parameter makes each instance write to its own unique set of filenames.
Parameter Value   File Names
---------------   ----------
       0          Icache_0_data_0, Icache_0_data_1, Icache_0_tags, ...
       1          Dcache_0_data_0, Dcache_0_data_1, Dcache_0_tags, ...
       2          Icache_1_data_0, Icache_1_data_1, Icache_1_tags, ...
       3          Dcache_1_data_0, Dcache_1_data_1, Dcache_1_tags, ...
Here is an example of instantiating two modules with a parameter value of 0 and 1:
cache #(0) cache0 (enable, index, ...
cache #(1) cache1 (enable, index, ...
Page last modified on April 15, 2017