|CS552 Course Wiki: Spring 2016||Main »
Cache Module Specification
On this page... (hide)
Before you implement your cache, you should convert your processor design to use the Stalling Memory.
You will be provided with a set of primitive modules and you must build a direct-mapped and 2-way set associative cache using these modules. The modules you will use are:
Copy all .syn.v files in the same directory as their corresponding .v files.
This figure shows the external interface to the module. Each signal is described in the table below.
+-------------------+ | | enable >------| | index[7:0] >------| cache | offset[2:0] >------| | comp >------| 256 lines |-----> hit write >------| by 4 words |-----> dirty tag_in[4:0] >------| |-----> tag_out[4:0] data_in[15:0] >------| |-----> data_out[15:0] valid_in >------| |-----> valid | | clk >------| | rst >------| |-----> err createdump >------| | +-------------------+
When instantiating the module, there is a parameter which should be set for each instance. When you dump the contents of the cache to a set of files (e.g. for debugging), this parameter allows each instance to go to a unique set of filenames.
Parameter Value File Names --------------- ---------- 0 Icache_0_data_0, Icache_0_data_1, Icache_0_tags, ... 1 Dcache_0_data_0, Dcache_0_data_1, Dcache_0_tags, ... 2 Icache_1_data_0, Icache_1_data_1, Icache_1_tags, ... 3 Dcache_1_data_0, Dcache_1_data_1, Dcache_1_tags, ...
Here is an example of instantiating two modules with a parameter value of 0 and 1:
cache #(0) cache0 (enable, index, ... cache #(1) cache1 (enable, index, ...
The cache contains 256 lines. Each line contains one valid bit, one dirty bit, a 5-bit tag, and four 16-bit words:
V D Tag Word 0 Word 1 Word 2 Word 3 ___________________________________________________________________________________ |___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________| Index-------->|___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________|
Although there are a lot of signals for the cache, its operation is pretty simple. When "enable" is high, the two main control lines are "comp" and "write". Here are the four cases:
This case is used when the processor executes a load instruction. The "tag_in", "index", and "offset" signals need to be valid. Either a hit or a miss will occur, as indicated by the "hit" output during the same cycle. If a hit occurs, "data_out" will contain the data and "valid" will indicate if the data is valid. If a miss occurs, the "valid" output will indicate whether the block occupying that line of the cache is valid. The "dirty" output indicates the state of the dirty bit in the cache line.
This case occurs when the processor executes a store instruction. The "data_in", "tag_in", "index", and "offset" lines need to be valid. Either a hit or a miss will occur as indicated by the "hit" output during the same cycle. If there is a miss, the cache state will not be modified. If there is a hit, the word will be written at the rising edge of the clock, and the dirty bit of the cache line will be written to "1". (The "dirty" output is not meaningful as this is a write cycle for that bit.) NOTE: On a hit, you also need to look at the "valid" output! If there is a hit, but the line is not valid, you should treat it as a miss; the other word of the line will not be valid and you will not want to leave the cache in that state.
On a miss, the "valid" output will indicate whether the block occupying that line of the cache is valid. The dirty bit will be read, and will indicate whether or not the block occupying that line is dirty. On the other hand, if "hit" is true while "write" and "comp" are true, "dirty" output is not meaningful and will remain zero (because the dirty bit of the cache was performing a write).
This case occurs when you want to read the tag and the data out of the cache memory. You will need to do this when a cache line is victimized, to see if the cache line is dirty and to write it back to memory if necessary. With "comp"=0, the cache basically acts like a RAM. The "index" and "offset" inputs need to be valid to select what to read. The "data_out", "tag_out", "valid", and "dirty" outputs will be valid during the same cycle.
This case occurs when you bring in data from memory and need to store it in the cache. The "index", "offset", "tag_in", "valid_in" and "data_in" signals need to be valid. On the rising edge of the clock, the values will be written into the specified cache line. Also, the dirty bit will be set to zero.
After you have a working design using a direct-mapped cache, you will add a second cache module to make your design two-way set-associative. Here are the four cases again:
The "index" and "offset" inputs need to be driven to both cache modules. There is a hit if either hit output goes high. Use one of the hit outputs as a select for a mux between the two data outputs. If there is a miss, decide which cache module to victimize based on this logic: If one is valid, select the other one. If neither is valid, select way zero. If both are valid, use the pseudo-random replacement algorithm specified below.
The "index", "offset", and "data" inputs need to be driven to both cache modules. There is a hit if either hit output goes high. Note that only one cache will get written as long as your design ensures that no line can be present in both cache modules.
After deciding which cache module to victimize, use that select bit to mux the data, valid, and dirty bits from the two cache modules.
Drive the "index", "offset", "data" and "valid" inputs to both cache modules. Make sure only the correct module has its write input asserted.
In order to make the designs more deterministic and easier to grade, all set-associative caches must implement the following replacement algorithm:
Example, using two sets:
start with victimway = 0 load 0x1000 victimway=1; install 0x1000 in way 0 because both free load 0x1010 victimway=0; install 0x1010 in way 0 because both free load 0x1000 victimway=1; hit load 0x2010 victimway=0; install 0x2010 in way 1 because it's free load 0x2000 victimway=1; install 0x2000 in way 1 because it's free load 0x3000 victimway=0; install 0x3000 in way 0 (=victimway) load 0x3010 victimway=1; install 0x3010 in way 1 (=victimway)
|Page last modified on April 15, 2013, visited 856 times|