Links to specific parts of document:
Before you implement your cache, you should convert your processor design to use the Stalling Memory.
You will be provided with a set of primitive modules and you must build a direct-mapped and 2-way set associative cache using these modules. The modules you will use are (see tar file or Github for these files):
Copy all .syn.v files in the same directory as their corresponding .v files (NOTE: Since you are using the provided tar, this should already work for you).
This figure shows the external interface to the module. Each signal is described in the table below.
+-------------------+ | | enable >------| | index[7:0] >------| cache | offset[2:0] >------| | comp >------| 256 lines |-----> hit write >------| by 4 words |-----> dirty tag_in[4:0] >------| |-----> tag_out[4:0] data_in[15:0] >------| |-----> data_out[15:0] valid_in >------| |-----> valid | | clk >------| | rst >------| |-----> err createdump >------| | +-------------------+
Signal | In/Out | Width | Description |
enable | In | 1 | Enable cache. Active high. If low, "write" and "comp" have no effect, and all outputs are zero. |
index | In | 8 | The address bits used to index into the cache memory. |
offset | In | 3 | offset[2:1] selects which word to access in the cache line. The least significant bit should be 0 for word alignment. If the least significant bit is 1, it is an error condition. |
comp | In | 1 | Compare. When "comp"=1, the cache will compare tag_in to the tag of the selected line and indicate if a hit has occurred; the data portion of the cache is read or written but writes are suppressed if there is a miss. When "comp"=0, no compare is done and the Tag and Data portions of the cache will both be read or written. |
write | In | 1 | Write signal. If high at the rising edge of the clock, a write is performed to the data selected by "index" and "offset", and (if "comp"=0) to the tag selected by "index". |
tag_in | In | 5 | When "comp"=1, this field is compared against stored tags to see if a hit occurred; when "comp"=0 and "write"=1 this field is written into the tag portion of the array. |
data_in | In | 16 | On a write, the data that is to be written to the location specified by the "index" and "offset" inputs. |
valid_in | In | 1 | On a write when "comp"=0, the data that is to be written to valid bit at the location specified by the "index" input. |
clk | In | 1 | Clock signal; rising edge active. |
rst | In | 1 | Reset signal. When "rst"=1 on the rising edge of the clock, all lines are marked invalid. (The rest of the cache state is not initialized and may contain X's.) |
createdump | In | 1 | Write contents of entire cache to memory file. Active on rising edge. |
hit | Out | 1 | Goes high during a compare if the tag at the location specified by the "index" lines matches the "tag_in" lines. |
dirty | Out | 1 | When this bit is read, it indicates whether this cache line has been written to. It is valid on a read cycle, and also on a compare-write cycle when hit is false. On a write with "comp"=1, the cache sets the dirty bit to 1. On a write with "comp"=0, the dirty bit is reset to 0. |
tag_out | Out | 5 | When "write"=0, the tag selected by "index" appears on this output. (This value is needed during a writeback.) |
data_out | Out | 16 | When "write"=0, the data selected by "index" and "offset" appears on this output. |
valid | Out | 1 | During a read, this output indicates the state of the valid bit in the selected cache line. |
When instantiating the module, there is a parameter which should be set for each instance. When you dump the contents of the cache to a set of files (e.g. for debugging), this parameter allows each instance to go to a unique set of filenames.
Parameter Value File Names --------------- ---------- 0 Icache_0_data_0, Icache_0_data_1, Icache_0_tags, ... 1 Dcache_0_data_0, Dcache_0_data_1, Dcache_0_tags, ... 2 Icache_1_data_0, Icache_1_data_1, Icache_1_tags, ... 3 Dcache_1_data_0, Dcache_1_data_1, Dcache_1_tags, ...
Here is an example of instantiating two modules with a parameter value of 0 and 1:
cache #(0) cache0 (enable, index, ... cache #(1) cache1 (enable, index, ...
The cache contains 256 lines. Each line contains one valid bit, one dirty bit, a 5-bit tag, and four 16-bit words:
V D Tag Word 0 Word 1 Word 2 Word 3 ___________________________________________________________________________________ |___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________| Index-------->|___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________| |___|___|_______|________________|________________|________________|________________|
Although there are a lot of signals for the cache, its operation is pretty simple. When "enable" is high, the two main control lines are "comp" and "write". Here are the four cases:
This case is used when the processor executes a load instruction. The "tag_in", "index", and "offset" signals need to be valid. Either a hit or a miss will occur, as indicated by the "hit" output during the same cycle. If a hit occurs, "data_out" will contain the data and "valid" will indicate if the data is valid. If a miss occurs, the "valid" output will indicate whether the block occupying that line of the cache is valid. The "dirty" output indicates the state of the dirty bit in the cache line.
This case occurs when the processor executes a store instruction. The "data_in", "tag_in", "index", and "offset" lines need to be valid. Either a hit or a miss will occur as indicated by the "hit" output during the same cycle. If there is a miss, the cache state will not be modified. If there is a hit, the word will be written at the rising edge of the clock, and the dirty bit of the cache line will be written to "1". (The "dirty" output is not meaningful as this is a write cycle for that bit.) NOTE: On a hit, you also need to look at the "valid" output! If there is a hit, but the line is not valid, you should treat it as a miss; the other word of the line will not be valid and you will not want to leave the cache in that state.
On a miss, the "valid" output will indicate whether the block occupying that line of the cache is valid. The dirty bit will be read, and will indicate whether or not the block occupying that line is dirty. On the other hand, if "hit" is true while "write" and "comp" are true, "dirty" output is not meaningful and will remain zero (because the dirty bit of the cache was performing a write).
This case occurs when you want to read the tag and the data out of the cache memory. You will need to do this when a cache line is victimized, to see if the cache line is dirty and to write it back to memory if necessary. With "comp"=0, the cache basically acts like a RAM. The "index" and "offset" inputs need to be valid to select what to read. The "data_out", "tag_out", "valid", and "dirty" outputs will be valid during the same cycle.
This case occurs when you bring in data from memory and need to store it in the cache. The "index", "offset", "tag_in", "valid_in" and "data_in" signals need to be valid. On the rising edge of the clock, the values will be written into the specified cache line. Also, the dirty bit will be set to zero.
After you have a working design using a direct-mapped cache, you will add a second cache module to make your design two-way set-associative. Here are the four cases again:
The "index" and "offset" inputs need to be driven to both cache modules. There is a hit if either hit output goes high. Use one of the hit outputs as a select for a mux between the two data outputs. If there is a miss, decide which cache module to victimize based on this logic: If one is valid, select the other one. If neither is valid, select way zero. If both are valid, use the pseudo-random replacement algorithm specified below.
The "index", "offset", and "data" inputs need to be driven to both cache modules. There is a hit if either hit output goes high. Note that only one cache will get written as long as your design ensures that no line can be present in both cache modules.
After deciding which cache module to victimize, use that select bit to mux the data, valid, and dirty bits from the two cache modules.
Drive the "index", "offset", "data" and "valid" inputs to both cache modules. Make sure only the correct module has its write input asserted.
In order to make the designs more deterministic and easier to grade, all set-associative caches must implement the following replacement algorithm:
Example, using two sets:
start with victimway = 0 load 0x1000 victimway=1; install 0x1000 in way 0 because both free load 0x1010 victimway=0; install 0x1010 in way 0 because both free load 0x1000 victimway=1; hit load 0x2010 victimway=0; install 0x2010 in way 1 because it's free load 0x2000 victimway=1; install 0x2000 in way 1 because it's free load 0x3000 victimway=0; install 0x3000 in way 0 (=victimway) load 0x3010 victimway=1; install 0x3010 in way 1 (=victimway)