CS/ECE 552 Intro to Computer Architecture Spring 2020 Section 1
Instructor Matthew D. Sinclair
URL: http://www.cs.wisc.edu/~sinclair/courses/cs552/spring2020/

Cache Module

1.  Overview

Before you implement your cache, you should convert your processor design to use the Stalling Memory.

You are provided with a set of primitive modules, and you must build a direct-mapped cache and a 2-way set-associative cache from them. The modules are (see the tar file or GitHub for these files):

  1. Basic cache modules: cache.v, memc.v, memv.v
  2. Synthesizable versions: memc.syn.v, memv.syn.v
  3. Four bank memory module top level: four_bank_mem.v
  4. One memory bank: final_memory.v, synthesizable version: final_memory.syn.v

Copy all .syn.v files into the same directory as their corresponding .v files. (NOTE: If you are using the provided tar file or GitHub Classroom, this should already be the case.)

2.  Cache Interface

This figure shows the external interface to the module. Each signal is described in the table below.

                     +-------------------+
                     |                   |
       enable >------|                   |
   index[7:0] >------|    cache          |
  offset[2:0] >------|                   |
         comp >------|    256 lines      |-----> hit
        write >------|    by 4 words     |-----> dirty
  tag_in[4:0] >------|                   |-----> tag_out[4:0]
data_in[15:0] >------|                   |-----> data_out[15:0]
     valid_in >------|                   |-----> valid
                     |                   |
          clk >------|                   |
          rst >------|                   |-----> err
   createdump >------|                   |
                     +-------------------+

		
Signal       In/Out   Width   Description
------       ------   -----   -----------
enable       In       1       Enable cache. Active high. If low, "write" and "comp" have no effect, and all outputs are zero.
index        In       8       The address bits used to index into the cache memory.
offset       In       3       offset[2:1] selects which word to access in the cache line. The least significant bit should be 0 for word alignment. If the least significant bit is 1, it is an error condition.
comp         In       1       Compare. When "comp"=1, the cache will compare tag_in to the tag of the selected line and indicate if a hit has occurred; the data portion of the cache is read or written, but writes are suppressed if there is a miss. When "comp"=0, no compare is done and the Tag and Data portions of the cache will both be read or written.
write        In       1       Write signal. If high at the rising edge of the clock, a write is performed to the data selected by "index" and "offset", and (if "comp"=0) to the tag selected by "index".
tag_in       In       5       When "comp"=1, this field is compared against stored tags to see if a hit occurred; when "comp"=0 and "write"=1, this field is written into the tag portion of the array.
data_in      In       16      On a write, the data that is to be written to the location specified by the "index" and "offset" inputs.
valid_in     In       1       On a write when "comp"=0, the value that is to be written to the valid bit at the location specified by the "index" input.
clk          In       1       Clock signal; rising edge active.
rst          In       1       Reset signal. When "rst"=1 on the rising edge of the clock, all lines are marked invalid. (The rest of the cache state is not initialized and may contain X's.)
createdump   In       1       Write contents of entire cache to memory file. Active on rising edge.
hit          Out      1       Goes high during a compare if the tag at the location specified by the "index" lines matches the "tag_in" lines.
dirty        Out      1       When this bit is read, it indicates whether this cache line has been written to. It is valid on a read cycle, and also on a compare-write cycle when hit is false. On a write with "comp"=1, the cache sets the dirty bit to 1. On a write with "comp"=0, the dirty bit is reset to 0.
tag_out      Out      5       When "write"=0, the tag selected by "index" appears on this output. (This value is needed during a writeback.)
data_out     Out      16      When "write"=0, the data selected by "index" and "offset" appears on this output.
valid        Out      1       During a read, this output indicates the state of the valid bit in the selected cache line.

3.  Instantiating cache modules

When instantiating the module, set its parameter to a different value for each instance. When you dump the contents of the cache to a set of files (e.g. for debugging), this parameter gives each instance its own unique set of filenames.

		  Parameter Value     File Names
		  ---------------     ----------
		  0                   Icache_0_data_0, Icache_0_data_1, Icache_0_tags, ...
		  1                   Dcache_0_data_0, Dcache_0_data_1, Dcache_0_tags, ...
		  2                   Icache_1_data_0, Icache_1_data_1, Icache_1_tags, ...
		  3                   Dcache_1_data_0, Dcache_1_data_1, Dcache_1_tags, ...

		

Here is an example of instantiating two modules with a parameter value of 0 and 1:

			cache #(0) cache0 (enable, index, ...
			cache #(1) cache1 (enable, index, ...
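
The pattern in the table above can be summarized with a small helper. This is only an illustrative sketch in Python, not part of the provided Verilog: the function name dump_prefix is hypothetical, and the rule it encodes (even values give "Icache", odd values give "Dcache", and the numeric suffix is the parameter divided by two) is inferred from the four rows shown, not taken from cache.v.

```python
def dump_prefix(param):
    """Map the instance parameter (0..3) to the dump-file prefix in the table.

    Hypothetical helper: the rule is inferred from the four table rows.
    """
    if param not in (0, 1, 2, 3):
        raise ValueError("parameter must be 0, 1, 2, or 3")
    kind = "Icache" if param % 2 == 0 else "Dcache"
    number = param // 2
    return f"{kind}_{number}"
```

For example, dump_prefix(2) returns "Icache_1", which is the prefix of the third row.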

		  

4.  Organization of the cache

The cache contains 256 lines. Each line contains one valid bit, one dirty bit, a 5-bit tag, and four 16-bit words:


		  V   D    Tag        Word 0           Word 1           Word 2           Word 3
                 ___________________________________________________________________________________
                 |___|___|_______|________________|________________|________________|________________|
                 |___|___|_______|________________|________________|________________|________________|
                 |___|___|_______|________________|________________|________________|________________|
                 |___|___|_______|________________|________________|________________|________________|
   Index-------->|___|___|_______|________________|________________|________________|________________|
                 |___|___|_______|________________|________________|________________|________________|
                 |___|___|_______|________________|________________|________________|________________|
                 |___|___|_______|________________|________________|________________|________________|
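
The field widths above (5-bit tag, 8-bit index, 3-bit offset) add up to a 16-bit address. The following Python sketch shows one way to split an address into those fields and to compute the capacity of one cache module. It assumes the conventional ordering tag | index | offset from most to least significant bit, which this document does not state explicitly; the function names are hypothetical.

```python
# Field widths come from the interface table; the ordering
# tag | index | offset (MSB to LSB) is an assumption.
TAG_BITS, INDEX_BITS, OFFSET_BITS = 5, 8, 3

def split_address(addr):
    """Return (tag, index, offset) for a 16-bit byte address."""
    addr &= 0xFFFF
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def word_select(offset):
    """offset[2:1] picks one of the four words; offset[0] must be 0."""
    if offset & 1:
        raise ValueError("unaligned access: offset[0] must be 0")
    return offset >> 1

# Capacity of one cache module: 256 lines x 4 words x 2 bytes per word.
CAPACITY_BYTES = (1 << INDEX_BITS) * 4 * 2
```

Under this layout, address 0x1010 has tag 2, index 2, and offset 0, and one cache module holds 2048 bytes of data.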


		

5.  Cache operation and semantics (Direct-mapped cache)

Although there are a lot of signals for the cache, its operation is pretty simple. When "enable" is high, the two main control lines are "comp" and "write". Here are the four cases:

5.1  Compare Read (comp = 1, write = 0)

This case is used when the processor executes a load instruction. The "tag_in", "index", and "offset" signals need to be valid. Either a hit or a miss will occur, as indicated by the "hit" output during the same cycle. If a hit occurs, "data_out" will contain the data and "valid" will indicate if the data is valid. If a miss occurs, the "valid" output will indicate whether the block occupying that line of the cache is valid. The "dirty" output indicates the state of the dirty bit in the cache line.

5.2  Compare Write (comp = 1, write = 1)

This case occurs when the processor executes a store instruction. The "data_in", "tag_in", "index", and "offset" lines need to be valid. Either a hit or a miss will occur, as indicated by the "hit" output during the same cycle. If there is a miss, the cache state will not be modified. If there is a hit, the word will be written at the rising edge of the clock, and the dirty bit of the cache line will be set to "1". (The "dirty" output is not meaningful, as this is a write cycle for that bit.) NOTE: On a hit, you also need to look at the "valid" output! If there is a hit but the line is not valid, you should treat it as a miss; the other words of the line will not be valid, and you will not want to leave the cache in that state.

On a miss, the "valid" output will indicate whether the block occupying that line of the cache is valid. The dirty bit will be read, and the "dirty" output will indicate whether or not the block occupying that line is dirty. On the other hand, if "hit" is true while "write" and "comp" are true, the "dirty" output is not meaningful and will remain zero (because the dirty bit of the cache is being written).

5.3  Access Read (comp = 0, write = 0)

This case occurs when you want to read the tag and the data out of the cache memory. You will need to do this when a cache line is victimized, to see if the cache line is dirty and to write it back to memory if necessary. With "comp"=0, the cache basically acts like a RAM. The "index" and "offset" inputs need to be valid to select what to read. The "data_out", "tag_out", "valid", and "dirty" outputs will be valid during the same cycle.
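
As a rough sketch of how the outputs of an access read feed a writeback, the Python function below builds the four memory addresses of a victim line from "tag_out" and "index". It is an illustration only: the function name is hypothetical, and the address layout (tag in bits 15:11, index in bits 10:3, offset in bits 2:0) is an assumption based on the field widths.

```python
def writeback_addresses(valid, dirty, tag_out, index):
    """Return the four word addresses to write back, or [] if none is needed.

    Assumes the address layout tag[15:11] | index[10:3] | offset[2:0].
    """
    if not (valid and dirty):
        return []                      # clean or invalid lines need no writeback
    base = ((tag_out & 0x1F) << 11) | ((index & 0xFF) << 3)
    # One address per 16-bit word in the line: offsets 0, 2, 4, 6.
    return [base | offset for offset in (0, 2, 4, 6)]
```

A controller would perform one access read per returned address (with the matching offset) and send each "data_out" value to memory.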

5.4  Access Write (comp = 0, write = 1)

This case occurs when you bring in data from memory and need to store it in the cache. The "index", "offset", "tag_in", "valid_in" and "data_in" signals need to be valid. On the rising edge of the clock, the values will be written into the specified cache line. Also, the dirty bit will be set to zero.
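
The four cases can be summarized in a small behavioral model. The Python class below is a sketch written from the descriptions in this section, not a translation of cache.v: the class and method names are hypothetical, outputs that the text calls "not meaningful" are simply modeled as zero, and "hit" is only reported when "comp"=1. It may be useful as a reference when checking a controller design by hand.

```python
class CacheModel:
    """Behavioral model of one cache module: 256 lines of four 16-bit words."""

    def __init__(self):
        self.valid = [False] * 256
        self.dirty = [False] * 256
        self.tag = [0] * 256
        self.data = [[0] * 4 for _ in range(256)]

    def reset(self):
        # rst marks every line invalid; the rest of the state is untouched.
        self.valid = [False] * 256

    def cycle(self, enable, index, offset, comp, write,
              tag_in=0, data_in=0, valid_in=False):
        """Compute this cycle's outputs, apply any write, return the outputs."""
        out = {"hit": False, "dirty": False, "tag_out": 0,
               "data_out": 0, "valid": False, "err": False}
        if not enable:
            return out                         # disabled: all outputs zero
        if offset & 1:
            out["err"] = True                  # unaligned offset is an error
            return out
        word = offset >> 1                     # offset[2:1] selects the word
        match = self.tag[index] == (tag_in & 0x1F)
        out["hit"] = bool(comp and match)
        # valid is reported on reads and on compare cycles (assumption: 0 otherwise).
        out["valid"] = self.valid[index] if (comp or not write) else False
        # dirty is reported on reads and on a compare write that misses.
        if not write or (comp and not match):
            out["dirty"] = self.dirty[index]
        if not write:                          # compare read or access read
            out["tag_out"] = self.tag[index]
            out["data_out"] = self.data[index][word]
        elif comp:
            if match:                          # compare write hit
                self.data[index][word] = data_in & 0xFFFF
                self.dirty[index] = True
        else:                                  # access write
            self.data[index][word] = data_in & 0xFFFF
            self.tag[index] = tag_in & 0x1F
            self.valid[index] = bool(valid_in)
            self.dirty[index] = False
        return out
```

Note that "hit" in this model depends only on the tag comparison, so a freshly created model reports a hit with "valid" low when "tag_in" happens to equal the stored tag. That is the situation the NOTE in 5.2 asks you to treat as a miss.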

6.  Building a two-way set-associative cache

After you have a working design using a direct-mapped cache, you will add a second cache module to make your design two-way set-associative. Here are the four cases again:

6.1  Compare Read (comp = 1, write = 0)

The "tag_in", "index", and "offset" inputs need to be driven to both cache modules. There is a hit if either hit output goes high (as in the direct-mapped case, a hit on a line whose "valid" output is low should be treated as a miss). Use one of the hit outputs as the select for a mux between the two data outputs. If there is a miss, decide which cache module to victimize as follows: if exactly one way is valid, select the other one; if neither is valid, select way zero; if both are valid, use the pseudo-random replacement algorithm specified in Section 7.

6.2  Compare Write (comp = 1, write = 1)

The "tag_in", "index", "offset", and "data_in" inputs need to be driven to both cache modules. There is a hit if either hit output goes high. Note that only one cache will get written as long as your design ensures that no line can be present in both cache modules.

6.3  Access Read (comp = 0, write = 0)

After deciding which cache module to victimize, use that select bit to mux the data, tag, valid, and dirty outputs from the two cache modules.

6.4  Access Write (comp = 0, write = 1)

Drive the "index", "offset", "tag_in", "data_in", and "valid_in" inputs to both cache modules. Make sure only the correct module has its write input asserted.
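
The glue logic described in this section can be sketched as two small functions. This is a Python illustration under stated assumptions, not the required Verilog: the function names are hypothetical, each way's outputs are passed in as a dictionary with "hit", "valid", and "data_out" entries, and a way only counts as hitting when its "valid" output is also high (following the NOTE in 5.2).

```python
def two_way_lookup(way0, way1):
    """Combine the outputs of both ways for a compare cycle.

    Returns (hit, hit_way, data_out); hit_way is None on a miss.
    """
    hit0 = bool(way0["hit"] and way0["valid"])
    hit1 = bool(way1["hit"] and way1["valid"])
    hit = hit0 or hit1
    hit_way = 1 if hit1 else (0 if hit0 else None)
    # One hit signal acts as the select of a 2:1 mux on the data outputs.
    data_out = way1["data_out"] if hit1 else way0["data_out"]
    return hit, hit_way, data_out

def write_enables(write, selected_way):
    """Access write: assert "write" on the selected way only."""
    return (bool(write and selected_way == 0),
            bool(write and selected_way == 1))
```

For an access write into way 1, write_enables(True, 1) returns (False, True), so only the second cache module sees its write input asserted.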

7.  Pseudo-Random replacement for 2-way set-associative cache

In order to make the designs more deterministic and easier to grade, all set-associative caches must implement the following replacement algorithm:

  1. Have a flip-flop called "victimway" which is initialized to zero.
  2. On each read or write of the cache, invert the state of victimway.
  3. When installing a line after a cache miss, install in an invalid block if possible. If both ways are invalid, install in way zero.
  4. If both ways are valid, and a block must be victimized, use victimway (after already being inverted for this access) to indicate which way to use.
  5. For the D cache, do not invert victimway for instructions that do not read or write cache, or for invalid instructions, or for instructions that are squashed due to branch misprediction.
  6. For the I cache, invert victimway for each instruction fetched.

Example, using two sets:

			start with victimway = 0
			load 0x1000    victimway=1; install 0x1000 in way 0 because both free
			load 0x1010    victimway=0; install 0x1010 in way 0 because both free
			load 0x1000    victimway=1; hit
			load 0x2010    victimway=0; install 0x2010 in way 1 because it's free
			load 0x2000    victimway=1; install 0x2000 in way 1 because it's free
			load 0x3000    victimway=0; install 0x3000 in way 0 (=victimway)
			load 0x3010    victimway=1; install 0x3010 in way 1 (=victimway)
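
The trace above can be reproduced with a short simulation of the replacement rules. The Python sketch below is illustrative only: the class name is hypothetical, it tracks tags but no data, and it assumes the address layout tag[15:11] | index[10:3] | offset[2:0], under which the example addresses fall into the two sets with index 0 and index 2.

```python
def split(addr):
    """Return (tag, index) under the assumed 5/8/3-bit address layout."""
    return (addr >> 11) & 0x1F, (addr >> 3) & 0xFF

class TwoWayReplacement:
    """Tag-only model of the victimway replacement rules."""

    def __init__(self):
        self.victimway = 0          # flip-flop initialized to zero
        self.lines = {}             # index -> [tag or None, tag or None]

    def access(self, addr):
        """Return ("hit", way) or ("install", way) for one load or store."""
        tag, index = split(addr)
        self.victimway ^= 1         # invert on every cache read or write
        ways = self.lines.setdefault(index, [None, None])
        if tag in ways:
            return "hit", ways.index(tag)
        if ways[0] is None:
            way = 0                 # way 0 invalid (covers "both invalid")
        elif ways[1] is None:
            way = 1                 # only way 1 invalid
        else:
            way = self.victimway    # both valid: use the inverted victimway
        ways[way] = tag
        return "install", way
```

Feeding the seven loads of the example into access(), in order, yields installs in ways 0, 0, then a hit, then installs in ways 1, 1, 0, 1, matching the trace.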
		  

 