UW-Madison
Computer Sciences Dept.

CS/ECE 552 Introduction to Computer Architecture


Spring 2012 Section 1
Instructor David A. Wood and T. A. Ramkumar Ravikumar
URL: http://www.cs.wisc.edu/~david/courses/cs552/S12/

Cache Module



1.  Overview

Before you implement your cache, you should convert your processor design to use the Stalling Memory.

You will be given a set of primitive modules, which you must use to build a direct-mapped cache and then a two-way set-associative cache. The modules are:

  1. Basic cache module: cache.v, memc.v, memv.v (Verilog source code)

  2. Synthesizable versions: memc.syn.v, memv.syn.v (Verilog source code)

  3. Four-bank memory module, top level (Verilog source code)

  4. One memory bank (Verilog source code; synthesizable version)

Copy all .syn.v files into the same directory as their corresponding .v files.

2.  Cache Interface

This figure shows the external interface to the module. Each signal is described in the table below.

                         +-------------------+
                         |                   |
           enable >------|                   |
       index[7:0] >------|    cache          |
        word[1:0] >------|                   |
             comp >------|    256 lines      |-----> hit
            write >------|    by 4 words     |-----> dirty
      tag_in[4:0] >------|                   |-----> tag_out[4:0]
    data_in[15:0] >------|                   |-----> data_out[15:0]
         valid_in >------|                   |-----> valid
                         |                   |
              clk >------|                   |
              rst >------|                   |
       createdump >------|                   |
                         +-------------------+



        Signal      In/Out  Width  Description
        ------      ------  -----  -----------
        enable      In      1      Enable cache. Active high. If low, "write" and "comp" have
                                   no effect, and all outputs are zero.

        index       In      8      The address bits used to index into the cache memory.

        word        In      2 (3)  Selects which word to access in the cache line.

        comp        In      1      Compare. When "comp"=1, the cache will compare tag_in to
                                   the tag of the selected line and indicate if a hit has
                                   occurred; the data portion of the cache is read or written
                                   but writes are suppressed if there is a miss. When
                                   "comp"=0, no compare is done and the Tag and Data portions
                                   of the cache will both be read or written.

        write       In      1      Write signal. If high at the rising edge of the clock, a
                                   write is performed to the data selected by "index" and
                                   "word", and (if "comp"=0) to the tag selected by "index".

        tag_in      In      5      When "comp"=1, this field is compared against stored tags
                                   to see if a hit occurred; when "comp"=0 and "write"=1 this
                                   field is written into the tag portion of the array.

        data_in     In      16     On a write, the data that is to be written to the location
                                   specified by the "index" and "word" inputs.

        valid_in    In      1      On a write when "comp"=0, the data that is to be written
                                   to the valid bit at the location specified by the "index"
                                   input.

        clk         In      1      Clock signal; rising edge active.

        rst         In      1      Reset signal. When "rst"=1 on the rising edge of the
                                   clock, all lines are marked invalid. (The rest of the
                                   cache state is not initialized and may contain X's.)

        createdump  In      1      Write contents of entire cache to memory file. Active on
                                   rising edge.

        hit         Out     1      Goes high during a compare if the tag at the location
                                   specified by the "index" lines matches the "tag_in" lines.

        dirty       Out     1      When this bit is read, it indicates whether this cache
                                   line has been written to. It is valid on a read cycle, and
                                   also on a compare-write cycle when hit is false. On a
                                   write with "comp"=1, the cache sets the dirty bit to 1. On
                                   a write with "comp"=0, the dirty bit is reset to 0.

        tag_out     Out     5      When "write"=0, the tag selected by "index" appears on
                                   this output. (This value is needed during a writeback.)

        data_out    Out     16     When "write"=0, the data selected by "index" and "word"
                                   appears on this output.

        valid       Out     1      During a read, this output indicates the state of the
                                   valid bit in the selected cache line.

3.  Instantiating cache modules

Each instance of the module takes a parameter, which you should set to a different value per instance. When you dump the contents of the cache to files (e.g. for debugging), this parameter determines the filenames, so that each instance writes to its own set of files.

        Parameter Value     File Names
        ---------------     ----------
               0            Icache_0_data_0, Icache_0_data_1, Icache_0_tags, ...
               1            Dcache_0_data_0, Dcache_0_data_1, Dcache_0_tags, ...
               2            Icache_1_data_0, Icache_1_data_1, Icache_1_tags, ...
               3            Dcache_1_data_0, Dcache_1_data_1, Dcache_1_tags, ...

Here is an example of instantiating two modules with a parameter value of 0 and 1:

         cache #(0) cache0 (enable, index, ...
         cache #(1) cache1 (enable, index, ...

4.  Organization of the cache

The cache contains 256 lines. Each line contains one valid bit, one dirty bit, a 5-bit tag, and four 16-bit words:

                V   D    Tag        Word 0           Word 1           Word 2           Word 3
               ___________________________________________________________________________________
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
  Index------>|___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
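
The organization above implies how a memory address maps onto the "tag_in", "index", and "word" inputs. The following is a minimal sketch of that mapping, not part of the provided modules. It assumes a 16-bit byte address with 16-bit words, which this handout does not state; the module name addr_split and the port name addr are hypothetical.

```verilog
// Sketch only: splits an address into the cache's tag/index/word fields.
// ASSUMPTION (not stated in the handout): addresses are 16-bit byte
// addresses and words are 16 bits, so bit 0 selects a byte within a word.
module addr_split (
    input  [15:0] addr,   // hypothetical 16-bit byte address
    output [4:0]  tag,    // would drive tag_in[4:0]
    output [7:0]  index,  // would drive index[7:0]
    output [1:0]  word    // would drive word[1:0]
);
    assign tag   = addr[15:11];  // 5 tag bits
    assign index = addr[10:3];   // 8 bits select one of 256 lines
    assign word  = addr[2:1];    // 2 bits select one of 4 words
endmodule
```

Under this assumed split, addresses 0x1000 and 0x1010 map to index 0 and index 2 respectively, which is consistent with the two-set example in Section 7.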

5.  Cache operation and semantics (Direct-mapped cache)

Although there are a lot of signals for the cache, its operation is pretty simple. When "enable" is high, the two main control lines are "comp" and "write". Here are the four cases:

5.1  Compare Read (comp = 1, write = 0)

This case is used when the processor executes a load instruction. The "tag_in", "index", and "word" signals need to be valid. Either a hit or a miss will occur, as indicated by the "hit" output during the same cycle. If a hit occurs, "data_out" will contain the data and "valid" will indicate if the data is valid. If a miss occurs, the "valid" output will indicate whether the block occupying that line of the cache is valid. The "dirty" output indicates the state of the dirty bit in the cache line.

5.2  Compare Write (comp = 1, write = 1)

This case occurs when the processor executes a store instruction. The "data_in", "tag_in", "index", and "word" lines need to be valid. Either a hit or a miss will occur, as indicated by the "hit" output during the same cycle. If there is a miss, the cache state will not be modified. If there is a hit, the word will be written at the rising edge of the clock, and the dirty bit of the cache line will be set to 1. (The "dirty" output is not meaningful, as this is a write cycle for that bit.) NOTE: On a hit, you also need to look at the "valid" output! If there is a hit but the line is not valid, you should treat it as a miss; the other words of the line will not be valid, and you will not want to leave the cache in that state.

On a miss, the "valid" output will indicate whether the block occupying that line of the cache is valid. The dirty bit will be read, and will indicate whether or not the block occupying that line is dirty. On the other hand, if "hit" is true while "write" and "comp" are both true, the "dirty" output is not meaningful and will remain zero (because the dirty bit of the cache is being written in that cycle).
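
The hit/valid rule above can be expressed as a small piece of combinational logic. This is a sketch under stated assumptions, not a required implementation: the module name hit_qualify and the outputs real_hit and need_writeback are hypothetical names introduced for illustration.

```verilog
// Sketch only: qualifies the raw "hit" output with "valid" and flags a
// victim line that must be written back before it is replaced.
// Module and output names are hypothetical.
module hit_qualify (
    input  hit,            // "hit" output of the cache
    input  valid,          // "valid" output of the cache
    input  dirty,          // "dirty" output of the cache
    output real_hit,       // tag matched AND the line is valid
    output need_writeback  // miss on a line that is valid and dirty
);
    assign real_hit       = hit & valid;
    // When hit is high during a compare write, "dirty" is not meaningful,
    // but then either real_hit is high or valid is low, so the term below
    // is already zero.
    assign need_writeback = ~real_hit & valid & dirty;
endmodule
```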

5.3  Access Read (comp = 0, write = 0)

This case occurs when you want to read the tag and the data out of the cache memory. You will need to do this when a cache line is victimized, to see if the cache line is dirty and to write it back to memory if necessary. With "comp"=0, the cache basically acts like a RAM. The "index" and "word" inputs need to be valid to select what to read. The "data_out", "tag_out", "valid", and "dirty" outputs will be valid during the same cycle.
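
During a writeback, the memory address of the victim line has to be rebuilt from "tag_out" rather than from the address of the access that missed. The sketch below illustrates one way to do this; it assumes a 16-bit byte address with 16-bit words (an assumption, not stated in this handout), and the names wb_addr and mem_addr are hypothetical.

```verilog
// Sketch only: rebuilds the memory address of one word of the victim line.
// ASSUMPTION: 16-bit byte addresses, 16-bit words (bit 0 is a byte offset).
module wb_addr (
    input  [4:0]  tag_out,  // tag read from the cache with comp=0, write=0
    input  [7:0]  index,    // same index that selected the victim line
    input  [1:0]  word,     // word currently being written back
    output [15:0] mem_addr  // hypothetical address presented to memory
);
    assign mem_addr = {tag_out, index, word, 1'b0};
endmodule
```

Stepping "word" through its four values would read out each word of the line in turn.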

5.4  Access Write (comp = 0, write = 1)

This case occurs when you bring in data from memory and need to store it in the cache. The "index", "word", "tag_in", "valid_in" and "data_in" signals need to be valid. On the rising edge of the clock, the values will be written into the specified cache line. Also, the dirty bit will be set to zero.

6.  Building a two-way set-associative cache

After you have a working design using a direct-mapped cache, you will add a second cache module to make your design two-way set-associative. Here are the four cases again:

6.1  Compare Read (comp = 1, write = 0)

The index and word need to be driven to both cache modules. There is a hit if either hit output goes high. Use one of the hit outputs as a select for a mux between the two data outputs. If there is a miss, decide which cache module to victimize based on this logic: If one is valid, select the other one. If neither is valid, select way zero. If both are valid, use the pseudo-random replacement algorithm specified below.
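
The hit and data selection described above might be sketched as follows. The module name two_way_read and its port names are hypothetical; each raw hit is additionally qualified with that way's "valid" output, following the note in Section 5.2.

```verilog
// Sketch only: combines the outputs of two cache modules on a compare read.
// Port names with a 0/1 suffix are hypothetical wires from way 0 / way 1.
module two_way_read (
    input         hit0,
    input         valid0,
    input  [15:0] data_out0,
    input         hit1,
    input         valid1,
    input  [15:0] data_out1,
    output        hit,       // qualified hit in either way
    output [15:0] data_out   // data from the way that hit
);
    wire real_hit0 = hit0 & valid0;
    wire real_hit1 = hit1 & valid1;

    assign hit      = real_hit0 | real_hit1;
    // One hit signal serves as the mux select; data_out is only
    // meaningful when "hit" is high.
    assign data_out = real_hit1 ? data_out1 : data_out0;
endmodule
```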

6.2  Compare Write (comp = 1, write = 1)

The index, word, and data need to be driven to both cache modules. There is a hit if either hit output goes high. Note that only one cache will get written as long as your design ensures that no line can be present in both cache modules.

6.3  Access Read (comp = 0, write = 0)

After deciding which cache module to victimize, use that select bit to mux the data, tag, valid, and dirty outputs of the two cache modules. (The tag is needed to form the writeback address.)

6.4  Access Write (comp = 0, write = 1)

Drive the index, word, data, and valid to both cache modules. Make sure only the correct module has its write input asserted.
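
Steering the write input to a single way can be done with two gates. A minimal sketch, in which the module name way_write_steer and the signal way_sel (the way chosen for installation) are hypothetical:

```verilog
// Sketch only: asserts "write" on exactly one of the two cache modules.
module way_write_steer (
    input  write,    // write request from the cache controller
    input  way_sel,  // hypothetical select: 0 = way 0, 1 = way 1
    output write0,   // would drive "write" of the way-0 cache module
    output write1    // would drive "write" of the way-1 cache module
);
    assign write0 = write & ~way_sel;
    assign write1 = write &  way_sel;
endmodule
```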

7.  Pseudo-Random replacement for 2-way set-associative cache

In order to make the designs more deterministic and easier to grade, all set-associative caches must implement the following replacement algorithm:

  1. Have a flip-flop called "victimway" which is initialized to zero.

  2. On each read or write of the cache, invert the state of victimway.

  3. When installing a line after a cache miss, install in an invalid block if possible. If both ways are invalid, install in way zero.

  4. If both ways are valid, and a block must be victimized, use victimway (after it has been inverted for this access) to indicate which way to use.

  5. For the D cache, do not invert victimway for instructions that do not read or write the cache, for invalid instructions, or for instructions that are squashed due to branch misprediction.

  6. For the I cache, invert victimway for each instruction fetched.


Example, using two sets:


   start with victimway = 0
   load 0x1000    victimway=1; install 0x1000 in way 0 because both free
   load 0x1010    victimway=0; install 0x1010 in way 0 because both free
   load 0x1000    victimway=1; hit
   load 0x2012    victimway=0; install 0x2012 in way 1 because it's free
   load 0x2000    victimway=1; install 0x2000 in way 1 because it's free
   load 0x3000    victimway=0; install 0x3000 in way 0 (=victimway)
   load 0x3010    victimway=1; install 0x3010 in way 1 (=victimway)
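
The replacement rules and the trace above can be sketched in Verilog as follows. This is an illustration under stated assumptions rather than a reference design: the module name victim_select, the one-cycle strobe "access", and the output "victim" are hypothetical, and the flip-flop is written behaviorally for brevity.

```verilog
// Sketch only: "victimway" flip-flop plus the install-way choice.
// ASSUMPTION: "access" is a hypothetical strobe that is high for exactly
// one cycle per counted cache access; valid0/valid1 are the "valid"
// outputs of the two ways for the indexed set during that cycle.
module victim_select (
    input  clk,
    input  rst,
    input  access,
    input  valid0,
    input  valid1,
    output victim   // 0 = install in way 0, 1 = install in way 1
);
    reg  victimway;
    wire victimway_inv = ~victimway;  // state after inverting for this access

    always @(posedge clk) begin
        if (rst)
            victimway <= 1'b0;           // rule 1: initialized to zero
        else if (access)
            victimway <= victimway_inv;  // rule 2: invert on each access
    end

    assign victim = !valid0 ? 1'b0            // way 0 free (or both free)
                  : !valid1 ? 1'b1            // only way 1 free
                  :           victimway_inv;  // both valid: rule 4
endmodule
```

In this sketch "victim" reflects the post-inversion value only during the cycle in which "access" is high, so a controller whose miss handling spans several cycles would need to register it in that cycle. For the last two loads in the trace, victimway is 1 and then 0 before the access, giving victim = 0 and then victim = 1, as in the example.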




















 