Homework 5

Due 04/24
Weight: 17%

Important


1.  Problem 1

Note: you may work with your project partner on the first 2 problems ONLY. One electronic submission and one handwritten submission per pair is OK. If your partner's submission contains these problems, clearly indicate that in your homework. Leaving these problems entirely to your partner is not allowed: you must do both problems JOINTLY. Having one partner do the direct-mapped cache while the other does the 2-way cache will severely hurt your productivity! Complete the direct-mapped cache first, then move on to the set-associative cache.

You will implement a hierarchical memory system in Verilog that consists of a level-1 write-back cache and a stalling memory. The system should use a direct-mapped cache and a four-banked, four-cycle memory. See the project modules provided page for the Cache module and the four-banked memory module. Blocks are 4 words wide, and the system is byte-addressable and word-aligned.

The top-level module that you will develop is as follows (Verilog template source for mem_system.v):


module mem_system(/*AUTOARG*/
   // Outputs
   DataOut, Done, Stall, CacheHit, err,
   // Inputs
   Addr, DataIn, Rd, Wr, createdump, clk, rst
   );

   input [15:0] Addr;
   input [15:0] DataIn;
   input        Rd;
   input        Wr;
   input        createdump;
   input        clk;
   input        rst;

   output [15:0] DataOut;
   output Done;
   output Stall;
   output CacheHit;
   output err;

   /* data_mem = 1, inst_mem = 0 *
    * needed for cache parameter */
   parameter mem_type = 0;

   // your code here

endmodule // mem_system

A top-level module called mem_system_hier.v is also provided, which instantiates the clock generator and mem_system inside it (Verilog source for mem_system_hier.v).

Two testbenches, with a reference memory module, and loadfiles to initialize your memory are provided in this tarball. mem_system.tgz

Important Notes:

  • This module will also handle internal errors.
  • The system should ignore new inputs when it is stalled (this is not an error). Timing will be an important part of this problem; designs that stall longer than necessary will be docked points.
  • The Done signal should be asserted for exactly one cycle. If the request can be satisfied in the cycle in which it is made, the data should be presented and Done should be asserted in that same cycle.
  • The CacheHit signal denotes whether the request was a hit in the cache.
  • The mem_type parameter decides whether this is an instruction or data cache; it is used to generate the names of the dump files.

To complete this problem, you will need to determine how the internal components are arranged and create a cache controller FSM. See the description of the cache module for hints on how this should be done. You can choose to implement either a Mealy or a Moore machine, although I recommend a Moore machine as it will likely be easier to create. Be forewarned that the resulting state machine will be relatively large, so get started early.

The testing for this module should be extensive. You will need to verify that the design works correctly during hits, misses, writebacks, and refills. Also be sure to check the design under various main memory stall conditions.

For extra credit, you can improve your performance by adding a two-entry store buffer so that it is possible for writes to complete in one cycle. The extra credit will not count if the standard system is not working, so be sure to thoroughly test your design before thinking about moving on.

For this design, as usual:

Instantiating the cache modules

This is the methodology I suggest for instantiating your cache modules, so the naming conventions are the same for everyone.

Somewhere in mem_system.v:

cache #(0 + mem_type) c0(....);

This will guarantee that when this module is finally connected to your processor, your instruction and data memory will create separate dump files. See the Cache module section for details.

Verification

Verification is an important part and significant challenge for this problem. You are provided with two testbenches:

  • mem_system_randbench.v: This is a completely randomized testbench that stresses the functional correctness of your design. You must run this testbench, which will print a log of addresses and statistics on the number of hits to the cache. Contact the TA with questions about the testbench. Your module must pass this testbench. The output log is saved in a file called transcript. Copy this file to randbench.log and include it in your handin. Specifically, this testbench does the following:
  1. full random: 1000 completely random memory requests
  2. small random: 1000 memory requests restricted to addresses 0 to 2046
  3. sequential addresses: 1000 memory requests restricted to addresses 0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x40
  4. two sets address: 1000 memory requests to test a two-way set-associative cache. You should get predominantly hits to the cache.

After every set of 1000 requests, you will see a message like the following:

LOG: Done two_sets_addr Requests: 4001, Cycles: 79688 Hits: 562
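When comparing runs, it can help to pull the counters out of these LOG lines automatically. The following is a minimal Python sketch, not part of the provided testbench. The pattern is written against the single sample line above, and the names `parse_log_line`, `hit_rate`, and `cycles_per_request` are made up for illustration. The counters in the sample look cumulative across the four request sets (4001 requests after the fourth set), but that is an inference from one line, so check it against your own log.

```python
import re

# Pattern written against the sample LOG line; the real log may differ in spacing.
LOG_RE = re.compile(
    r"LOG: Done (?P<phase>\w+) Requests: (?P<requests>\d+), "
    r"Cycles: (?P<cycles>\d+) Hits: (?P<hits>\d+)"
)

def parse_log_line(line):
    """Return the counters found in one LOG line, or None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    stats = {key: int(m.group(key)) for key in ("requests", "cycles", "hits")}
    stats["phase"] = m.group("phase")
    # Derived figures (hypothetical names, not printed by the testbench).
    stats["hit_rate"] = stats["hits"] / stats["requests"]
    stats["cycles_per_request"] = stats["cycles"] / stats["requests"]
    return stats

sample = "LOG: Done two_sets_addr Requests: 4001, Cycles: 79688 Hits: 562"
print(parse_log_line(sample))
```

Running the same function over every line of randbench.log gives one record per request set, which makes it easy to see whether a design change increased or reduced the cycle count.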

To run this testbench:


wsrun.pl mem_system_randbench *.v

  • mem_system_perfbench.v: This is a more carefully constructed testbench that is meant to check the performance of your design. For example: Is your cache reporting cache hits and misses correctly? Are the requests being serviced in the correct amount of time? This testbench takes as input a memory address trace in a file called mem.addr. This file must be in the same directory as your Verilog files. An example mem.addr file is provided. The format of the file is the following:
    • Each line represents a new request
    • Each line has 4 numbers separated by a space
    • The numbers are: Wr Rd Addr Value

You must write different address traces to test your module and prove that it implements the cache correctly. Determining what to test and show is an important part of this problem. Carefully document and show in your homework which cases you are testing. Pick representative inputs from this testbench by examining the waveforms. You must hand in annotated waveforms to prove that your design works correctly during hits, misses, writebacks, and refills.

To run this testbench:


wsrun.pl mem_system_perfbench *.v
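Writing traces by hand gets tedious, so one option is to generate mem.addr from a short script. The sketch below is only an illustration in Python. The helper names `format_trace` and `write_trace` are hypothetical. It ASSUMES the four fields are written in decimal and that the Value field is ignored on reads (0 is used as a placeholder). Neither is stated above, so compare the output with the provided example mem.addr before relying on it.

```python
def format_trace(requests):
    """Format (wr, rd, addr, value) tuples as 'Wr Rd Addr Value' lines."""
    lines = []
    for wr, rd, addr, value in requests:
        if wr not in (0, 1) or rd not in (0, 1):
            raise ValueError("Wr and Rd must be 0 or 1")
        if not (0 <= addr <= 0xFFFF and 0 <= value <= 0xFFFF):
            raise ValueError("Addr and Value must fit in 16 bits")
        # ASSUMPTION: decimal fields; match whatever the example mem.addr uses.
        lines.append(f"{wr} {rd} {addr} {value}")
    return lines

def write_trace(path, requests):
    """Write one request per line to the given file."""
    with open(path, "w") as f:
        f.write("\n".join(format_trace(requests)) + "\n")

# Hypothetical scenario: write a word, read it back, then read a distant
# address. Which cache line the last address maps to depends on the cache
# geometry, which this sketch does not know.
requests = [
    (1, 0, 0x0008, 0x1234),  # write
    (0, 1, 0x0008, 0),       # read back the same word
    (0, 1, 0x0808, 0),       # read a different block
]
for line in format_trace(requests):
    print(line)
```

Calling `write_trace("mem.addr", requests)` in the directory that holds your Verilog files would then produce the input file for the testbench. Keeping one small script per scenario (hit, miss, writeback, refill) also documents what each trace is meant to show.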

What to submit:

  1. Turn in neatly and legibly drawn schematics of your design. Your design hierarchy should be clear in this schematic.
  2. A state diagram of your cache controller.
  3. If you used a case statement in your Verilog, turn in a state transition table. Otherwise turn in the derived state equations.
  4. Annotated simulation trace of the complete design. Pick representative cases for your simulation input to turn in.
  5. If you implemented the extra credit, please make a note.
  6. Electronically hand in the following, all in one single tgz called hw5-p1.tgz:
    1. For the mem_system_randbench testbench, the log output in a file called randbench.log.
    2. Your Verilog source code. Vcheck output must be included in the tgz.
    3. For the mem_system_perfbench testbench, all additional trace files you wrote, with an explanation of what each one does.
    4. Annotated traces that show your design implements all cache operations correctly. Pick representative inputs from this testbench by examining the waveforms. You must hand in annotated waveforms to prove that your design works correctly during hits, misses, writebacks, and refills.

2.  Problem 2

Implement the 2-way set-associative cache that is required for the project. See the Cache module page. Replace the direct-mapped cache from Problem 1 with this 2-way set-associative cache.

Instantiating the cache modules

This is the methodology I suggest for instantiating your cache modules, so the naming conventions are the same for everyone.

Somewhere in mem_system.v:

cache #(0 + mem_type) c0(....);
cache #(2 + mem_type) c1(....);

This will guarantee that when this module is finally connected to your processor, your instruction and data memory will create separate dump files. See the Cache module section for details.

        Parameter Value     File Names
        ---------------     ----------
               0            Icache_0_data_0, Icache_0_data_1, Icache_0_tags, ...
               1            Dcache_0_data_0, Dcache_0_data_1, Dcache_0_tags, ...
               2            Icache_1_data_0, Icache_1_data_1, Icache_1_tags, ...
               3            Dcache_1_data_0, Dcache_1_data_1, Dcache_1_tags, ...
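The table follows a simple pattern: odd parameter values name data-cache dumps, even values name instruction-cache dumps, and the value divided by two gives the cache instance number. The Python sketch below merely restates that pattern as inferred from the four rows above. The function name `dump_prefix` is made up, and the real naming is done inside the provided cache module.

```python
def dump_prefix(param):
    """Map a cache parameter value to the dump-file prefix listed in the table."""
    if param not in (0, 1, 2, 3):
        raise ValueError("the table only covers parameter values 0 through 3")
    kind = "D" if param % 2 else "I"  # odd values: data cache, even: instruction cache
    instance = param // 2             # 0 for c0, 1 for c1
    return f"{kind}cache_{instance}"

for p in range(4):
    prefix = dump_prefix(p)
    print(p, prefix + "_data_0", prefix + "_data_1", prefix + "_tags")
```

This is why c0 is given `0 + mem_type` and c1 is given `2 + mem_type`: the two instances of each memory type end up with distinct file names.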

What to submit

  1. Same as problem 1.
  2. Electronically submit all your contents in a file called hw5-p2.tgz.

3.  Problem 3

Consider a direct-mapped cache with 32-byte blocks and a total capacity of 512 bytes in a system with a 32-bit address space.

  1. Indicate which bits of an address in this machine correspond to the tag, index, and offset, respectively.
  2. For the sequence of addresses below, indicate which references will result in cache hits and which will result in cache misses. If it does result in a miss, mark whether the miss was a compulsory, capacity, or conflict miss. Assume the cache is initially empty. (All valid bits are set to 0)
  3. Show the final contents of the address tags at the end of execution.
  4. Explain what can be done to improve each type of miss.
0x0000a796  
0x000092e8 
0x000092f4
0x00004182  
0x0000780a  
0x0000a690  
0x0000408e  
0x0000a798  
0x00007800  
0x000092fc  
0x00027c02  
0x0000408a  
0x00004198  
0x00006710  
0x0000670c  
0x00027c04  
0x0000a790  
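To check a hand computation of the tag, index, and offset fields, a small helper like the one below can be used. It is a generic Python sketch for any cache with a power-of-two block size and set count. The name `split_address` is hypothetical, and the geometry in the example call (64-byte blocks, 8 sets) is arbitrary and is not the geometry of this problem.

```python
def split_address(addr, block_bytes, num_sets):
    """Return (tag, index, offset) for a cache with power-of-two geometry."""
    for n in (block_bytes, num_sets):
        if n <= 0 or n & (n - 1):
            raise ValueError("block_bytes and num_sets must be powers of two")
    offset_bits = block_bytes.bit_length() - 1  # log2(block_bytes)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Arbitrary example: 6 offset bits, 3 index bits.
print(split_address(0x1234, block_bytes=64, num_sets=8))  # (9, 0, 52)
```

Two addresses conflict in a direct-mapped cache when they have the same index but different tags, so printing the triple for each reference makes the hit and miss pattern easier to trace by hand.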

4.  Problem 4

Re-do problem 3, but using a two-way set-associative cache. When replacing a block, the least-recently-used block is chosen to be replaced. Everything else (block size and total capacity) remains the same.

Determine the speedup over the direct-mapped cache in problem 3. Assume both caches can be accessed in 1 cycle, that the CPI without misses is 1.0, and that the miss penalty is 25 cycles.
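The speedup is the ratio of the two effective CPIs, where each CPI is the base CPI plus the miss rate times the miss penalty. The Python sketch below shows the arithmetic only. It ASSUMES every instruction makes exactly one memory reference, so misses per instruction equals the miss rate. The function names and the miss counts in the example call are made-up placeholders, not the answer to this problem.

```python
def effective_cpi(misses, references, base_cpi=1.0, miss_penalty=25):
    """CPI including miss stalls, assuming one memory reference per instruction."""
    return base_cpi + (misses / references) * miss_penalty

def speedup(misses_old, misses_new, references, base_cpi=1.0, miss_penalty=25):
    """Speedup of the new cache over the old one at equal clock rate."""
    old = effective_cpi(misses_old, references, base_cpi, miss_penalty)
    new = effective_cpi(misses_new, references, base_cpi, miss_penalty)
    return old / new

# Placeholder counts: 10 references, 6 misses before, 4 misses after.
print(round(speedup(6, 4, 10), 3))
```

Because both caches are accessed in one cycle, the clock period is the same and the CPI ratio alone gives the speedup.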


5.  Problem 5

Consider a cache with the following characteristics:

  • 32-byte blocks
  • 5-way set associative
  • 1024 sets
  • 47-bit addresses
  • writeback
  • LRU replacement policy
  1. How many bytes of data storage are there?
  2. What is the total number of bits needed to implement the cache?
  3. Make a picture similar to the one on page 503 of the text. (As with the picture in the text, include the hit and data logic.)
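The storage questions come down to counting bits per block and per set. The following Python sketch shows one way to organize that count. It ASSUMES one valid bit per block, one dirty bit per block for a writeback cache, and a per-set LRU cost passed in as `lru_bits_per_set`, since the LRU encoding is a design choice. The function name and the geometry in the example call are illustrative and are not the cache described in this problem.

```python
def cache_storage(block_bytes, ways, num_sets, addr_bits,
                  lru_bits_per_set, writeback=True):
    """Return (data_bytes, total_bits) under the assumptions stated above."""
    offset_bits = block_bytes.bit_length() - 1  # log2, power of two assumed
    index_bits = num_sets.bit_length() - 1      # log2, power of two assumed
    tag_bits = addr_bits - index_bits - offset_bits
    state_bits = 1 + (1 if writeback else 0)    # valid, plus dirty if writeback
    bits_per_block = 8 * block_bytes + tag_bits + state_bits
    total_bits = num_sets * (ways * bits_per_block + lru_bits_per_set)
    data_bytes = block_bytes * ways * num_sets
    return data_bytes, total_bits

# Illustrative geometry: 64-byte blocks, 4 ways, 256 sets, 32-bit addresses,
# and an assumed 5 LRU bits per set.
print(cache_storage(64, 4, 256, 32, lru_bits_per_set=5))  # (65536, 546048)
```

Note that the number of ways does not have to be a power of two for this count to work. Only the block size and the set count feed the offset and index widths.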

6.  Problem 6 (zero points)

Sun's OpenSPARC chip design is available as open-source Verilog. For this problem you will browse through this design to get a sense of what real industry designs look like. The source code is available here:

http://opensparc-t1.sunsource.net/nonav/source/verilog/html/verilog.html

Click on the Hierarchy for cmp_top and navigate through the hierarchy down to the processor core.

cmp_top (top level)
OpenSPARCT1 (chip main)
sparc (processor core, 8 instances)
sparc_exu, sparc_ifu, spu etc. (individual modules inside processor core)

I specifically recommend taking a look at the sparc_ifu and sparc_exu units. You will find similarities to your project design. Look for how the pipelining has been implemented. Also notice the clean separation between modules, the well-defined interfaces, the hierarchy, and the separation between control path and data path.

You do not need to turn in anything for this problem. This is an exercise to familiarize you with industrial strength designs.


Page last modified on April 29, 2008
