Homework 5

Due 04/18
Weight: 15%

Use this cover sheet as the first page of your homework. Download the Word doc, fill in your name, and print it, or handwrite the details in large letters. word doc, pdf

1.  Important

  • Read Verilog rules check in the Tools page. Your program must pass Vcheck.
  • Review The elements of Logic Design Style
  • Homework is due at start of class
  • Problems 1 and 2 MUST be done with your project partner and must be submitted to the group dropbox.
  • The Problem 1 cache controller state diagram is due in class one week earlier, on Tuesday (04/09).
  • Problems 3, 4, 5, 6, 7, 8, 9 and 10 MUST be done ALONE.
  • You must abide by the Verilog file naming conventions

2.  Grading scheme (total- 100)

  • Problem 1 carries 35 points.
    • Points for verilog part of problem 1- 25
    • Points for written part of problem 1- 10
  • Problem 2 carries 15 points.
    • Points for verilog part of problem 2- 10
    • Points for written part of problem 2- 5
  • Problems 3-10 (total- 50):
    • 5 (randomly chosen) of these 8 problems will be graded
    • Points for each graded problem- 10

The state machine for Problem 1 alone is due 04/09. Each partner on a project team must submit one (don't ask me in class if one per team is OK!). Keep a copy of the items you submit.


3.  Problem 1

Note: you may work with your project partner on the first 2 problems ONLY. Both partners must electronically submit and both must turn in handwritten submissions. Completely abandoning responsibility for these problems to your partner is disallowed. You must JOINTLY do both problems. Having one partner do the direct-mapped cache and the other do the 2-way cache will severely hurt your productivity!

First complete direct-mapped before moving to the set-associative cache.

You will implement a hierarchical memory system in Verilog that consists of a level-1 write-back cache with write-allocate policy and stalling memory. The system should use a direct-mapped cache and a four-banked, four-cycle memory. See the project modules provided page for the Cache module and four-banked memory module. Blocks are 4 words wide and the system is byte-addressable, word-aligned.

The top level module that you will develop is as follows. verilog template source for mem_system.v


module mem_system(/*AUTOARG*/
   // Outputs
   DataOut, Done, Stall, CacheHit, err, 
   // Inputs
   Addr, DataIn, Rd, Wr, createdump, clk, rst
   );

   input [15:0] Addr;
   input [15:0] DataIn;
   input        Rd;
   input        Wr;
   input        createdump;
   input        clk;
   input        rst;

   output [15:0] DataOut;
   output Done;
   output Stall;
   output CacheHit;
   output err;

   /* data_mem = 1, inst_mem = 0 *
    * needed for cache parameter */
   parameter mem_type = 0;

   // your code here

   // You must pass the mem_type parameter 
   // and createdump inputs to the 
   // cache modules

endmodule // mem_system

A top-level module called mem_system_hier.v is also provided which instantiates the clock generator and mem_system inside it. verilog source for mem_system_hier.v

Two testbenches, with a reference memory module, and loadfiles to initialize your memory are provided in this tarball. mem_system.tgz (updated April 7, 11:00pm). See also Homework modules provided page under HW5 testbench.

Important Notes:

  • This module will also handle internal errors.
  • The system should ignore new inputs when it is stalled (not an error). Timing will be an important part of this problem; designs that stall longer than necessary will be docked points.
  • The Done signal should be asserted for exactly one cycle. If a request can be satisfied in the cycle in which it is presented, the data should be presented and Done asserted in that same cycle.
  • The CacheHit signal denotes whether the request was a hit in the cache.
  • The mem_type parameter selects whether this is an instruction or data cache; it is used to generate the names of the dump files.

To complete this problem, you will need to determine how the internal components are arranged and create a cache controller FSM. See the description of the cache module for hints on how this should be done. You can choose to implement either a Mealy or a Moore machine, although I recommend a Moore machine as it will likely be easier to create. Be forewarned that the resulting state machine will be relatively large, so get started early.

The testing for this module should be extensive. You will need to verify that the design works correctly during hits, misses, writebacks, and refills. Also be sure to check the design under various main memory stall conditions.

For extra credit, you can improve your performance by adding a two-entry store buffer so that it is possible for writes to complete in one cycle. The extra credit will not count if the standard system is not working, so be sure to thoroughly test your design before thinking about moving on.

For this design, as usual:

  • Follow the Verilog file naming conventions for this design.
  • You must verify your design. See notes below on verification.
  • You must also synthesize your design

Instantiating the cache modules

This is the methodology I suggest for instantiating your cache modules, so the naming conventions are the same for everyone.

Somewhere in mem_system.v:

cache #(0 + mem_type) c0(....);

This will guarantee that when this module is finally connected to your processor, your instruction and data memory will create separate dump files. See the Cache module section for details.

Verification

Verification is an important part and significant challenge for this problem. You are provided with two testbenches:

  • mem_system_perfbench.v: This is a more carefully constructed testbench that is meant to check the performance of your design. For example, is your cache reporting cache hits and misses correctly? Are the requests being serviced in the correct amount of time? This testbench takes as input a memory address trace in a file called mem.addr. This file must be in the same location as your Verilog files. An example mem.addr file is provided. The format of the file is the following:
    • Each line represents a new request
    • Each line has 4 numbers separated by a space
    • The numbers are: Wr Rd Addr Value
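The trace format above can be produced with a short script. This is a minimal sketch, not part of the provided tools: the function name `format_trace` and the sample requests are hypothetical, and it assumes the numbers are written in decimal, so check the provided example mem.addr for the base the testbench actually expects.

```python
def format_trace(requests):
    """Render (wr, rd, addr, value) tuples as 'Wr Rd Addr Value' lines."""
    lines = []
    for wr, rd, addr, value in requests:
        if wr not in (0, 1) or rd not in (0, 1):
            raise ValueError("Wr and Rd must be 0 or 1")
        if not (0 <= addr <= 0xFFFF and 0 <= value <= 0xFFFF):
            raise ValueError("Addr and Value must fit in 16 bits")
        # ASSUMPTION: decimal fields; the real file may use another base.
        lines.append(f"{wr} {rd} {addr} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical trace: write a word, read it back, then read another address.
requests = [
    (1, 0, 8, 1234),   # write 1234 to address 8
    (0, 1, 8, 0),      # read address 8 back
    (0, 1, 2056, 0),   # read a different address
]
trace_text = format_trace(requests)
print(trace_text, end="")
```

Redirect the output to a file and pass that filename to the testbench with the -addr option.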

You must write different address traces to test your module and prove that it implements the cache correctly. Determining what to test and show is an important part of this problem. Carefully document and show in your homework which cases you are testing. Pick representative inputs from this testbench by examining the waveforms. You must hand in annotated waveforms to prove that your design works correctly during hits, misses, writebacks, and refills.

To run this testbench:


wsrun.pl -addr mem.addr mem_system_perfbench *.v

Replace mem.addr with whatever filename you used for your address trace file.

  • At least 5 traces are required else you will get ZERO for this entire problem.
  • Do not start randomized testing until you have proven to yourself that your cache shows some basic functionality.
  • mem_system_randbench.v: This is a completely randomized testbench that stresses the functional correctness of your design. You must run this testbench, which prints a log of addresses and statistics on the number of cache hits. Contact the TA with questions about the testbench. Your module must pass this testbench. The output log is saved in a file called transcript. Copy this file to randbench.log and include it in your handin. Specifically, this testbench does the following:
  1. full random: 1000 memory requests completely random
  2. small random: 1000 memory requests restricted to addresses 0 to 2046
  3. sequential address: 1000 memory requests restricted to addresses 0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x40
  4. two sets address: 1000 memory requests to test a two-way set-associative cache. You should get predominantly hits to the cache.

After every set of 1000 requests, you will see a message like the following:

LOG: Done two_sets_addr Requests: 4001, Cycles: 79688 Hits: 562
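If you want to tabulate these messages, a line in the format shown above can be parsed as follows. This is a minimal sketch: the pattern is derived only from the single sample line, so the helper `parse_log_line` is an assumption and may need adjusting if your transcript differs.

```python
import re

# Pattern inferred from the one sample LOG line on this page.
LOG_PATTERN = re.compile(
    r"LOG: Done (?P<name>\w+) Requests: (?P<requests>\d+), "
    r"Cycles: (?P<cycles>\d+) Hits: (?P<hits>\d+)"
)


def parse_log_line(line):
    """Return the counters from a LOG line, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "requests": int(match.group("requests")),
        "cycles": int(match.group("cycles")),
        "hits": int(match.group("hits")),
    }


sample = "LOG: Done two_sets_addr Requests: 4001, Cycles: 79688 Hits: 562"
stats = parse_log_line(sample)
print(stats)
```

The sample shows 4001 requests after the fourth set, which suggests the counters are running totals. If so, subtract consecutive lines to get per-set figures; confirm this against the testbench source.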

To run this testbench:


wsrun.pl mem_system_randbench *.v

Pay careful attention to the sequential address and the two sets address traces. You can look at the testbench to see what address sequences these are. You should be able to estimate how many hits your cache should report on them.

Synthesis

As with other homeworks you must synthesize your design. You must turn in the entire synth directory and make sure your total combinational area is non-zero.

What to submit:

  1. A state diagram of your cache controller.
    • Photocopy of above alone is required by 04/09.
  2. Turn in neatly and legibly drawn schematics of your design. Your design hierarchy should be clear in this schematic.
  3. If you used a case statement in your Verilog, turn in a state transition table. Otherwise turn in the derived state equations.
  4. An explanation of each mem_system_perfbench trace file you wrote and what it tests. At least 5 traces are required else you will get ZERO for this entire problem.
  5. Fill in the table for the 8 regression address traces. See below.
  6. On the handwritten homework you turn in, fill in the following synthesis info:
    1. Total area:
    2. Worst case slack:
  7. If you implemented the extra credit, please make a note.
  8. Handin Instruction

HW5 Problem1 address trace table


4.  Problem 2

Implement the 2-way set-associative cache that is required for the project. See the Cache module page. Replace the direct-mapped cache from the problem above with this 2-way set-associative cache.

Instantiating the cache modules

This is the methodology I suggest for instantiating your cache modules, so the naming conventions are the same for everyone.

Somewhere in mem_system.v:

cache #(0 + mem_type) c0(....);
cache #(2 + mem_type) c1(....);

This will guarantee that when this module is finally connected to your processor, your instruction and data memory will create separate dump files. See the Cache module section for details.

        Parameter Value     File Names
        ---------------     ----------
               0            Icache_0_data_0, Icache_0_data_1, Icache_0_tags, ...
               1            Dcache_0_data_0, Dcache_0_data_1, Dcache_0_tags, ...
               2            Icache_1_data_0, Icache_1_data_1, Icache_1_tags, ...
               3            Dcache_1_data_0, Dcache_1_data_1, Dcache_1_tags, ...
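The pattern in the table can be summarized in a few lines. This is only a sketch inferred from the four rows above, not from the cache module source: the low bit of the parameter appears to select instruction versus data, and the next bit the cache number. The helper `dump_prefix` is hypothetical.

```python
def dump_prefix(param_value):
    """Map a cache parameter value (0-3) to its dump-file name prefix."""
    if param_value not in range(4):
        raise ValueError("the table only lists parameter values 0 to 3")
    kind = "D" if param_value & 1 else "I"   # low bit: data vs instruction
    cache_number = param_value >> 1          # next bit: cache 0 or cache 1
    return f"{kind}cache_{cache_number}"


for value in range(4):
    print(value, dump_prefix(value) + "_data_0")
```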

Pay careful attention to the sequential address and the two sets address traces from mem_system_randbench. You can look at the testbench to see what address sequences these are. You should be able to estimate how many hits your cache should report on them.

What to submit

  1. Same as problem 1.
  2. Submit all your contents in a directory called hw5_2
  3. Fill in the values for the 6 address traces for the 2-way set-associative cache
  4. For mem_2way7.addr, print the text output of the simulation with all the LOG messages. These messages should denote the hit/miss for each request. In handwritten notes, explain the reason for each hit/miss.

HW5 Problem2 address trace table


5.  Problem 3

Given a 2 KB, 2-way set-associative cache with 16-byte lines and the following code:

for (int i = 0; i < 1000; i++)
{
  A[i] = 40 * B[i];
}

a) Compute the overall miss rate (assuming each array entry is one word, each word is 4 bytes, and the base address of each array is aligned to a cache line boundary).

b) What kind of cache locality is being exploited?


6.  Problem 4

Consider a direct-mapped cache with 32-byte blocks and a total capacity of 512 bytes in a system with a 32-bit address space. Assume this is a byte addressable cache.

  1. Indicate which bits of an address in this machine correspond to the tag, index, and offset, respectively.
  2. For the sequence of addresses below, indicate which references will result in cache hits and which will result in cache misses. If it does result in a miss, mark whether the miss was a compulsory, capacity, or conflict miss. Assume the cache is initially empty. (All valid bits are set to 0)
  3. Show the final contents of the address tags at the end of execution.
  4. Explain what can be done to improve each type of miss.
0x0000a796  
0x000092e8 
0x000092f4
0x00004182  
0x0000780a  
0x0000a690  
0x0000408e  
0x0000a798  
0x00007800  
0x000092fc  
0x00027c02  
0x0000408a  
0x00004198  
0x00006710  
0x0000670c  
0x00027c04  
0x0000a790  
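Splitting an address into tag, index, and offset is mechanical once the block size and number of sets are known. The sketch below uses a deliberately different, hypothetical configuration (64-byte blocks, 8 sets) rather than the one in this problem; `split_address` is an illustrative helper and assumes both sizes are powers of two.

```python
def split_address(addr, block_bytes, num_sets):
    """Return (tag, index, offset) for a byte address."""
    # ASSUMPTION: block_bytes and num_sets are powers of two.
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# Hypothetical cache: 64-byte blocks (6 offset bits), 8 sets (3 index bits).
tag, index, offset = split_address(0x1234, block_bytes=64, num_sets=8)
print(f"tag={tag:#x} index={index} offset={offset}")
```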

7.  Problem 5

Re-do problem 4, but using a two-way set-associative cache. When replacing a block, the least-recently-used block is chosen to be replaced. Everything else (block size and total capacity) remains the same.

Determine the speedup over the direct-mapped cache in problem 4. Assume both caches can be accessed in 1 cycle, that the CPI without misses is 1.0, and that the miss penalty is 25 cycles.
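To check hand-worked answers, a small LRU cache model can replay an address sequence. The sketch below is illustrative only: it uses a toy trace and a hypothetical 256-byte cache with 16-byte blocks, and it assumes one memory reference per instruction when converting miss counts to CPI, which may not match the assumption you choose for this problem.

```python
from collections import OrderedDict


def simulate(addresses, block_bytes, num_sets, ways):
    """Replay addresses through an LRU cache; return True per hit."""
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # mark as most recently used
            results.append(True)
        else:
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used
            lines[tag] = True
            results.append(False)
    return results


def cpi_with_misses(base_cpi, misses_per_instruction, miss_penalty):
    """CPI including miss stalls."""
    return base_cpi + misses_per_instruction * miss_penalty


# Toy trace: two blocks that map to the same set in both configurations.
trace = [0x000, 0x100, 0x000, 0x100]
direct = simulate(trace, block_bytes=16, num_sets=16, ways=1)
two_way = simulate(trace, block_bytes=16, num_sets=8, ways=2)

# ASSUMPTION: one memory reference per instruction.
cpi_direct = cpi_with_misses(1.0, direct.count(False) / len(trace), 25)
cpi_two_way = cpi_with_misses(1.0, two_way.count(False) / len(trace), 25)
speedup = cpi_direct / cpi_two_way
print(direct, two_way, round(speedup, 3))
```

In the toy trace the direct-mapped cache misses on every reference because the two blocks evict each other, while the two-way cache holds both.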


8.  Problem 6

Consider a cache with the following characteristics:

  • 32-byte blocks
  • 5-way set associative
  • 1024 sets
  • 47-bit addresses
  • writeback
  • LRU replacement policy
  1. How many bytes of data storage are there?
  2. What is the total number of bits needed to implement the cache?
  3. Make a picture similar to the one on page 486 of the text. (As with the picture in the text, include the hit and data logic.)

9.  Problem 7

How many storage bits are required to implement a 256KB cache with 16B blocks that is 4-way set-associative, uses a write-back policy and LRU replacement, and assumes a byte-addressable address space of 2^36 bytes?

Bits are required for: 1. the data, 2. the tags, 3. the valid bits, 4. the dirty bits, 5. the LRU bits.
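The bookkeeping for this kind of question can be sketched as below, using a different, hypothetical configuration (64 KB, 32-byte blocks, 2-way, 32-bit addresses). The LRU cost is an assumption: this sketch charges ceil(log2(ways)) bits per line, which is one common encoding; other LRU implementations use a different number of bits, so state the scheme you assume in your answer.

```python
def cache_storage_bits(capacity_bytes, block_bytes, ways, address_bits):
    """Return per-category storage bits for a write-back cache."""
    # ASSUMPTION: capacity, block size, and set count are powers of two.
    lines = capacity_bytes // block_bytes
    sets = lines // ways
    offset_bits = block_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    # ASSUMPTION: ceil(log2(ways)) LRU bits per line.
    lru_bits_per_line = (ways - 1).bit_length()
    totals = {
        "data": lines * block_bytes * 8,
        "tag": lines * tag_bits,
        "valid": lines,
        "dirty": lines,
        "lru": lines * lru_bits_per_line,
    }
    totals["total"] = sum(totals.values())
    return totals


# Hypothetical cache, not the one in this problem.
bits = cache_storage_bits(64 * 1024, 32, ways=2, address_bits=32)
print(bits)
```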


10.  Problem 8

Do problems 5.4.1 to 5.4.3 on page 551 of the textbook.


11.  Problem 9

Do problems 5.7.1 to 5.7.3 on page 554 of the textbook.


12.  Problem 10

Consider a processor running at 2 GHz with a base CPI of 1.0 (CPI without considering memory access delay, stalls, etc.). About 30% of the instructions in a program involve data memory access. The access delay of instruction memory is ignored. The data memory access time is 100 ns, including miss handling. Its primary (L1) cache has a hit rate of 99% and no access penalty on a hit. Now consider adding an L2 cache between the L1 cache and the main memory. Suppose the L2 cache has a miss ratio of 20% and an access delay of 5 ns. How much does performance improve with the L2 cache compared to without it?
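One common way to set up this kind of comparison is sketched below with different, hypothetical numbers (1 GHz clock, 20% data accesses, 5% L1 miss rate, 50 ns memory, 10 ns L2 with a 50% miss ratio). It assumes the L2 miss ratio is local (a fraction of L1 misses) and that an L2 miss pays the L2 access time plus the full memory access time. Other readings of the problem are possible, so state your own assumptions.

```python
def ns_to_cycles(nanoseconds, clock_ghz):
    """Convert a delay in ns to clock cycles."""
    return nanoseconds * clock_ghz


def cpi_l1_only(base_cpi, refs_per_instr, l1_miss_rate, mem_cycles):
    """CPI when every L1 miss goes straight to main memory."""
    return base_cpi + refs_per_instr * l1_miss_rate * mem_cycles


def cpi_with_l2(base_cpi, refs_per_instr, l1_miss_rate,
                l2_cycles, l2_local_miss_rate, mem_cycles):
    """CPI when L1 misses go to L2, and L2 misses go on to memory."""
    # ASSUMPTION: an L2 miss costs the L2 access plus the memory access.
    l1_miss_penalty = l2_cycles + l2_local_miss_rate * mem_cycles
    return base_cpi + refs_per_instr * l1_miss_rate * l1_miss_penalty


# Hypothetical numbers, not the ones in this problem.
clock_ghz = 1.0
mem_cycles = ns_to_cycles(50, clock_ghz)
l2_cycles = ns_to_cycles(10, clock_ghz)

without_l2 = cpi_l1_only(1.0, 0.20, 0.05, mem_cycles)
with_l2 = cpi_with_l2(1.0, 0.20, 0.05, l2_cycles, 0.50, mem_cycles)
speedup = without_l2 / with_l2
print(round(without_l2, 3), round(with_l2, 3), round(speedup, 3))
```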



Page last modified on April 08, 2013, visited 2057 times
