Project 4: Cache Simulation

Important Dates

Questions about the project? Send them to 354-help@cs.wisc.edu.

Due: Wednesday 11/28 by whenever.

NOTE: HAVING A SINGLE PARTNER (i.e., a two-person team) IS OK FOR THIS PROJECT.

Overview

In this project we'll explore how hardware caches make application workloads run faster. We'll do this by building a cache simulator (in C, naturally) and using it to study the impact of different cache organizations on application performance.

Specifics

You should write a program called cachesim, which takes the following command-line arguments:

  • -t tracefile -- the name of the file containing the address reference trace.
  • -S size -- this flag enables you to set the cache size, in KB.
  • -A associativity -- this enables you to set the associativity of the cache, e.g., 1 (direct mapped), 2, etc.
  • -B blocksize -- this flag enables you to set the block size (also called the line size) of the cache, in bytes. The minimum is 4 bytes, and the block size must be a multiple of 4 bytes.
  • -v -- verbose mode is turned on (described further below).

Some combinations of options do not describe a valid cache: for example, a block size that is not a multiple of 4, or a cache size that is not a multiple of the block size times the associativity. You should figure out what all of these cases are; when an illegal set of parameters is specified, print the error message Illegal configuration and exit the program with exit code 1.
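Here is one way the legality check might look. This is a hypothetical helper (valid_config is not a required name) and only a partial sketch: it covers the cases named above plus one extra rule that is purely an assumption of this sketch, namely that the number of sets must be a power of two. Decide for yourself which cases your simulator should reject.

```c
#include <stdbool.h>

/* Hypothetical helper: true if x is a positive power of two. */
static bool is_pow2(long x) {
    return x > 0 && (x & (x - 1)) == 0;
}

/*
 * Sketch of a legality check for -S (KB), -A and -B (bytes).
 * Covers the cases named in the handout plus an ASSUMED rule that the
 * number of sets must be a power of two; it is not an exhaustive list.
 */
bool valid_config(long size_kb, long assoc, long block_size) {
    if (size_kb <= 0 || assoc <= 0 || block_size < 4)
        return false;                     /* non-positive or too-small values */
    if (block_size % 4 != 0)
        return false;                     /* block size must be a multiple of 4 */
    long size_bytes = size_kb * 1024;
    if (size_bytes % (block_size * assoc) != 0)
        return false;                     /* cache must hold a whole number of sets */
    long num_sets = size_bytes / (block_size * assoc);
    return is_pow2(num_sets);             /* assumption of this sketch */
}
```

If this returns false, the simulator would print Illegal configuration and exit with code 1.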

To handle these options, you will likely want to use the getopt() library function (declared in <unistd.h>); read its manual page to learn more. The best approach is to start with a working example and go from there.

Accesses are word-sized: the trace contains only 4-byte-aligned addresses, and each access touches a single word (assume words are 4 bytes in size). Because block sizes are multiples of 4 bytes, an access never spans two blocks.
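To find where an address lives, split it into a block offset, a set index, and a tag. The sketch below uses a hypothetical split_address helper and assumes you have already computed the number of sets as the cache size divided by (block size times associativity). For example, with 16-byte blocks and 4 sets, address 0x8004 is in block 0x8004 / 16 = 2048, so its offset is 4, its set index is 2048 mod 4 = 0, and its tag is 2048 / 4 = 512.

```c
#include <stdint.h>

/*
 * Hypothetical helper: split a byte address into block offset, set index
 * and tag. Division/modulo is used so the sketch also works when the
 * block size is a multiple of 4 but not a power of two.
 */
void split_address(uint64_t addr, uint64_t block_size, uint64_t num_sets,
                   uint64_t *tag, uint64_t *index, uint64_t *offset) {
    uint64_t block_number = addr / block_size; /* which block-sized chunk */
    *offset = addr % block_size;               /* byte within the block */
    *index  = block_number % num_sets;         /* which set to look in */
    *tag    = block_number / num_sets;         /* identifies the block within its set */
}
```

Shifts and masks are the usual hardware-style shortcut, but they only work when the block size and set count are powers of two; division and modulo avoid that restriction at a small speed cost.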

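The cache should be a write-back cache with a write-allocate policy: on a write miss the block is brought into the cache, and a modified (dirty) block is written to memory only when it is evicted.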
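The following sketch shows what write-back and write-allocate mean for the bookkeeping of a single cache line. The struct line type and access_line function are hypothetical; a simulator only needs valid, dirty, and tag bits, not the data itself. The required statistics only count hits and misses, so reporting write-backs here is just for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* One cache line: no data is stored, only the bookkeeping a simulator needs. */
struct line {
    bool valid;
    bool dirty;     /* set on a write; cleared when the block is (re)filled */
    uint64_t tag;
};

/*
 * Sketch of one access to a single line (e.g., a direct-mapped set).
 * Write-allocate: a write miss brings the block in just like a read miss.
 * Write-back: memory is updated only when a dirty block is evicted;
 * *writeback reports whether that happened on this access.
 * Returns true on a hit.
 */
bool access_line(struct line *l, uint64_t tag, bool is_write, bool *writeback) {
    bool hit = l->valid && l->tag == tag;
    *writeback = false;
    if (!hit) {
        *writeback = l->valid && l->dirty;  /* evicting modified data */
        l->valid = true;                    /* allocate on read and write misses */
        l->tag = tag;
        l->dirty = false;
    }
    if (is_write)
        l->dirty = true;                    /* defer the memory update */
    return hit;
}
```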

When the cache's associativity is greater than one, it should use a least-recently-used (LRU) replacement policy to decide which block in the set to evict.
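LRU can be tracked with a per-way timestamp taken from a counter that increases on every access; on a miss, the victim is an empty way if there is one, otherwise the way with the oldest timestamp. The types, the MAX_WAYS cap, and set_access below are hypothetical, dirty bits are omitted to keep the focus on replacement, and timestamps are only one choice (a per-set recency list also works).

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_WAYS 16   /* hypothetical cap for this sketch */

struct way {
    bool valid;
    uint64_t tag;
    unsigned long last_used;  /* "time" of the most recent access */
};

struct set {
    int ways;                 /* associativity, assumed <= MAX_WAYS */
    struct way w[MAX_WAYS];
};

/*
 * LRU via timestamps. 'now' must increase on every access.
 * Returns true on a hit; on a miss, fills an empty way or replaces the
 * least recently used one.
 */
bool set_access(struct set *s, uint64_t tag, unsigned long now) {
    int victim = 0;
    for (int i = 0; i < s->ways; i++) {
        if (s->w[i].valid && s->w[i].tag == tag) {
            s->w[i].last_used = now;      /* hit: becomes most recently used */
            return true;
        }
    }
    for (int i = 0; i < s->ways; i++) {
        if (!s->w[i].valid) { victim = i; break; }   /* prefer an empty way */
        if (s->w[i].last_used < s->w[victim].last_used)
            victim = i;                                /* older than current pick */
    }
    s->w[victim] = (struct way){ .valid = true, .tag = tag, .last_used = now };
    return false;
}
```

For example, in a 2-way set, accessing tags 1, 2, 1, 3 evicts tag 2 on the fourth access, because tag 1 was used more recently.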

The input is a file that holds all of the address references you will be simulating. The format is simple: each line starts with either 'R' or 'W' (a read from or a write to the memory system, respectively), followed by a space and then the 64-bit address (in hex) of the data being read or written. For example, the file might contain:

R 0000000000008000
R 0000000000008004
R 000000000000800c
R 0000000000008010
which is a nice example of reading through part of an array, starting at address 0x8000 and ending at 0x8010 (0x8008 is skipped).
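Each trace line can be parsed with sscanf(). This hypothetical parse_trace_line helper assumes lines look exactly like the example above; it uses the SCNx64 macro from <inttypes.h> so the hex address is read into a uint64_t on any platform.

```c
#include <inttypes.h>   /* SCNx64 */
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one trace line such as "R 0000000000008000".
 * Returns 'R' or 'W' on success (storing the address), 0 if malformed.
 */
int parse_trace_line(const char *line, uint64_t *addr) {
    char op;
    if (sscanf(line, " %c %" SCNx64, &op, addr) != 2)
        return 0;                  /* missing op or address */
    if (op == 'R' || op == 'W')
        return op;
    return 0;                      /* unknown operation letter */
}
```

Reading the file line by line with fgets() and passing each buffer to this function also leaves the original text on hand for verbose mode.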

In normal (non-verbose) mode, the simulator should print some statistics at the end. Specifically, it should print the following lines in exactly this format (the numbers shown only illustrate the format; they do not come from the four-read trace above):

total accesses: 4
hits: 2
misses: 2
total reads: 3
read hits: 1
total writes: 1
write hits: 1
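A sketch of the end-of-run report. The struct stats counters and format_stats helper are hypothetical; deriving total accesses, hits, and misses from the read/write counters keeps the seven lines consistent with each other. With 3 reads, 1 read hit, 1 write, and 1 write hit, it yields the totals shown above (4 accesses, 2 hits, 2 misses).

```c
#include <stdio.h>

/* Hypothetical counters; totals and misses are derived from them. */
struct stats {
    unsigned long reads, read_hits;
    unsigned long writes, write_hits;
};

/*
 * Render the statistics in the required format into buf.
 * Returns snprintf's result (the length the full text needs).
 */
int format_stats(char *buf, size_t n, const struct stats *s) {
    unsigned long total = s->reads + s->writes;
    unsigned long hits = s->read_hits + s->write_hits;
    return snprintf(buf, n,
                    "total accesses: %lu\n"
                    "hits: %lu\n"
                    "misses: %lu\n"
                    "total reads: %lu\n"
                    "read hits: %lu\n"
                    "total writes: %lu\n"
                    "write hits: %lu\n",
                    total, hits, total - hits,
                    s->reads, s->read_hits, s->writes, s->write_hits);
}
```

Print the result with fputs(buf, stdout), using a buffer comfortably larger than seven lines of text.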

In verbose mode, the simulator should print each input line followed by a space and the single word HIT or MISS (all caps) to indicate whether that access was a hit or a miss. This feature can be useful for debugging. For example, the trace above might produce:

R 0000000000008000 MISS
R 0000000000008004 HIT
R 000000000000800c HIT
R 0000000000008010 HIT
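This MISS/HIT/HIT/HIT sequence would happen, for example, if the block size is 32 bytes (or a larger power of two): the first miss fetches the entire block (0x8000 through 0x801f for 32-byte blocks), so the subsequent spatially nearby accesses hit. With 16-byte blocks, 0x8010 falls in a new block, so the last access would miss as well.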
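You can check which block sizes produce a given hit/miss pattern with a tiny model. The sketch below (the classify function is hypothetical) assumes a cache large enough that nothing is ever evicted, so an access hits exactly when its block number (address / block size) was touched before. For the four-address trace, 32-byte blocks give MISS, HIT, HIT, HIT because all four addresses fall in block 1024, while 16-byte blocks give MISS, HIT, HIT, MISS because 0x8010 starts block 0x801.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical demo, assuming a cache large enough that nothing is ever
 * evicted: an access hits exactly when its block was touched before.
 * Writes verdicts into hit[] for the first n addresses (at most 64).
 */
void classify(const uint64_t *addrs, int n, uint64_t block_size, bool *hit) {
    uint64_t seen[64];
    int nseen = 0;
    for (int i = 0; i < n && i < 64; i++) {
        uint64_t block = addrs[i] / block_size;
        hit[i] = false;
        for (int j = 0; j < nseen; j++)
            if (seen[j] == block) { hit[i] = true; break; }
        if (!hit[i])
            seen[nseen++] = block;   /* first touch: a compulsory miss */
    }
}
```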

There are two parts to the assignment. The first part (roughly 3/4 of the grade) is simply building a working cache simulator that passes the tests we hand out. The tests will stress basic functionality.

The second part (roughly 1/4 of the grade) is where you get to explore caching for an application of your choosing. Specifically, you need to use the Pin dynamic binary instrumentation tool to get an address trace from the application (more on this soon, though of course you can always start on your own). You then need to use your cache simulator to produce what is known as a working-set graph of the application. This graph plots miss rate (misses divided by total accesses) on the y-axis against cache size on the x-axis. You should vary the cache size from something small (say around 8 KB) to something fairly large (say 8 MB), using x-axis values that are powers of two (e.g., 8 KB, 16 KB, 32 KB, etc.). You can also plot more than one line on the graph; each line should represent the results with a different associativity (e.g., 1, 2, 4, and 8). What does the graph tell you about the application? How big a cache should it use? Does associativity matter?
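The sweep itself is mechanical. This sketch (hypothetical helper names) lists the power-of-two sizes from 8 KB to 8 MB in the KB units that -S expects and computes miss rate as misses divided by total accesses. You would run cachesim once per (size, associativity) pair, with associativity 1, 2, 4, and 8 and a fixed block size of your choosing, and plot one line per associativity.

```c
/*
 * Hypothetical helpers for the working-set sweep.
 * sweep_sizes_kb fills sizes[] with 8, 16, ..., 8192 (8 KB to 8 MB)
 * and returns how many values it wrote (at most max).
 */
int sweep_sizes_kb(long *sizes, int max) {
    int n = 0;
    for (long kb = 8; kb <= 8192 && n < max; kb *= 2)
        sizes[n++] = kb;
    return n;
}

/* Miss rate = misses / total accesses; defined as 0 for an empty trace. */
double miss_rate(unsigned long misses, unsigned long accesses) {
    return accesses ? (double)misses / (double)accesses : 0.0;
}
```

That gives 11 cache sizes, so 44 runs for four associativities.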

Bonus: Do separate plots for the instruction stream and data stream, instead of unifying the two.

Handing It In

This project, unlike the others thus far, can be done with a single partner. Copying code from other groups is considered cheating; see the course's collaboration policy for more on what is OK and what is not.

The handin directory is ~cs354-3/handin/login/p4, where login is your login. Please turn in the final code to BOTH your directory and your partner's directory.

Finally, please include a short report that describes which application you chose to study, includes a graph (or a few), and answers the questions about the working set of the application. IMPORTANT: include the names of both people working on the project.