
Homework 5

CS 752: Advanced Computer Architecture I (Spring 2015 Section 1 of 1)

Due Monday, 3/15

You should do this assignment on your own. No late assignments.

Person of contact for this assignment: Jason Power <powerjg@cs.wisc.edu>

The goal of this assignment is twofold: first, to give you experience creating a new SimObject in gem5, and second, to have you consider tradeoffs in cache design.

An updated caches.py configuration file can be downloaded here: http://pages.cs.wisc.edu/~david/courses/cs752/Spring2015/html/hw5/caches.py. Use it to replace the caches.py from the previous homework's configuration files: http://pages.cs.wisc.edu/~david/courses/cs752/Spring2015/html/hw4-configs.tar.gz.

Step 1: Implement NMRU replacement policy

You can follow the tutorial here: http://pages.cs.wisc.edu/~david/courses/cs752/Spring2015/gem5-tutorial/index.html. Part 2 of the tutorial walks you through creating the NMRU policy.
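The tutorial covers the gem5-specific plumbing. As a compact reference for the policy logic alone, here is a minimal standalone sketch of one set under NMRU (not most recently used), assuming the common definition: the set remembers only which way was touched last, and on a miss it evicts a way chosen uniformly at random from the remaining ways. The class NmruSet and its methods are hypothetical and are not gem5's replacement-policy interface; the sketch also skips the usual step of filling invalid ways before evicting a valid one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical standalone model of a single cache set using NMRU replacement.
// It is NOT the gem5 SimObject interface; it only illustrates the policy.
class NmruSet {
  public:
    NmruSet(std::size_t ways, std::uint64_t seed)
        : ways_(ways), mru_(0), rng_(seed) {
        assert(ways >= 1);
    }

    // Called on every hit or fill: record the most recently used way.
    void touch(std::size_t way) {
        assert(way < ways_);
        mru_ = way;
    }

    // Pick a victim uniformly at random among all ways except the MRU way.
    std::size_t victim() {
        if (ways_ == 1) return 0;  // direct-mapped: the only way is the victim
        std::uniform_int_distribution<std::size_t> dist(0, ways_ - 2);
        std::size_t pick = dist(rng_);
        // Shift picks at or above the MRU index up by one to skip the MRU way.
        return pick >= mru_ ? pick + 1 : pick;
    }

  private:
    std::size_t ways_;
    std::size_t mru_;       // index of the most recently used way
    std::mt19937_64 rng_;   // per-set generator, for the sketch only
};
```

Storing only the MRU way costs log2(assoc) bits per set, which is one reason NMRU's cost grows with associativity while Random needs no per-set history.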

Step 2: Implement PLRU replacement policy

Follow similar steps as you did to implement NMRU, but implement pseudo-LRU (PLRU) instead. Pseudo-LRU uses a binary tree to encode which blocks in the set are less recently used than others. These slides from Mikko Lipasti do a good job of explaining the PLRU algorithm: https://ece752.ece.wisc.edu/lect11-cache-replacement.pdf.
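A minimal standalone sketch of tree-based PLRU for one set, assuming a power-of-two associativity: the assoc - 1 tree bits are stored heap-style (node i has children 2i+1 and 2i+2), every access sets the bits on its root-to-leaf path to point away from the accessed way, and the victim is found by following the bits down from the root. The bit polarity used here (true means "the right subtree is less recently used") is an arbitrary choice for the sketch and may differ from the convention in the slides; TreePlruSet is a hypothetical name, not a gem5 class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone model of one cache set using tree pseudo-LRU.
class TreePlruSet {
  public:
    explicit TreePlruSet(std::size_t ways) : ways_(ways) {
        assert(ways >= 2 && (ways & (ways - 1)) == 0);  // power of two only
        bits_.assign(ways - 1, false);                  // ways - 1 tree bits
    }

    // On a hit or fill, walk root-to-leaf toward `way` and point each node
    // on the path at the half that was NOT just accessed.
    void touch(std::size_t way) {
        assert(way < ways_);
        std::size_t node = 0, lo = 0, hi = ways_;  // subtree covers [lo, hi)
        while (hi - lo > 1) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (way < mid) {
                bits_[node] = true;   // left was used, so right is older
                node = 2 * node + 1;
                hi = mid;
            } else {
                bits_[node] = false;  // right was used, so left is older
                node = 2 * node + 2;
                lo = mid;
            }
        }
    }

    // Follow the bits from the root to reach the pseudo-LRU way.
    std::size_t victim() const {
        std::size_t node = 0, lo = 0, hi = ways_;
        while (hi - lo > 1) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (bits_[node]) {        // right subtree is less recently used
                node = 2 * node + 2;
                lo = mid;
            } else {                  // left subtree is less recently used
                node = 2 * node + 1;
                hi = mid;
            }
        }
        return lo;
    }

  private:
    std::size_t ways_;
    std::vector<bool> bits_;
};
```

For example, in a 4-way set, touching ways 0, 2, and 1 in that order leaves way 3 as the victim, which matches true LRU in this case; in general PLRU only approximates LRU, using assoc - 1 bits per set instead of a full recency ordering.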

Step 3: Architectural exploration

This time, the Entil CEO has tasked you with designing the L1 data cache of their new processor, which is based on the out-of-order O3CPU. For this task, the marketing director of Entil claims that most of their customers' workload consists of the matrix multiply kernel. Because this kernel is memory intensive, Entil believes a better cache design could make their processor outperform the competition (AMM, Advanced Micro Machines, if you're keeping track).

A blocked matrix multiply implementation can be downloaded here: http://pages.cs.wisc.edu/~david/courses/cs752/Spring2015/html/hw5/mm.cpp. Use a 128x128 matrix as input (./mm 128).
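The contents of mm.cpp are not reproduced here. As a rough illustration of what "blocked" means, the sketch below computes C += A * B for n x n row-major matrices tile by tile, so that each inner step works on three small bs x bs tiles instead of whole rows and columns, and those tiles are reused while they are still in the L1 data cache. The function name, the double element type, and the block-size parameter bs are assumptions for illustration and may not match mm.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical blocked (tiled) matrix multiply: C += A * B, all n x n, row-major.
void blocked_matmul(const std::vector<double>& A, const std::vector<double>& B,
                    std::vector<double>& C, std::size_t n, std::size_t bs) {
    assert(bs > 0 && A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t ii = 0; ii < n; ii += bs) {
        const std::size_t i_end = std::min(ii + bs, n);
        for (std::size_t kk = 0; kk < n; kk += bs) {
            const std::size_t k_end = std::min(kk + bs, n);
            for (std::size_t jj = 0; jj < n; jj += bs) {
                const std::size_t j_end = std::min(jj + bs, n);
                // Multiply tile A[ii.., kk..] by tile B[kk.., jj..] into C[ii.., jj..].
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

If mm.cpp also uses 8-byte doubles (an assumption), each 128x128 matrix occupies 128 x 128 x 8 = 131,072 bytes (128 KiB), so the three matrices together are larger than a typical L1 data cache, and how well the tiles stay resident depends on the cache's associativity and replacement policy.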

You can choose from three replacement policies for the L1D cache: Random, NMRU, and PLRU. As associativity increases, the cost of NMRU and PLRU rises, whereas the cost of Random stays the same. Therefore, Random can be used with higher associativities than the other replacement policies. Additionally, because NMRU and PLRU must update the recently used bits in the tag they access, these policies limit the clock rate of the CPU. Note that the maximum clock frequency of the O3 CPU in this generation is 2.3 GHz.

The constraints for these policies are summarized below.

              Random   NMRU     PLRU
Max assoc.    16       8        8
Lookup time   100 ps   500 ps   666 ps
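One way to turn these constraints into candidate configurations is sketched below. It assumes, as a simplification the assignment does not state explicitly, that an L1D lookup must complete within a single CPU cycle, so the clock is bounded both by 1 / lookup time and by the 2.3 GHz O3 limit. Under that assumption Random runs at the 2.3 GHz cap (100 ps alone would allow 10 GHz), NMRU at 2.0 GHz, and PLRU at roughly 1.5 GHz. The Policy struct and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One row of the constraint table above.
struct Policy {
    std::string name;
    int max_assoc;
    double lookup_ps;
};

// Assumption: one L1D lookup per cycle, so f <= 1 / t_lookup.
// 1 / (t ps) = 1e12 / t Hz = 1000 / t GHz. Also capped by the O3 CPU limit.
double max_clock_ghz(double lookup_ps, double cpu_limit_ghz = 2.3) {
    assert(lookup_ps > 0.0);
    const double lookup_limit_ghz = 1000.0 / lookup_ps;
    return std::min(cpu_limit_ghz, lookup_limit_ghz);
}

std::vector<Policy> policies() {
    return {{"Random", 16, 100.0}, {"NMRU", 8, 500.0}, {"PLRU", 8, 666.0}};
}

// Enumerate power-of-two associativities up to each policy's maximum.
void print_design_space() {
    for (const Policy& p : policies()) {
        for (int assoc = 2; assoc <= p.max_assoc; assoc *= 2) {
            std::printf("%-6s %2d-way  %.2f GHz\n", p.name.c_str(), assoc,
                        max_clock_ghz(p.lookup_ps));
        }
    }
}
```

Calling print_design_space() lists each policy from 2-way up to its maximum associativity as a starting point for choosing what to simulate. Direct-mapped caches need no replacement policy and are left out because the table gives no lookup time for them.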

In a one-page memo to the CEO of Entil, clearly describe all of the configurations you simulated, the results of your simulations, and your overall conclusion about how to architect the L1 data cache. Additionally, answer the following specific questions:

  1. Why does the 16-way set-associative cache perform better/worse/similar to the 8-way set-associative cache?
  2. Why does Random/NMRU/PLRU/None perform better than the other replacement policies?
  3. Is the cache replacement policy/associativity important for this workload, or are you only getting benefits from the clock rate? Explain why the cache architecture is important/unimportant.

What to Hand In

Turn in your assignment by sending an email message to Jason Power <powerjg@cs.wisc.edu> and Prof. David Wood <david@cs.wisc.edu> with the subject line: [CS752 Homework5]

Please turn in your homework in the form of a PDF file.


Page last modified on March 10, 2015, at 11:20 AM