UW-Madison
Computer Sciences Dept.

CS 757 Computer Architecture II Spring 2011 Section 1
Instructor David A. Wood
URL: http://www.cs.wisc.edu/~david/courses/cs757/Spring2011/

Homework 3 // Due at Lecture Fri Mar 11

You should do this assignment alone. No late assignments.


Simulation is one of the most important research and development methods in computer architecture. In this assignment, you will gain hands-on experience with an execution-driven, full-system multiprocessor simulator: Wisconsin GEMS, built on top of Virtutech Simics. The goal is to give you a first-hand feel for how simulation works, how to use a typical simulator, and what simulation is capable of.

Simulator Setup

  • Download GEMS v2.1 from GEMS website.
  • Download a tarball of Simics from here (UW IPs only).
  • Download a tarball of checkpoints from here (UW IPs only).
  • Follow the instructions for setting up GEMS.
Make sure your simulator is up and running by Mar 8.

Workload Setup


Problem 1 (15 points)

Simulate the 'eg_pthread' workload provided with this assignment. Use the 16-processor 'silver' machine checkpoints. Build Ruby with the CMP directory protocol 'MESI_CMP_filter_directory' and configure it dynamically using the parameters provided in 'eg_pthread.simics'. Plot the speedup of the workload as the number of threads varies over t = [1, 2, 4, 8, 16], relative to the t=1 case. Use the Ruby_Cycles value in the statistics file output by the simulator to calculate the speedup.
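The speedup calculation can be sketched as follows. This is only an illustration: the function name and the cycle counts are placeholders invented for the example, not measured results, and the sketch assumes you have already copied the Ruby_Cycles value out of each run's statistics file by hand.

```python
def speedups(cycles_by_threads):
    """Return {thread count: speedup relative to the 1-thread run}.

    cycles_by_threads maps a thread count to the Ruby_Cycles value
    reported for that run. Speedup is cycles(t=1) / cycles(t).
    """
    base = cycles_by_threads[1]
    return {t: base / c for t, c in sorted(cycles_by_threads.items())}


# Placeholder cycle counts for illustration only; NOT real simulator output.
example_cycles = {1: 8000, 2: 4000, 4: 2500, 8: 1600, 16: 1000}

for t, s in speedups(example_cycles).items():
    print(f"t={t:2d}  speedup={s:.2f}")
```

Dividing the t=1 cycle count by each run's cycle count makes the t=1 point exactly 1.0, which is what "relative to the t=1 case" asks for.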

Problem 2 (15 points)

Modify your pthread Ocean program from homework 2 so that it can be simulated with Ruby/Simics. Specifically, you need to instrument your Ocean program to break the simulation before and after its parallel phase. You will simulate and time only the parallel phase of Ocean using Ruby.

Clarification: in homework 2, many of you made the mistake of including thread creation in the timing. Please do not make the same mistake here. The parallel phase does not include the creation of threads; it includes only the time the threads take to process their part of the ocean. Simulate only that part. If you do not understand what this means, please ask your TA.

Since simulation takes a long time, you will simulate a smaller ocean for this problem. Run your program on a 258x258 ocean for 50 iterations. Again, plot the speedup of the workload as the number of threads varies over t = [1, 2, 4, 8, 16], normalized to the t=1 case. Use the Ruby_Cycles value in the statistics file output by the simulator to calculate the speedup.
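The structure of "time only the parallel phase" can be sketched as below. This is a Python illustration of the idea, not the C/pthreads code you will actually write, and it does not use any Simics or GEMS interface: `on_start` and `on_end` are hypothetical stand-ins for whatever mechanism you use to break the simulation. The point is that all threads are created first, then a barrier marks the start of the measured region, and a second barrier marks its end.

```python
import threading
import time


def run_parallel_phase(n_threads, work, on_start=lambda: None, on_end=lambda: None):
    """Run work(tid) on n_threads threads, measuring only the parallel phase.

    on_start / on_end are hypothetical hooks standing in for the points
    where the simulation would be broken; they are not a real simulator API.
    """
    times = {}

    def begin():
        # Runs once, after every thread has arrived and before any is released.
        on_start()
        times["start"] = time.perf_counter()

    def finish():
        # Runs once, after every thread has finished its share of the work.
        times["end"] = time.perf_counter()
        on_end()

    start_barrier = threading.Barrier(n_threads, action=begin)
    end_barrier = threading.Barrier(n_threads, action=finish)
    results = [None] * n_threads

    def worker(tid):
        start_barrier.wait()       # thread creation is already over here
        results[tid] = work(tid)   # the parallel phase proper
        end_barrier.wait()

    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(n_threads)]
    for th in threads:
        th.start()                 # creation happens outside the measured region
    for th in threads:
        th.join()
    return results, times["end"] - times["start"]


events = []
results, elapsed = run_parallel_phase(
    4,
    lambda tid: tid * tid,         # placeholder work, not Ocean
    on_start=lambda: events.append("start"),
    on_end=lambda: events.append("end"),
)
print(results, events)
```

Because the start hook fires inside the barrier action, no thread begins its work before the measured region opens, and thread creation cost is excluded by construction.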


What to hand in

  • Plots for problem 1 and problem 2. Do your speedup numbers for ocean obtained with the simulator match the speedup numbers obtained in homework 2? Why or why not? Does the speedup trend observed with the simulator match the speedup trend from homework 2? Why or why not?
  • What is the difference between trace-driven simulation and execution-driven simulation? What are the advantages and disadvantages of trace-driven simulation? What are the advantages of execution-driven simulation? What kind of simulation did you do when you simulated eg_pthread and ocean?
  • The 20 lines preceding "Ruby_cycles:" within each Ruby stats file.
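One way to pull the requested lines out of a stats file is sketched below. The helper name is made up for this example, the sample text is synthetic, and the sketch assumes the marker string "Ruby_cycles:" appears in the file exactly as written in the item above; check your own stats files before relying on it.

```python
def lines_before(text, marker="Ruby_cycles:", count=20):
    """Return up to `count` lines that come right before the first line
    containing `marker`. Returns an empty list if the marker is absent."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            return lines[max(0, i - count):i]
    return []


# Synthetic stand-in for a stats file; NOT real Ruby output.
sample = "\n".join(f"line {n}" for n in range(30)) + "\nRuby_cycles: 123\n"

excerpt = lines_before(sample)
print(len(excerpt), excerpt[0], excerpt[-1])
```

To use it on a real file, read the file's contents into a string first and pass that string as `text`.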

Tips and Tricks

  • Start EARLY
  • If you have executed the steps in the two tutorials provided, you should have done most of Problem 1.
  • See $GEMS/ruby/config/rubyconfig.defaults for default parameters that configure the memory system.
  • Ask the TA early if you run into problems. Do not wait until the morning the assignment is due.
  • Simulation can be done on clover-02.cs.wisc.edu. Workloads should be compiled on chianti.cs.wisc.edu.
  • Use ONLY the machines indicated above. The other machines in the 'clover', 's-', and 'ale-' clusters are research machines allocated for batch-scheduled jobs.

Important: Include your name on EVERY page.

 