|  
      
  
 
Homework 3 // Due at Lecture Fri Mar 11
You should do this assignment alone.  No late assignments.
 
 
Simulation is one of the most important research and development methods
in computer architecture. In this assignment, you will gain hands-on experience
using an execution-driven full-system multiprocessor simulator based on
Wisconsin GEMS, which is built on top of Virtutech Simics.
The goal of this assignment is to show you how simulation works and how to
use a typical simulator, and to give you a first-hand sense of what
simulation can do.
 
  Simulator Setup 
 -  Download GEMS v2.1 from the GEMS website.
 -  Download a tarball of Simics from here (UW IPs only).
 -  Download a tarball of checkpoints from here (UW IPs only).
 -  Follow the instructions for setting up GEMS.
 
 
Make sure your simulator is up and running by Mar 8. 
 Workload Setup 
 
 Problem 1 (15 points)
Simulate the 'eg_pthread' workload provided with this assignment. 
Use the 16-processor 'silver' machine checkpoints. Build Ruby with the CMP directory
protocol 'MESI_CMP_filter_directory' and configure it dynamically using the parameters
provided in 'eg_pthread.simics'. 
Plot the speedup of the workload for thread counts t = [1, 2, 4, 8, 16],
relative to the t=1 case.
Use the Ruby_Cycles value in the statistics file output by the simulator to calculate
the speedup.
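The speedup calculation can be sketched as follows. This is only an illustration: the function name `speedups` is hypothetical, and the cycle counts are made-up placeholders, not simulator output. Substitute the Ruby_Cycles values from your own statistics files.

```python
def speedups(cycles_by_threads):
    """Return {thread_count: speedup}, each relative to the t=1 run.

    speedup(t) = cycles(t=1) / cycles(t)
    """
    base = cycles_by_threads[1]
    return {t: base / c for t, c in sorted(cycles_by_threads.items())}


# Placeholder cycle counts for illustration only (NOT measured results).
example_cycles = {1: 8_000_000, 2: 4_400_000, 4: 2_500_000,
                  8: 1_600_000, 16: 1_250_000}

for t, s in speedups(example_cycles).items():
    print(f"t={t:2d}  speedup={s:.2f}")
```

By construction the t=1 entry always has a speedup of 1.0, which gives a quick sanity check on your plot.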
 Problem 2 (15 points)
Modify your pthread Ocean program from homework 2 so that it can be simulated with Ruby/Simics.
Specifically, you need to instrument your Ocean program to break the simulation before and after
its parallel phase. You will simulate and time only the parallel phase of Ocean
using Ruby.
Clarification: in homework 2, many of you made the mistake of including thread creation in
the timing. Please be sure not to make the same mistake here. The parallel phase
does not include the creation of threads; it includes only the time the threads take to
process their part of the ocean. Please be sure to simulate only that part. If you do not understand what
this means, please bug your TA.
Since simulation takes a long time, you will simulate a smaller ocean for this problem.
Run your program on a 258x258 ocean for 50 iterations. Again, plot the speedup of the
workload for thread counts t = [1, 2, 4, 8, 16], normalized to the t=1 case.
Use the Ruby_Cycles value in the statistics file output by the simulator to calculate
the speedup.
 
 
 What to hand in
 
  -  Plots for problem 1 and problem 2. Do your speedup numbers for ocean obtained with the simulator match 
  the speedup numbers obtained in homework 2? Why or why not? Does the speedup trend observed with the simulator
  match the speedup trend from homework 2? Why or why not?
  
 -  What is the difference between trace-driven simulation and execution-driven simulation?
  What are the advantages and disadvantages of trace-driven simulation? What are the advantages of execution-driven simulation?
  What kind of simulation did you do when you simulated eg_pthread and ocean?
 -  The 20 lines preceding "Ruby_cycles:" within each Ruby stats file.
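One way to pull out the 20 lines preceding "Ruby_cycles:" is sketched below. The helper names are hypothetical, and the sketch assumes the stats file is plain text and that the first line containing the marker is the one you want. Check the result against the file by hand.

```python
def lines_before_marker(lines, marker="Ruby_cycles:", count=20):
    """Return up to `count` lines immediately before the first line
    containing `marker`; return [] if the marker never appears."""
    for i, line in enumerate(lines):
        if marker in line:
            return lines[max(0, i - count):i]
    return []


def lines_before_marker_in_file(path, marker="Ruby_cycles:", count=20):
    """Same as above, reading the lines from a text file at `path`."""
    with open(path) as f:
        return lines_before_marker(f.read().splitlines(), marker, count)
```

For example, `print("\n".join(lines_before_marker_in_file("my_run.stats")))` would print the excerpt, where `my_run.stats` is a placeholder for your own stats file name.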
  
  
 Tips and Tricks
- Start EARLY 
 -  If you have executed the steps in the two tutorials provided, you should have done most of Problem 1. 
 -  See $GEMS/ruby/config/rubyconfig.defaults for default parameters that configure the memory system.
 -  Bug the TA early, if necessary. Do not wait until the morning the assignment is due.
 -  Simulation can be done on clover-02.cs.wisc.edu. Workloads should be compiled on chianti.cs.wisc.edu.
 -  Use ONLY the machines indicated above. The other machines in the 'clover', 's-', and 'ale-' clusters are research machines allocated for batch-scheduled jobs.
  
Important: Include your name on EVERY page.
 
 
  
   |