Computer Sciences Dept.

CS/ECE 757 Advanced Computer Architecture II Spring 2012 Section 1
Instructor Mark D. Hill and Assistant Jayneel Gandhi
URL: http://www.cs.wisc.edu/~markhill/cs757/Spring2012/

Homework 3 // Due at Lecture Wed Mar 7

You should do this assignment alone. No late assignments.


Simulation is one of the most important research and development methods in computer architecture. In this assignment, you will gain hands-on experience using an execution-driven full-system multiprocessor simulator based on gem5. The goal of this assignment is to give you a feel for how simulation works, how to use a typical simulator, and what simulation can do.

You have to perform the setup on a 64-bit Linux machine from the ale cluster (ale-01 and ale-02).
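
The 64-bit requirement can also be checked from a script. The following is a minimal sketch, assuming that `uname -m` reports the machine type on its own; the function name `is_64bit` is hypothetical and not part of the course materials.

```shell
#!/bin/sh
# Succeeds if the given machine string (default: this host's `uname -m`)
# is one of the 64-bit machine types accepted for this assignment.
is_64bit() {
    arch="${1:-$(uname -m)}"
    case "$arch" in
        x86_64|ia64) return 0 ;;
        *)           return 1 ;;
    esac
}

if is_64bit; then
    echo "64-bit machine: OK"
else
    echo "not a listed 64-bit machine: $(uname -m)"
fi
```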

Simulator Setup

  • Check if your machine is a 64-bit machine. Use the command uname -a. It should show either x86_64 or ia64.
  • Add the following lines to your ~/.bashrc.local file to use the C/C++ and SWIG versions provided.
    export CXX="/usr/bin/g++44"
    export PATH=/s/swig-1.3.31/bin:$PATH
  • Create a Mercurial setup file (~/.hgrc) in your home folder with the following template. This will help you in case you decide to download the current version of gem5 from the website for your project. Do not use the current version from the gem5 webpage for the homework. There are certain patches that I have put in the private release for the homework.
    Please fill in the placeholder items (First Last, cs login, and the full path to your gem5 installation).
    [ui]
    # Set the username you will commit code with
    username= First Last <cs login@cs.wisc.edu>
    ssh = ssh -C

    # Always use git diffs since they contain permission changes and rename info
    [defaults]
    qrefresh = --git
    email = --git
    diff = --git

    [extensions]
    # These are various extensions we find useful

    # Mercurial Queues -- allows managing of changes as a series of patches
    hgext.mq =

    # PatchBomb -- send a series of changesets as e-mailed patches
    hgext.patchbomb =

    # External Diff tool (e.g. kdiff3, meld, vimdiff, etc)
    hgext.extdiff =

    # Fetch allows for a pull/update operation to be done with one command and automatically commits a merge changeset
    hgext.fetch =

    # Path to the style file for the M5 repository
    # This file enforces our coding style requirements
    style = Full path to your gem5 installation/util/style.py

    [email]
    method = smtp
    from = First Last <cs login@cs.wisc.edu>

    [smtp]
    host = sabe.cs.wisc.edu
  • Download gem5 from a private release. Copy the CS757.tar.gz file from /afs/cs.wisc.edu/p/course/cs757-markhill/public/hw3 into your folder.
  • Untar the file into your folder.
    The folder has 4 main sub-folders
    a. binaries: containing binaries of linux for Full-System mode
    b. disks: These have the disk image that would be visible to the shell once linux is booted on the simulator
    c. gem5: This is the folder that contains gem5 and all the files related to it
    d. programs: this folder would contain your user programs (like eg_pthread)
  • Go to the gem5 folder and compile gem5 for x86_SE mode (syscall emulation). This command will take a while to finish.
    scons -j 8 build/X86_SE/m5.fast PROTOCOL=MESI_CMP_directory
  • Once you have built the X86_SE mode for gem5, it is time to run a compiled program on it. Go to the programs folder. Running pthread programs on gem5 in SE mode requires a special lightweight pthread library called m5threads. We will statically compile the pthread.c in the m5threads folder.
    cd ../programs/m5threads
    gcc -c pthread.c --static
  • You will copy this object file (pthread.o) into the eg_pthread folder to link it with the eg_pthread program.
    cd ../eg_pthread
    cp ../m5threads/pthread.o .
  • We have introduced new instructions into eg_pthread.c. These are defined in gem5/util/m5/m5op.h. Please look at the eg_pthread.c file.
    a. m5_reset_stats: resets the stats of the system
    b. m5_work_begin: This specifies where a work unit starts
    c. m5_work_end: This specifies where the work unit ends
    These instructions help us specify where to start recording the stats and what a work unit is, in case you want to reduce the number of work units to execute on gem5 without changing the program.
  • You will compile the new eg_pthread.c file with the following command.
    gcc -c eg_pthread.c -I../../gem5/util/m5 --static
  • You need to link all the object files (pthread.o and eg_pthread.o) with the gem5 library m5op_x86.S. This library contains the implementation of the new functions that we have introduced in the program.
    gcc -o eg_pthread eg_pthread.o pthread.o ../../gem5/util/m5/m5op_x86.S -I../../gem5/util/m5 --static
  • Now you are set to run the program with the SE mode binary that you started compiling a long time back.
    cd ../../gem5
    build/X86_SE/m5.fast configs/example/se.py -h
    This will provide all the options available to the SE mode script.
    Let us run the program with two threads. Remember that in syscall emulation mode you have to allocate a processor for each and every thread context. Note the number of CPUs and the option being passed to the program.
    build/X86_SE/m5.fast -d hw3/ configs/example/se.py --cmd=../programs/eg_pthread/eg_pthread --options="2" --cpu-type=timing --num-cpus=3 --ruby --work-end-exit-count=5000
    Once the simulation finishes running, you will be able to see the stats in the hw3 folder (see -d in the above command). You can try running the program for N=[1,2,4,8,16].
  • After learning about the Syscall Emulation (SE) mode of gem5, we will see how to work with Full System (FS) mode. Let's compile gem5 for FS mode.
    scons -j 8 build/X86_FS/m5.fast PROTOCOL=MESI_CMP_directory
  • Once you have the binary built for FS mode, you will run a precompiled version of eg_pthread already provided to you. Since FS mode boots a real Linux, you don't need a lightweight pthread library. You compile as usual and load the compiled program onto the disk image. Since disk image creation requires “root” access, you have been provided a precompiled version of the program in the disks folder. You will create a soft link to the disk image.
    cd ../disks
    ln -s cs757.img x86root.img
    The script searches for the x86root.img disk image and loads it after booting Linux.
  • You need to set M5_PATH to point to the folder containing the disks folder.
    export M5_PATH=<path to the folder containing the disks folder>
  • We are ready to boot Linux and check that FS mode works. We have created a bash script that will automatically run the program on the disk and produce the result. You can remove that part of the command and try to boot Linux and run the program yourself as well. To see the terminal on which Linux boots, see the m5term section on the gem5 webpage. You can run the program from the terminal as well.
    build/X86_FS/m5.fast -d hw3-fs/ configs/example/ruby_fs.py --num-cpus=2 --kernel x86_64-vmlinux-2.6.22.9.smp --cpu-type=timing --work-end-exit-count=5000 --script=configs/boot/eg_pthread.rcS
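
Before starting a long FS-mode boot, it can help to confirm that M5_PATH points at a usable folder. The following is a minimal sketch, assuming M5_PATH is the untarred folder that holds both the binaries and disks sub-folders; the function name `check_m5_path` is hypothetical.

```shell
#!/bin/sh
# Verify that a candidate M5_PATH contains the pieces the setup steps describe:
# a binaries/ folder and a disks/ folder holding the x86root.img link.
# Assumption: both sub-folders live directly under M5_PATH.
check_m5_path() {
    dir="$1"
    [ -d "$dir/binaries" ] || { echo "missing: $dir/binaries"; return 1; }
    # -e follows symbolic links, so a dangling x86root.img link is reported too.
    [ -e "$dir/disks/x86root.img" ] || { echo "missing: $dir/disks/x86root.img"; return 1; }
    echo "M5_PATH looks usable: $dir"
}

# Report on the current setting without aborting the shell if it is unset.
check_m5_path "${M5_PATH:-.}" || true
```
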
Make sure your simulator is up and running by Mar 6.


Problem 1 (10 points)

Simulate the 'eg_pthread' workload provided with this assignment in SE mode. Plot the speedup of the workload with a varying number of threads t = [1, 2, 4, 8, 16] relative to the t=1 case. Use Ruby_Cycles in the statistics file output by the simulator to calculate the speedup. Set the work count to 9999, so that the program finishes before all the work finishes and you do not count overheads as well.
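
The five SE-mode runs can be generated from one template. The following is a sketch that only prints the commands. It assumes one more CPU than threads (as in the two-thread example in the setup steps, which uses --num-cpus=3), assumes the work count is passed with --work-end-exit-count, and uses a hypothetical output folder name hw3-tN for each run.

```shell
#!/bin/sh
# Print the SE-mode command line for one thread count.
# Assumptions: num-cpus = threads + 1, and the work count of 9999
# goes in --work-end-exit-count. The hw3-t<N>/ folder name is made up.
se_command() {
    t="$1"
    echo "build/X86_SE/m5.fast -d hw3-t$t/ configs/example/se.py" \
         "--cmd=../programs/eg_pthread/eg_pthread --options=\"$t\"" \
         "--cpu-type=timing --num-cpus=$((t + 1)) --ruby" \
         "--work-end-exit-count=9999"
}

for t in 1 2 4 8 16; do
    se_command "$t"
done
```

To launch a run rather than print it, the output of se_command can be piped to sh from inside the gem5 folder.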

Problem 2 (30 points)

Modify your pthread Ocean program from homework 2 so that it can be simulated with gem5 in SE mode. Specifically, you need to instrument your Ocean program to record stats for the parallel phase of the simulation. You will only simulate and time the parallel phase of Ocean using Ruby. See how eg_pthread was instrumented. Note: The parallel phase of the simulation does not include the creation of threads; it only includes the time it takes the threads to process their part of the ocean. Please be sure to record stats only for that part. Since simulation takes a long time, you will simulate a smaller ocean for this problem. Run your program on a 258x258 ocean for 50 iterations (or a smaller ocean with fewer iterations). Do not let a run go for more than an hour or two for 16 threads. Again, plot the speedup of the workload with a varying number of threads t = [1, 2, 4, 8, 16] normalized to the t=1 case. Use Ruby_Cycles in the statistics file output by the simulator to calculate the speedup.
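
One way to enforce the one-to-two-hour bound is to wrap each run in a wall-clock limit. The following is a sketch, assuming the GNU coreutils timeout command is available on the machine; the wrapper name `run_with_limit` is hypothetical.

```shell
#!/bin/sh
# Run a command under a wall-clock limit given in seconds.
# GNU timeout exits with status 124 when the limit is reached.
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
}

# Example: a stand-in command that finishes well inside its limit.
run_with_limit 5 true && echo "finished within the limit"
```

For a two-hour bound, the limit would be 7200, followed by the full simulator command line.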

Problem 3 (10 points)

Simulate the 'eg_pthread' workload provided to you in the disk image in FS mode. Plot the speedup of the workload with a varying number of threads t = [1, 2, 4, 8, 16] relative to the t=1 case. Use Ruby_Cycles in the statistics file output by the simulator to calculate the speedup. Set the work count to 9999, so that the program finishes before all the work finishes and you do not count overheads as well.
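
All three problems use the same calculation: the speedup for t threads is the Ruby_Cycles value of the t=1 run divided by the Ruby_Cycles value of the t-thread run. The following is a sketch of that arithmetic; the function name `speedup` is hypothetical, the cycle counts are typed in by hand, and the numbers shown are made up for illustration, not simulator output.

```shell
#!/bin/sh
# speedup BASE_CYCLES CYCLES: prints BASE_CYCLES / CYCLES to two decimals.
speedup() {
    awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.2f\n", base / cur }'
}

# Hypothetical cycle counts for illustration only.
speedup 1000000 1000000   # t=1 against itself, prints 1.00
speedup 1000000 520000    # prints 1.92
```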

What to hand in

  • Plots for problem 1, problem 2 and problem 3. Do your speedup numbers for ocean obtained with the simulator match the speedup numbers obtained in homework 2? Why or why not? Does the speedup trend observed with the simulator match the speedup trend from homework 2? Why or why not?
  • What is the difference between trace-driven simulation and execution-driven simulation? What are the advantages and disadvantages of trace-driven simulation? What are the advantages of execution-driven simulation? What kind of simulation did you do when you simulated eg_pthread and ocean?
  • Did you observe any difference in speedups between problem 1 and problem 3? Why or why not?

Tips and Tricks

  • Start EARLY
  • Use only 64-bit Linux machines from the ale cluster (ale-01 and ale-02).

Important: Include your name on EVERY page.

 