Raw data: Supporting x86-64 Address Translation for 100s of GPU Lanes

This page is documentation for the paper
Supporting x86-64 Address Translation for 100s of GPU Lanes
Jason Power, Mark D. Hill, David A. Wood
The 20th IEEE International Symposium On High Performance Computer Architecture, HPCA 20. Feb 2014

Contact information: Jason Power (powerjg@cs.wisc.edu).

Below are details on obtaining the raw data and code used for the evaluation in the paper. The purpose of this page is to provide transparency and reproducibility for this particular work. If you are interested in gem5-gpu in general, or want to expand on this work, we encourage you to instead use the publicly available and supported version at http://gem5-gpu.cs.wisc.edu.

The public version of gem5-gpu has most of the same code as below. The major changes between what's available here and the public code are:

Only minimal support will be provided for the code and data on this page; our focus is on supporting the public code base.

This page is broken into two main parts. The first, "Raw Data," explains how to access and understand the raw data used in the evaluation. The second section, "Raw Code," explains how to access and run the code used to generate the raw data.

We would like to thank the developers of gem5-gpu, principally, Joel Hestness and Marc Orr, for their effort on that project, which we leverage. Additionally, some of the code used for this paper outside of gem5-gpu (scripts, etc.) was developed in part by Joel Hestness.

Raw Data

This section contains the information about how to obtain and understand the raw data used for this paper. The data consists of the output from the gem5-gpu simulator and supporting scripts for parsing, filtering, and displaying the data.

How to obtain

A tarball of all of the data is available at this address: http://gem5-gpu.cs.wisc.edu/gpummu-hpca14/files/raw_data.tar.bz2

The above tarball is 216 MB compressed and about 4 GB uncompressed.
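As a minimal sketch of fetching and unpacking the data, the following Python uses only the standard library. The URL is the one given above; the helper names fetch_tarball and extract_tarball are hypothetical and are not part of the released scripts.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the text above; everything else here is illustrative.
RAW_DATA_URL = "http://gem5-gpu.cs.wisc.edu/gpummu-hpca14/files/raw_data.tar.bz2"


def fetch_tarball(dest, url=RAW_DATA_URL):
    """Download the tarball to `dest` (hypothetical helper; about 216 MB)."""
    dest = Path(dest)
    urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_tarball(tarball, out_dir):
    """Unpack a .tar.bz2 archive into `out_dir`; return its top-level entries."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(tarball), "r:bz2") as archive:
        archive.extractall(str(out_dir))
    return sorted(p.name for p in out_dir.iterdir())
```

Calling fetch_tarball("raw_data.tar.bz2") followed by extract_tarball("raw_data.tar.bz2", "raw_data") would need roughly 4 GB of free disk space for the uncompressed data.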

How to interpret

The tarball contains a set of directories which each contain the data for all applications for a particular configuration of the simulator. The raw statistics can be found in the leaf directories under the name "stats.txt." This file is generated by gem5-gpu automatically and contains all of the statistics that were registered in the code as gem5 stats. These statistics include all of the gem5 statistics, gem5-gpu statistics, and additional statistics added for this paper. The tarball also contains a script (getData.py) used to filter and process the raw stats output, and some images that were used in the paper. Below we describe the file structure of this data, briefly describe the scripts included, and finally describe how graphs were generated for the paper.
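Since the raw statistics live in leaf directories under the name stats.txt, one way to enumerate them is to walk the extracted tree. This is a small sketch under that assumption only; find_stats_files is a hypothetical helper, not a function from getData.py.

```python
import os


def find_stats_files(root):
    """Return the sorted paths of every stats.txt found under `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "stats.txt" in filenames:
            found.append(os.path.join(dirpath, "stats.txt"))
    return sorted(found)
```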

File structure

Each top-level directory contains the output of one particular configuration of the simulator. The directories are named based on the configuration with the following general format: <L1 TLB size>-<Shared L2 TLB size>-<page walk cache size>. Additionally, "-p" refers to TLB prefetching enabled, and "mt<num>" refers to the depth of the multithreaded page table walker. Finally, the directory "baseline" is the baseline we compared to, described in the paper.
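The naming scheme can be decoded mechanically. The sketch below assumes that the optional "p" and "mt<num>" markers are separated by hyphens like the three size fields, and that the size fields themselves contain no hyphens; the actual directory names in the tarball may differ, so unknown tokens are kept rather than rejected. SimConfig and parse_config_name are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Tuple


class SimConfig(NamedTuple):
    l1_tlb: Optional[str]        # L1 TLB size field, as written in the name
    l2_tlb: Optional[str]        # shared L2 TLB size field
    pwc: Optional[str]           # page walk cache size field
    prefetch: bool               # True if a "p" token is present
    walker_depth: Optional[int]  # <num> from an "mt<num>" token, if any
    extra: Tuple[str, ...]       # tokens this sketch does not recognize
    baseline: bool               # True only for the "baseline" directory


def parse_config_name(name):
    """Decode a directory name of the assumed form <L1>-<L2>-<PWC>[-p][-mt<num>]."""
    if name == "baseline":
        return SimConfig(None, None, None, False, None, (), True)
    fields = name.split("-")
    if len(fields) < 3:
        raise ValueError("expected at least three '-'-separated fields: %r" % name)
    l1, l2, pwc = fields[:3]
    prefetch = False
    depth = None
    extra = []
    for token in fields[3:]:
        match = re.fullmatch(r"mt(\d+)", token)
        if token == "p":
            prefetch = True
        elif match:
            depth = int(match.group(1))
        else:
            extra.append(token)
    return SimConfig(l1, l2, pwc, prefetch, depth, tuple(extra), False)
```

For example, a made-up name such as "64-1024-8-p-mt4" would decode to sizes "64", "1024" and "8", with prefetching enabled and a walker depth of 4.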

Each top-level directory has the following layout:

Scripts

The file getData.py is a Python script that parses the stats.txt file for each of the configurations. It is not meant to be run standalone. Instead, other Python scripts should import it to extract the data and then generate graphs or other output. Below in the "Graphs" section we describe how we generated the graphs for this paper.

getData.py contains classes to wrap the gem5 stats types (like Hist), and a class for the general m5 stats file. This main class (Stats) is used to parse the statistics files and each instance of a Stats class can be queried to obtain (almost) any stat in the stats file.
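To illustrate what parsing a stats file involves, here is a minimal sketch that is independent of getData.py and does not reproduce its Stats class. It relies only on the usual gem5 stats.txt line layout of a name, a value, and an optional "#" comment; parse_stats is a hypothetical name, and histogram entries are reduced to their first numeric column.

```python
def parse_stats(text):
    """Map each stat name to its first numeric column.

    If the file contains several stat dumps, later values overwrite
    earlier ones in this simplified sketch.
    """
    stats = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop the trailing description
        if not line or line.startswith("-"):   # skip blanks and Begin/End banners
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            stats[parts[0]] = float(parts[1])
        except ValueError:
            continue                           # non-numeric value, ignore
    return stats
```

A caller would typically read a stats.txt file into a string and pass it in, for example parse_stats(open(path).read()).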

Additionally, getData.py contains a Benchmark class which is used to query the statistics for some particular benchmark (like backprop, in fs mode, size simsmall). This class also has many helper functions such as "getPKIStat", which normalizes a stat by the total number of instructions executed to report it per kilo-instruction.
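The per-kilo-instruction normalization itself is simple arithmetic. The function below is a standalone sketch of that calculation with a hypothetical name and signature; it is not the getPKIStat method from getData.py.

```python
def per_kilo_instruction(stat_value, total_instructions):
    """Scale a raw event count to events per 1000 instructions."""
    if total_instructions <= 0:
        raise ValueError("total_instructions must be positive")
    return stat_value * 1000.0 / total_instructions
```

For example, 50 events over 10,000 instructions corresponds to 5 events per kilo-instruction.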

There are also helper functions included in getData.py meant to be used to find the largest run of a benchmark that completed and gather its stats, as well as printing and plotting stats.

Finally, getData.py contains a set of helper functions to take raw stats from the Benchmark objects and generate interesting statistics, like TLB miss rate, memory accesses per kilo-cycle, etc.
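As an illustration of one such derived statistic, the sketch below computes a miss rate from raw miss and access counts, and an aggregate rate over several TLBs. The function names are hypothetical and the aggregation choice (total misses over total accesses, rather than a mean of per-TLB rates) is an assumption of this sketch, not necessarily what getData.py does.

```python
def miss_rate(misses, accesses):
    """Fraction of accesses that missed; 0.0 when there were no accesses."""
    return misses / accesses if accesses else 0.0


def aggregate_miss_rate(per_tlb):
    """Combine (misses, accesses) pairs from several TLBs into one rate.

    Summing before dividing weights each TLB by how often it was used.
    """
    pairs = list(per_tlb)
    total_misses = sum(m for m, _ in pairs)
    total_accesses = sum(a for _, a in pairs)
    return miss_rate(total_misses, total_accesses)
```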

Graphs

We generated the graphs for this project with the Python package Matplotlib, and used IPython notebook to organize the graphs and data. The notebook files that were used are linked below. To use the notebook files yourself, you will need to install and configure Matplotlib and IPython notebook, and you will need to be running an IPython notebook server on your computer. Finally, the notebooks below have been sanitized by removing local paths, so you will need to adjust any paths if you want to regenerate any graphs from the data in the above tarball.

To view the notebook you can follow this link: http://nbviewer.ipython.org/url/pages.cs.wisc.edu/~powerjg/notebooks/GPUMMU-data-hpca-2014.ipynb?create=1
Or download the notebook from http://pages.cs.wisc.edu/~powerjg/notebooks/GPUMMU-data-hpca-2014.ipynb/GPUMMU-data-hpca-2014.ipynb

Raw Code

In this section, we describe how to reproduce the experiments that were run to generate the data from above. We describe how to obtain, compile, and run the code. Additionally, we describe how to reproduce the configurations used in this work.

How to obtain

Code

The code is contained in a set of Mercurial repositories. Specifically, it is a set of patch queues on top of a particular version of gem5-gpu. For information about how to use Mercurial and patch queues, please see the Mercurial documentation.

Below are the steps required to check out the code:

  1. Check out gem5-gpu. See http://gem5-gpu.cs.wisc.edu/
  2. Update to the following changesets:
    • gem5: 57aac1719f86
    • gem5-gpu: 353fd0030d60
    • gpgpu-sim: 65e93a2eddf9
    Note: "How to configure and compile" below describes a simpler mechanism for this step, which can be used once you have checked out all of the patch repositories.
  3. Check out project-specific patch queues
    • cd gem5
    • hg qq -c personal
    • cd .hg
    • hg clone http://gem5-gpu.cs.wisc.edu/gpummu-hpca14/repo/gem5-patches-gpummu-hpca14 patches-personal
    • cd ../../gem5-gpu/.hg/
    • hg clone http://gem5-gpu.cs.wisc.edu/gpummu-hpca14/repo/gem5-gpu-patches-gpummu-hpca14 patches
    • cd ../../gpgpu-sim/
    • hg qq -c personal
    • cd .hg
    • hg clone http://gem5-gpu.cs.wisc.edu/gpummu-hpca14/repo/gpgpu-sim-patches-gpummu-hpca14 patches-personal
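The patch-queue checkout in step 3 can be sketched as a dry-run helper that only lists the commands and the directory each one runs in, without executing anything. It assumes that the gem5, gem5-gpu, and gpgpu-sim repositories sit side by side under one root directory; the repository URLs, queue names, and clone destinations are taken from the steps above, while checkout_commands and PATCH_QUEUES are hypothetical names.

```python
import posixpath

BASE_URL = "http://gem5-gpu.cs.wisc.edu/gpummu-hpca14/repo/"

# (repository, named queue to create or None, patch repository, clone destination)
PATCH_QUEUES = [
    ("gem5", "personal", "gem5-patches-gpummu-hpca14", "patches-personal"),
    ("gem5-gpu", None, "gem5-gpu-patches-gpummu-hpca14", "patches"),
    ("gpgpu-sim", "personal", "gpgpu-sim-patches-gpummu-hpca14", "patches-personal"),
]


def checkout_commands(root):
    """Return (working_directory, argv) pairs mirroring step 3 above."""
    commands = []
    for repo, queue, patch_repo, dest in PATCH_QUEUES:
        repo_dir = posixpath.join(root, repo)
        if queue is not None:
            # Create the named patch queue inside the repository itself.
            commands.append((repo_dir, ["hg", "qq", "-c", queue]))
        # Clone the project-specific patches into the repository's .hg directory.
        commands.append((posixpath.join(repo_dir, ".hg"),
                         ["hg", "clone", BASE_URL + patch_repo, dest]))
    return commands
```

Each pair could then be passed to subprocess.run(argv, cwd=working_directory, check=True) once the printed commands have been checked against your own checkout.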

Workloads

The workloads we used came from the Rodinia benchmark suite and are included in the gem5-gpu repositories. Update the workloads to the following changesets:

Extra files

Since we used gem5 full-system mode, other files including the disk image and kernel image are required. Note: These are large files (5 GB and 56 MB, respectively). The images are available below:

Note: The benchmark image has the binaries of the workloads described above.

Additionally, for this work we used a set of scripts to run gem5-gpu. You can obtain those by checking out the mercurial repository as follows.

How to configure and compile

A script is included which automatically updates all of the repositories and patch queues to the revisions given in a .rev file. This script can be found at regression/revisions.py. It can be used both to save and to restore a set of revisions across all of the gem5-gpu repositories. We use this script to set up and save the configurations used to generate our data.

To update to a particular revision using the revisions script:

To save the current revisions:

To update to the main revisions used for this project, download the revision file gpummu-hpca14.rev and restore to that revision.

For each configuration that we tested, there is a file run.rev saved to the output directory which contains the revision information. This file can be restored using the above command to recreate the environment that configuration used.

How to run

Compiling

These instructions are the same as for gem5-gpu.

Running

We used the scripts found in the regression/ directory for running gem5-gpu, although using them is not a requirement. To run all of the benchmarks with the test inputs:

Running the workloads this way will generate output directories that mirror those found in the data in the first part of this document.

There is some documentation on the regression script in the code and in its help output (--help), but it is an unsupported script.