gem5 CHTC Tutorial
This webpage contains detailed instructions about how to run gem5 on CHTC. The goal of this webpage is to provide you with sufficient information on how to run an application in gem5 on CHTC. To ease this process, we have created a Dockerfile for you that contains everything you should need to run the stable branch of gem5. In Homework 4 you will be responsible for extending these instructions to run multiple CPU configurations and multiple applications. This tutorial was created by Kyle Roarty and Matt Sinclair, with help from Christina Koch and Lauren Michael, and is based on a tutorial from CS 758 created by Brian Bockelmann and Matt Sinclair. While we believe it's bug-free, if you spot something amiss, please let us know ASAP!
Step 1: Logging in.
Before doing anything else, you will first need to log into the learn.chtc.wisc.edu submit server. CHTC has many submit servers you could run on, but this submit server is specifically designed for running coursework experiments like this assignment. To log into this server, your username comes from your NetID (for example, if your NetID is "bbadger@wisc.edu", then your account name is "bbadger"). The password is the same as your NetID password. Given this, you can log into the learn.chtc.wisc.edu submit server using:
ssh learn.chtc.wisc.edu -l bbadger
Note that the CHTC host will only allow logins from on-campus; if you don't have an existing way to SSH onto campus, you can utilize the WiscVPN service.
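Optionally, if you'll be logging in often, a small SSH config entry on your own machine saves some typing. This is just a sketch; the host alias and username below are placeholders you should replace:

# ~/.ssh/config on your local machine ("chtc-learn" and "bbadger" are placeholders)
Host chtc-learn
    HostName learn.chtc.wisc.edu
    User bbadger

After that, "ssh chtc-learn" is all you need.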
Step 2: Compiling gem5.
A: Now that we've logged in, the next thing we need to do is compile gem5. We're going to take advantage of CHTC for this too (lots of parallelism, and it's easy to run with many threads -- which building gem5 loves). First, to submit a job that builds gem5, we need to create a submission script. Submission scripts are what CHTC revolves around: think of them as the way to tell CHTC what you want your job to do, as well as where the output of the run should go, where any errors should go, how many resources you need, etc. Note that CHTC is a highly advanced system, and there are many more options than we'll be using in this tutorial. An example gem5 build submission file is:
# build-gem5.sub
# gem5 build submission file
universe = docker
docker_image = gcr.io/gem5-test/ubuntu-20.04_all-dependencies
log = gcr.log
error = gcr.err
output = gcr.out
executable = build-gem5.sh
environment = "TERM=xterm-256color"
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = gem5/
request_cpus = 8
request_memory = 12GB
request_disk = 12GB
stream_output = True
queue 1
There are several important things to note from this file (see the CHTC User Guide for more details):
- universe: since we are using one of gem5's preset dockers for our experiments, we want the universe to be docker.
- docker_image: this specifies where gem5's docker (for Ubuntu 20.04) is located.
- log (and error/output): these specify which files the log information, errors, and output should be redirected to. You may name these whatever you want. Technically, they are also optional, but it is highly useful to have this information handy. In my opinion, output is the most important file -- it is where you see what's going on with your job run.
- executable: this is telling the submission script what you want executed by the job (in this case, our shell script that is going to build gem5). I recommend making this shell script executable, e.g.,:
chmod u+x build-gem5.sh
- file transfers: The next three commands specify if you want files transferred (which we do, since we want the built gem5 to be sent back to us and we need to send the system the gem5 files to build with), when we want the output transferred (when the job is done, since that's when gem5 is built), and what input files should be used (here, the gem5 root directory so we can build the CPUs inside gem5).
- requests: This is where we specify how many CPUs (threads) we want the job to be run with (8 here) and how much memory/disk the job needs (12 GB here).
- queue: Crucially, at the end of the file we need to include this line, so that CHTC knows to enqueue the job we're running. The "1" here represents that we're queueing just one job -- we could also choose to queue many jobs from the same line with different arguments, but that's an advanced trick we won't cover here.
B: Now that we've created our submission script for compiling gem5, we need to create the shell script that actually tells CHTC what to do for your job. In that script, we essentially tell CHTC what parts of gem5 we want built. For example, you might do:
#!/bin/bash
# build-gem5.sh
# gem5 build shell script
scons build/X86/gem5.opt -j$(nproc) CPU_MODELS=AtomicSimpleCPU,TimingSimpleCPU,O3CPU,MinorCPU
tar -czf build.tar build/
Note that as part of this script we included a tar command to tar the build directory. This is important because, in CHTC, if you don't, the build directory won't be sent back to the submit server (i.e., where you submitted the job from). The same applies to m5out when running a job, which we'll discuss in Step 4. Because build is tarred, you will also have to untar the build directory once the job completes (e.g., untar it back into the gem5 folder). Additionally, note that the above docker image only contains the software required to build gem5; you will still need to clone gem5 into the gem5 folder before launching any jobs.
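As a rough sketch of that workflow on the submit server (the repository URL is an assumption based on the standard gem5 setup; the file names match the scripts above):

# clone gem5 next to your submit/shell scripts before submitting the build job
git clone https://github.com/gem5/gem5.git    # assumed upstream gem5 repository
condor_submit build-gem5.sub
# ... once the job finishes (see Step 2D), unpack the returned build directory
# back into the gem5 tree so later runs can find build/X86/gem5.opt
tar -xzf build.tar -C gem5/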
C: Finally, now that both scripts are created, we can launch our job (e.g., condor_submit build-gem5.sub if build-gem5.sub is the name of your submission script). This tells the CHTC system that you have a new job you'd like run and that you're launching it from your submit server. Under the hood, CHTC will look across the available machines and find one that is (a) able to support your job and (b) currently has at least some spare cycles to use on your job. One of the cool things about CHTC is that it may actually move your job across multiple machines as it's executing -- without your needing to be involved! Also, we recommend making this shell script executable (e.g., chmod u+x build-gem5.sh).
D: After launching the job, you'll need to wait a little bit for CHTC to do its magic and run the build job. Upon launching, you should see CHTC print some information to your console with the ID of the job you launched. To check whether that job is still running (or held), you can use condor_q. Eventually, you should see that your job has completed and is no longer on the list of running condor jobs (i.e., when you run condor_q, your gem5 job is no longer there). Once this is done, you can progress to Step 3!
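A few commands that are handy while you wait (the job ID below is hypothetical):

condor_q                 # list your idle/running/held jobs
condor_q -hold 182530    # if a job is held, show the hold reason
condor_history 182530    # inspect a job after it has left the queue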
Sidenote: this process is what CHTC refers to as a "regular job." Another option is to submit an "interactive job," where you get dedicated access to a machine and interactively run whatever commands you need. Since that is not necessary for gem5, and because there are many fewer interactive machines in the CHTC cluster, we use a regular job instead.
Step 3: Compiling DAXPY
A & B: Now that we've compiled gem5, the next thing we need to do is compile the application we want to run in gem5. Just like when we compiled gem5, we'll create a submission file and shell script for compiling DAXPY.
# build-daxpy.sub
# build DAXPY submission script
universe = docker
docker_image = gcr.io/gem5-test/ubuntu-20.04_all-dependencies
log = daxpy.log
error = daxpy.err
output = daxpy.out
executable = build-daxpy.sh
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = daxpy.cpp
environment = "TERM=xterm-256color"
request_cpus = 1
request_memory = 1GB
request_disk = 1GB
stream_output = True
queue 1
---
#!/bin/bash
# build-daxpy.sh
# build DAXPY shell script
g++ -std=c++14 daxpy.cpp -o daxpy
Note that our submission script looks pretty similar to the one for building gem5, except here we point to the DAXPY source file we want to compile and the DAXPY shell script. Similarly, the shell script for building DAXPY simply specifies how to compile DAXPY with g++.
C & D: Once you've created the above scripts, submit your "build DAXPY" job just like you did the gem5 build job, wait for it to complete (should be much faster than building gem5), and proceed to Step 4!
Step 4: Running DAXPY in gem5 on CHTC
Important: Please see the FAQ for how to avoid clobbering your m5out directory -- by default, multiple jobs launched from the same directory will all write back to the same output folder.
A & B: Now that we've compiled both DAXPY and gem5, we can finally run DAXPY in gem5 on CHTC!
Just like the last two steps, we need to create a submission and shell script for running DAXPY in gem5 on CHTC:
# run-daxpy.sub
# running DAXPY in gem5 submission file
universe = docker
docker_image = gcr.io/gem5-test/ubuntu-20.04_all-dependencies
executable = run-daxpy.sh
# if we want DAXPY to be size 1K
arguments = 1024
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = gem5/, daxpy
environment = "TERM=xterm-256color"
request_cpus = 1
request_memory = 8GB
request_disk = 12GB
stream_output = True
queue 1
---
#!/bin/bash
# run-daxpy.sh
# running DAXPY in gem5 shell script
mkdir m5out
build/X86/gem5.opt -r configs/example/se.py --cpu-type=MinorCPU --ruby -c daxpy --options="$1"
tar -czf m5out.tar m5out
Most importantly, note that for command-line arguments (e.g., the size of DAXPY) we use the --options field and pass it the first argument given on the command line -- so when we launch this job, we should also specify the size of DAXPY. We updated the submission file accordingly to pass this value (e.g., above we specify 1024 as the input size for DAXPY via the arguments line). As in Step 2, note that we need to tar the output here (which means on the submit server you'll need to untar it after the job is done). Since we're only running a single-threaded job, we only need 1 CPU for this test (unlike in Step 2). Moreover, you'll need to update the config file, cpu-type, and the application as appropriate for the gem5 config you want to use, the job you want to run, and the CPU you want to test, respectively.
C & D: Now that we've made our scripts, as in Steps 2-3, we need to submit the CHTC job, and wait for it to return. Crucially, note that you can launch all of the experiments you want to run in parallel!
Step 5: Parsing the output
Once your experiments are done, you should scp the m5out directories from the submit server to a local machine (e.g., the CSL machines) and parse the outputs there. There are two main reasons: (a) the submit servers are mostly intended for launching jobs, not for post-processing data, and (b) since CHTC is its own separate file system, it's useful/important to copy your data elsewhere so you can reference it later.
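For example, a sketch of copying a tarred m5out back to a local machine (the username, remote path, and file names are placeholders):

# run this from your local machine (e.g., a CSL machine)
scp bbadger@learn.chtc.wisc.edu:~/hw4/m5out.tar .
tar -xzf m5out.tar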
FAQ
- How do I get HTCondor to email me after my jobs complete/fail?: See the HTCondor documentation on notification settings; a short sketch of the relevant submit-file lines appears after this FAQ.
- What if my job is in the "Held" state?: Usually this means there was an error of some kind with what you are running. Try checking the err output (perhaps using some of the tricks below in the FAQ). In my case, I usually check the log output first -- e.g., to see if a file couldn't be found by HTCondor.
- How do I avoid overwriting m5out for different runs from the same directory?: Since you may be running multiple configurations from the same application folder/directory, if you are not careful you will overwrite your previous run's m5out folder with your next run's m5out folder! Obviously, this would be less than desirable. There are a few ways to work around this:
- (simplest) Name the m5out directory uniquely per job you are launching. One option is to append something unique to each job, such as a timestamp or the job's cluster ID, to the m5out directory name. The traditional way to do this is to pass an argument from your submit file to your shell file that uniquely identifies the m5out directory:
# submission file
...
arguments = 1024 $(ClusterId)
...

# shell file
mkdir m5out.$2
...
tar -czf m5out.$2.tar m5out.$2
So, if the job ID is 182530.0, then the resulting m5out directory would be m5out.182530/. Likewise, I also suggest you make the log, output, and error lines in your submit file unique per job (e.g., output = out.log.$(ClusterId)) to make it easy to check what happened in each of your runs.
- Use different initialdirs in the submit file. This essentially requires that you have a different directory for each run you want to do, but it does a good job of isolating things. It also means you should add the "initialdir" field to your submission scripts to specify where the job should be run from. If you go this route, keep in mind that transfer_input_files uses initialdir as its base directory, not the directory you're in when running condor_submit.
- (most complex, fewest sub/shell files) See the CHTC User Guide, especially Chapter 4, for additional options that include batching multiple jobs together in a single sub/shell script combo -- this would also allow you to set the appropriate initialdir/outputdir based on the arguments you use to express this information.
- Is there a way to enable CHTC to show the partial output when my job is running?: Yes! There are three options:
- (Preferred in most cases - should work everywhere on CHTC) Run "condor_ssh_to_job $JOBID". This will start an SSH session inside the job's environment. You can not only look at the files, but also poke at the executable if you want.
- (Works everywhere, even when condor_ssh_to_job doesn't) Run "condor_tail $JOBID _condor_stdout". Acts a bit like the Unix 'tail' utility.
- (Must be specified at submit-time) Set "stream_output = True" in the submit file.
- Can I use more than 8 cores for my gem5 build job?: Yes, you can. However, the more CPUs you request, the longer your wait time will likely be. This is because there are substantially fewer 16-core machines in the cluster than there are 8-core machines. The CHTC staff suspect the wait for a 16-core machine will be longer than it takes to compile gem5 on an 8-core machine, but your mileage may vary.
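As referenced in the email FAQ entry above, here is a minimal sketch of the submit-file lines that enable email notification (the address is a placeholder):

# add to your .sub file; "bbadger@wisc.edu" is a placeholder address
notification = Complete    # other valid values: Error, Always, Never
notify_user = bbadger@wisc.edu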