Due Thursday, April 26, at the start of class.
Run a simulation program many times under a few different conditions, then analyze the raw data using R.
At heart, this assignment uses the same queue simulator as homework #8. But this time, we want to run several different experimental conditions and then compute statistics on the raw data, all by running a single command! That is, we will create a simple workflow to handle all of the work.
This is information about the script that I wrote. You do not need to write this program!
For background on the queue simulator itself, see homework #8. The script for this assignment is a little bit different:
Here is the new usage information (note the new script name):
Usage: queue_sim_loop [options] SERVERS CLIENTS-PER-HOUR COUNT

Options:
  --version         show program's version number and exit
  -h, --help        show this help message and exit
  -d, --departures  Allow clients to leave queue after waiting too long
Also, I wrote a simple (simplistic?) R script that analyzes the raw data. To learn more about R, visit the R Project website.
There are two main tasks you need to complete:
In this part, you will create and test four separate submit files that correspond to the four nodes of our workflow. Because each submit file is similar to ones you have written before, I am providing minimal guidance here.
To get started, follow these steps:
mkdir homework-11
cd homework-11

Name the directory whatever you like!
tar xzf homework-11.tar.gz

You will have two files:
queue_sim_loop: The new simulation program
qsl-analyze.r: The R analysis script
Now, write and test Condor submit files for each step of the process. A few details to note:
For the analysis job, qsl-analyze.r is the program to run, and it takes no arguments. (You could test it by hand on a machine where R is installed, but not on our submit machine, submit-368, because R is not installed there.) How do you tell Condor about the input files? Also, see below about input filenames. Note that the R script writes to standard output and also creates a separate PDF file with a data plot.
The numbers in each filename correspond to the experimental condition. In each name, S is the number of servers, CC is the number of clients per hour, and TTTT is the number of trials. Either make your simulation jobs create output files with those names, or else change the R script to expect the filenames you choose.
I gave my log files a .log extension, but suit yourself.
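As one concrete sketch, the submit file for the analysis node might look roughly like the following. Every filename here (the input names in S-CC-TTTT style, analyze.out, and so on) is an assumption for illustration only; substitute the names you actually chose.

```
# Hypothetical submit file for the R analysis node (qsl-analyze.submit).
# All filenames below are examples only; use your own.
executable = qsl-analyze.r
transfer_input_files = 1-20-1000.out, 2-20-1000.out, 3-20-1000.out
output = analyze.out
error  = analyze.err
log    = analyze.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue
```

Because the R script creates its PDF plot as a new file in the job's working directory, Condor's file transfer should bring that file back automatically when the job completes.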
Make sure that you can successfully run all four jobs before moving on to the next part!
OK, now you have four working submit files. It is time to link them together into a single workflow. Obviously, we will use DAGMan to do this part.
First, draw a picture of the overall workflow. Pencil and paper is OK, no need to hand it in. But make sure you understand what each node does, what its inputs and outputs are, and how the nodes are connected. If you are unsure about this step, write to me!
Then, write the DAGMan submit file itself. It is not very complicated, and if you have done things right up to this point, you will not need to modify your Condor submit files from Part I at all.
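For reference, a minimal DAG file for this shape of workflow (several simulation nodes feeding one analysis node) might look like the sketch below. The node names and submit-file names are assumptions; use whatever you named your own submit files.

```
# Hypothetical DAGMan input file (queue.dag); names are examples only.
JOB sim1 sim1.submit
JOB sim2 sim2.submit
JOB sim3 sim3.submit
JOB analyze qsl-analyze.submit

# The analysis node runs only after all three simulations finish.
PARENT sim1 sim2 sim3 CHILD analyze
```

You would then submit the whole workflow with condor_submit_dag queue.dag.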
And now, the moment we have been waiting for… Submit the entire workflow, stand back, and wait for the results!
Some ideas for extra learning:
Read about DAGMan's VARS statement. Then, merge your three simulation submit files into one, changing it and the DAGMan submit file to work together.
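A hedged sketch of the idea: VARS lets the DAG file pass per-node values into a single shared submit file. The variable name (args) and the specific argument values below are assumptions for illustration.

```
# In the DAG file: three nodes share one submit file,
# each with its own arguments (values are hypothetical).
JOB sim1 sim.submit
JOB sim2 sim.submit
JOB sim3 sim.submit
VARS sim1 args="1 20 1000"
VARS sim2 args="2 20 1000"
VARS sim3 args="3 20 1000"
```

Inside the shared sim.submit, refer to the variable with a macro, e.g. arguments = $(args).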
Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.
A printout of:
If you can squeeze that onto a single sheet of paper (double-sided is great!), the planet and I will thank you. Regardless, be sure to put your own name at the top of each piece of paper; if I cannot identify your work, you may not receive appropriate credit.