Due Thursday, November 29, at the start of class.
Run more jobs! Specifically, run clusters with multiple jobs and handle files appropriately.
This assignment comes in three distinct but related parts, each with its own requirements.
Our core job for this assignment is to find prime numbers. Lots of them. So, we need a function to decide if a given number is prime, and then we need to call it for a wide range of numbers. We are not going to list the prime numbers we find, but rather simply keep a count of how many there are.
So, the first step is to write a function called is_prime. It takes a single argument, expected to be an integer, and returns True if the number is prime, False otherwise:

    def is_prime(number):
        # Write this function
Now, there are lots of ways to check prime numbers, some of them quite fancy. I have no doubt that you can find Python code to do so online, which would, of course, violate class policy and generally subvert the goal here. You should try to write this function yourself, using a simple (if inefficient) method. It is good to practice your Python skills a bit, despite our focus on HTCondor right now. However, if you get really stuck, email me and I will send you some code.
How does one check for a prime number? Think about how you would do it yourself. Is the number 1,234,567,891 a prime number? (Yes, BTW.)
Here is a simple approach: For every integer divisor from 2 up to the square-root of the number (rounded up), does the number divide evenly by the divisor (i.e., has no remainder)? If so, then the number has a factor other than 1 and itself, and hence is not prime. It is easy to construct a short (< 10 line) function to implement this process.
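As one possible reading of the approach above, here is a minimal sketch of trial division. It uses a while loop and the test divisor * divisor <= number in place of computing the square root; that is my choice rather than anything the assignment prescribes, and it avoids range() entirely. Treat it as an illustration, not as the expected solution.

```python
def is_prime(number):
    """Return True if number is prime, using simple trial division."""
    if number < 2:
        # 0, 1, and negative integers are not prime.
        return False
    divisor = 2
    # Stop once divisor exceeds the square root of number.
    while divisor * divisor <= number:
        if number % divisor == 0:
            # Found a factor other than 1 and number itself.
            return False
        divisor += 1
    return True
```

With this sketch, is_prime(233) returns True, and is_prime(235) returns False because 235 = 5 * 47.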
Important note: Do not use the range() function anywhere in your assignment!!! I killed my laptop a couple of times doing so. You see, the range() function creates a list in memory, and for the large numbers we will be using, that list can cause your computer to run out of memory. On my laptop, at least, it did not do so very gracefully. So, instead write your loop(s) using the xrange() function. It works exactly the same way as range() in a for loop, except that it never creates a list and hence consumes a small and constant amount of memory, regardless of the size of the range.
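To see the difference in memory use, the sketch below compares the size of a lazy range object with the size of a real list. The lazy_range name is my own, chosen so the snippet runs on either Python version: in Python 2 it is xrange(), while in Python 3 (where xrange() no longer exists) the built-in range() is already lazy.

```python
import sys

try:
    lazy_range = xrange   # Python 2: the memory-friendly variant
except NameError:
    lazy_range = range    # Python 3: range() itself no longer builds a list

# A lazy range object only stores its bounds and step, so its size
# does not depend on how many numbers it covers.
small_lazy = sys.getsizeof(lazy_range(10))
large_lazy = sys.getsizeof(lazy_range(10 ** 9))

# A real list stores one reference per element, so it grows with its length.
# (Kept to 100,000 elements here so the demonstration itself stays cheap.)
as_list = sys.getsizeof(list(lazy_range(10 ** 5)))

print("lazy range of 10 numbers:    %d bytes" % small_lazy)
print("lazy range of 10**9 numbers: %d bytes" % large_lazy)
print("list of 10**5 numbers:       %d bytes" % as_list)
```

The exact byte counts depend on the Python build, but the two lazy objects should report the same size while the list is far larger.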
Be sure that your function works on a few numbers before moving on to Part I. Some test cases:
Number | Prime? |
---|---|
0 | not prime |
1 | not prime |
2 | prime |
3 | prime |
4 | not prime |
5 | prime |
233 | prime |
235 | not prime |
123456789 | not prime |
1234567891 | prime |
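One way to run through the table above without checking each number by hand is a small harness like the following sketch. The names TEST_CASES, failed_cases, and odd_guess are hypothetical; odd_guess is a deliberately wrong stand-in used only to show what a failure report looks like, so pass your own is_prime instead.

```python
# (number, expected result) pairs copied from the table above.
TEST_CASES = [
    (0, False), (1, False), (2, True), (3, True), (4, False),
    (5, True), (233, True), (235, False),
    (123456789, False), (1234567891, True),
]

def failed_cases(candidate):
    """Return the numbers for which candidate() disagrees with the table."""
    return [number for number, expected in TEST_CASES
            if bool(candidate(number)) != expected]

def odd_guess(number):
    # Deliberately wrong stand-in: claims every odd number is prime.
    return number % 2 == 1

print(failed_cases(odd_guess))  # prints [1, 2, 235, 123456789]
```

A correct is_prime should produce an empty list.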
Now that we have a function to test whether a number is prime, we can test lots of numbers!
For Part I, use your is_prime() function in a larger script that counts the number of prime numbers that occur between two integers. Make the script accept the lower and upper bounds from the command line. The script should include the lower and upper bounds themselves in the numbers tested (e.g., 0 and 999 in the example below). Each run should print out the range of numbers tested and the number of primes found; a typical run might look like this:

    ./homework_10.py 0 999
    168 primes between 0 and 999
Be completely sure that this script works as expected from the command line before moving on.
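The command-line handling and the inclusive bounds can be sketched as below. This is only an outline under my own assumptions: the helper names are hypothetical, the prime test is passed in as the predicate argument rather than defined here, and the loop is a while loop so that no range() call is needed.

```python
import sys

def count_matching(lower, upper, predicate):
    """Count integers n with lower <= n <= upper for which predicate(n) is true."""
    count = 0
    number = lower
    while number <= upper:   # <= keeps the upper bound in the tested range
        if predicate(number):
            count += 1
        number += 1
    return count

def format_result(count, lower, upper):
    """Build the one-line summary in the format shown in the example run."""
    return "%d primes between %d and %d" % (count, lower, upper)

def main(argv, predicate):
    """Parse LOWER and UPPER from argv, print the summary, return an exit code."""
    if len(argv) != 3:
        sys.stderr.write("usage: %s LOWER UPPER\n" % argv[0])
        return 1
    lower, upper = int(argv[1]), int(argv[2])
    print(format_result(count_matching(lower, upper, predicate), lower, upper))
    return 0

# In a real script, the final line might be:
#     sys.exit(main(sys.argv, is_prime))
```

Returning an exit code from main, instead of calling sys.exit() inside it, keeps the function easy to test from an interactive session.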
With this script, we can test lots of numbers in one run. Be a little cautious here! The script will cause one CPU core of your machine to run at full capacity until it is done. A little experimentation on my part suggests that we can test one million to a hundred million numbers per job, which yields run times of a few seconds to 20+ minutes (on current, fast hardware; YMMV).
OK, now the real assignment: Write a single HTCondor submit file that will run your script multiple times on consecutive ranges of integers. For example, if you want to run ten million numbers at a time, the ranges would be 0 to 9,999,999, then 10,000,000 to 19,999,999, then 20,000,000 to 29,999,999, and so on.
Let’s run 10 ranges of numbers total from your one submit file. How will you initiate each job? How will you handle arguments, input, output, etc.?
The net result of this submission should be (among other things) ten (standard) output files on disk, from which you could extract the prime-number counts (say, to admire, or graph, or something).
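Extracting the counts afterwards could look something like this sketch. It assumes each output file contains a line in the format of the example run ("168 primes between 0 and 999"); the function names are hypothetical, and reading the files themselves is left out because their names depend on your submit file.

```python
import re

# Matches the output format of the example run, e.g. "168 primes between 0 and 999".
RESULT_PATTERN = re.compile(r"(\d+) primes between (\d+) and (\d+)")

def parse_result(text):
    """Return (count, lower, upper) from one job's output, or None if absent."""
    match = RESULT_PATTERN.search(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())

def total_primes(outputs):
    """Sum the prime counts over an iterable of output-file contents."""
    results = [parse_result(text) for text in outputs]
    return sum(result[0] for result in results if result is not None)
```

For instance, parse_result("168 primes between 0 and 999") returns (168, 0, 999).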
I hope you found it a bit tedious to set up the submit file in Part I. Imagine if you had 100 or 1000 jobs to run! We can do better.
In this part, you will change the job (script) and the submit file so that you produce exactly the same results, using the same number of jobs, but with a single queue statement in your submit file.
First, create an input text file. It will be very simple: Write one line per intended run, with that run’s start and end numbers, separated by whitespace. If you are doing blocks of ten million numbers at a time, the file could look like this:
    0 9999999
    10000000 19999999
    20000000 29999999
    30000000 39999999
    40000000 49999999
    50000000 59999999
    60000000 69999999
    70000000 79999999
    80000000 89999999
    90000000 99999999
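Rather than typing these lines by hand, the file contents can be generated. The following is a small sketch; block_ranges and input_file_text are hypothetical helper names, and saving the text under a file name of your choosing is left to you.

```python
def block_ranges(block_size, block_count, start=0):
    """Return (first, last) pairs for consecutive, non-overlapping blocks."""
    ranges = []
    first = start
    while len(ranges) < block_count:
        last = first + block_size - 1   # bounds are inclusive
        ranges.append((first, last))
        first = last + 1                # next block starts right after this one
    return ranges

def input_file_text(ranges):
    """Format one 'first last' pair per line, separated by a space."""
    return "".join("%d %d\n" % pair for pair in ranges)

# Ten blocks of ten million numbers each, as in the example above.
text = input_file_text(block_ranges(10000000, 10))
print(text)
```

Changing the two arguments to block_ranges is enough to produce a file for 100 or 1000 jobs.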
Every job will read this same input file. Now, you just need to change the script. Copy the original script to a new name, and make the following changes:
0 for the first line, 1 for the second, …
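Assuming the script is meant to pick its bounds from the input file by a zero-based line number (0 for the first line, 1 for the second), the lookup could be sketched as follows. The function name and its parameters are hypothetical, not part of the assignment.

```python
def bounds_from_line(path, line_index):
    """Return (lower, upper) from the line_index-th line of path (0-based)."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index == line_index:
                lower, upper = line.split()   # split() handles any whitespace
                return int(lower), int(upper)
    # Reached only if the file has fewer than line_index + 1 lines.
    raise IndexError("no line %d in %s" % (line_index, path))
```

Iterating over the file with enumerate() reads one line at a time, so the whole file never has to be held in memory.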
OK, now write a new submit file that works with the new script.
Run the new job cluster, and confirm that it produces the same output as in Part I.
Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.
All homework assignments must be turned in by email! See the email page for more details about formatting, etc.
For this assignment, you must submit several files in order for your assignment to be complete:
Do not include your output files this time!