
CS 368-4 (2012 Fall) — Day 10 Homework

Due Thursday, November 29, at the start of class.

Goal

Run more jobs! Specifically, run clusters with multiple jobs and handle files appropriately.

Tasks

This assignment comes in three distinct but related parts, each with its own requirements.

Preparation: Write a Simple Prime-Number Finder

Our core job for this assignment is to find prime numbers. Lots of them. So, we need a function to decide if a given number is prime, and then we need to call it for a wide range of numbers. We are not going to list the prime numbers we find, but rather simply keep a count of how many there are.

So, the first step is to write a function called is_prime. It takes a single argument, expected to be an integer, and returns True if the number is prime, False otherwise:

def is_prime(number):
    # Write this function

Now, there are lots of ways to check prime numbers, some of them quite fancy. I have no doubt that you can find Python code to do so online, which would, of course, violate class policy and generally subvert the goal here. You should try to write this function yourself, using a simple (if inefficient) method. It is good to practice your Python skills a bit, despite our focus on HTCondor right now. However, if you get really stuck, email me and I will send you some code.

How does one check for a prime number? Think about how you would do it yourself. Is the number 1,234,567,891 a prime number? (Yes, BTW.)

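Here is a simple approach: For every integer divisor from 2 up to the square root of the number (rounded up), check whether the number divides evenly by the divisor (i.e., leaves no remainder). If so, then the number has a factor other than 1 and itself, and hence is not prime. Stopping at the square root is enough because any factor larger than the square root must be paired with one smaller than it. Watch the small cases, though: 0 and 1 are not prime, and a divisor equal to the number itself does not count (otherwise 2 would be rejected). It is easy to construct a short (< 10 line) function to implement this process.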

Important note: Do not use the range() function in your assignment anywhere!!! I killed my laptop a couple times doing so. You see, the range() function creates a list in memory, and for the large numbers we will be using, it is possible that you will cause your computer to run out of memory. On my laptop, at least, it did not do so very gracefully. So, instead write your loop(s) using the xrange() function. It works exactly the same way as range() in a for loop, except that it never creates a list and hence consumes a small and constant amount of memory, regardless of the size of the range.
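As a rough illustration of looping with xrange(), here is a minimal sketch that is unrelated to prime testing. The function name count_multiples is hypothetical, and the try/except shim at the top is an assumption added for readers on Python 3, where xrange() no longer exists and range() is already lazy.

```python
# Assumption: on Python 3 there is no xrange(), and range() is already lazy,
# so this shim lets the same loop run there. On Python 2 it does nothing.
try:
    xrange
except NameError:
    xrange = range


def count_multiples(limit, step):
    """Count the integers in [0, limit) that are multiples of step.

    Hypothetical helper: it visits one integer at a time, so memory use
    stays small and constant however large limit is.
    """
    count = 0
    for n in xrange(limit):
        if n % step == 0:
            count += 1
    return count


print(count_multiples(1000000, 7))
```

The loop body is identical to what you would write with range(); only the function name changes.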

Be sure that your function works on a few numbers before moving on to Part I. Some test cases:

Number        Prime?
0             not prime
1             not prime
2             prime
3             prime
4             not prime
5             prime
233           prime
235           not prime
123456789     not prime
1234567891    prime
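One way to run all of these checks at once is a small harness like the sketch below. It deliberately does not implement is_prime; it only takes your function as an argument. The names TEST_CASES and check_is_prime are hypothetical, chosen for this illustration.

```python
# Test cases copied from the list above: (number, expected result).
TEST_CASES = [
    (0, False),
    (1, False),
    (2, True),
    (3, True),
    (4, False),
    (5, True),
    (233, True),
    (235, False),
    (123456789, False),
    (1234567891, True),
]


def check_is_prime(is_prime):
    """Return (number, expected, actual) for every case is_prime gets wrong."""
    failures = []
    for number, expected in TEST_CASES:
        actual = is_prime(number)
        if bool(actual) != expected:
            failures.append((number, expected, actual))
    return failures
```

After defining your own is_prime, calling check_is_prime(is_prime) should return an empty list; anything else names the cases that still fail.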

Part I: One Submit File, Many Queue Statements

Now that we have a function to test whether a number is prime, we can test lots of numbers!

For Part I, use your is_prime() function in a larger script that counts the number of prime numbers that occur between two integers. Make the script accept the lower and upper bounds from the command line. The script should include the lower and upper bounds themselves in the numbers tested (e.g., 0 and 999 in the example below). Each run should print out the range of numbers tested and the number of primes found; a typical run might look like this:

./homework_10.py 0 999
168 primes between 0 and 999

Be completely sure that this script works as expected from the command line before moving on.

With this script, we can test lots of numbers in one run. Be a little cautious here! The script will cause one CPU core of your machine to run at full capacity until it is done. A little experimentation on my part suggests that we can test one million to a hundred million numbers per job, which yields run times of a few seconds to 20+ minutes (on current, fast hardware; YMMV).
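If you want to do this kind of experimentation yourself, a small timing helper can tell you how long a given range takes on your machine. This is a sketch only: time_call is a hypothetical name, and the sum() call is a stand-in workload that you would replace with your own counting function.

```python
import time


def time_call(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start


# Stand-in workload (an assumption): replace sum with your own function.
result, seconds = time_call(sum, [1, 2, 3])
print("result %s took %.3f seconds" % (result, seconds))
```

Start with a small range and scale up gradually, so that a slow run does not tie up your machine for longer than you expect.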

OK, now the real assignment: Write a single HTCondor submit file that will run your script multiple times on consecutive ranges of integers. For example, if you want to run ten million numbers at a time, the ranges would be 0 to 9999999, then 10000000 to 19999999, and so on.

Let’s run 10 ranges of numbers total from your one submit file. How will you initiate each job? How will you handle arguments, input, output, etc.?

The net result of this submission should be (among other things) ten (standard) output files on disk, from which you could extract the prime-number counts (say, to admire, or graph, or something).

Part II: One Submit File, One Queue Statement, Many Jobs

I hope you found it a bit tedious to set up the submit file in Part I. Imagine if you had 100 or 1000 jobs to run! We can do better.

In this part, you will change the job (script) and the submit file so that you produce exactly the same results, using the same number of jobs, but with a single queue statement in your submit file.

First, create an input text file. It will be very simple: Write one line per intended run, with that run’s start and end numbers, separated by whitespace. If you are doing blocks of ten million numbers at a time, the file could look like this:

0 9999999
10000000 19999999
20000000 29999999
30000000 39999999
40000000 49999999
50000000 59999999
60000000 69999999
70000000 79999999
80000000 89999999
90000000 99999999
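You can type this file by hand, but the arithmetic is regular enough to generate. The sketch below is one possible way to do it; make_ranges and format_ranges are hypothetical names, and the script simply prints the lines so that you can redirect them into a file of your choosing.

```python
import sys


def make_ranges(block_size, num_blocks, start=0):
    """Return (low, high) pairs for consecutive, inclusive blocks."""
    ranges = []
    i = 0
    while i < num_blocks:
        low = start + i * block_size
        high = low + block_size - 1  # inclusive upper bound
        ranges.append((low, high))
        i += 1
    return ranges


def format_ranges(ranges):
    """Format each pair as one line: low, a space, high."""
    return "".join("%d %d\n" % pair for pair in ranges)


# Ten blocks of ten million numbers each, matching the example above.
sys.stdout.write(format_ranges(make_ranges(10000000, 10)))
```

Because each block ends one below the next block's start, no number is tested twice and none is skipped.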

Every job will read this same input file. Now, you just need to change the script. Copy the original script to a new name, and make the following changes:

OK, now write a new submit file that works with the new script.

Run the new job cluster, and confirm that it produces the same output as in Part I.

Reminders

Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.

Hand In

All homework assignments must be turned in by email! See the email page for more details about formatting, etc.

For this assignment, you must submit several files in order for your assignment to be complete:

Do not include your output files this time!