Computer Sciences Department logo

CS 368-4 (2012 Fall) — Day 6 Homework

Due Monday, November 12, at the start of class.

Goal

Write Python code to distribute a set of files to multiple directories, possibly modifying the contents of the files for each directory. This goal is one step toward a real system that one might actually want for running a set of CHTC jobs.

Use lots of standard Python modules and functions to make your life easier and to write a better script. The trick is to identify the right modules and functions. All of the useful ones (for this assignment) were mentioned in the slides and class. So, look through the slides and try to find the modules and functions that are most relevant. Do not reinvent the wheel!

Tasks

The task is a bit complicated to describe. But essentially, you start with one directory — the source directory — which contains a set of files. Then, there are one or more pre-existing target directories. At its simplest, your code must read each file from the source directory and write it into each of the target directories.

For example:

Before

homework_06/
    source/
        job.sub
        input_1.txt
        input_2.txt
        other.txt
    target_a_dir/
    target_b_dir/
    target_c_dir/

After

homework_06/
    source/
        job.sub
        input_1.txt
        input_2.txt
        other.txt
    target_a_dir/
        job.sub
        input_1.txt
        input_2.txt
        other.txt
    target_b_dir/
        job.sub
        input_1.txt
        input_2.txt
        other.txt
    target_c_dir/
        job.sub
        input_1.txt
        input_2.txt
        other.txt

Test Files

I have created a set of source files and target directories for your script to run on. Instead of downloading each part separately, I have put everything into a single compressed archive file. Download the linked file to the place where you intend to run your Python script, and then run the following command in the same directory as the downloaded file:

tar xzf homework-06-starter.tar.gz

This will create an initial homework_06 directory as shown above. If your script makes changes to the homework_06 directory, and you want to return back to the initial directory contents, you can delete the entire homework_06 directory and restore the initial one with the following commands (type the first one carefully, because it can remove lots of files!):

rm -r homework_06
tar xzf homework-06-starter.tar.gz

More Details

The order in which directories and files are processed matters: Process the target directories in (string) sorted order, and then write all files into one directory before moving on to the next. This order is necessary because of one of the requirements further below.

Here are some assumptions that make the solution easier:

Complication #1: When to Write a File

Realistically, we may run our script more than once on the same source and target directories. In that case, we want to overwrite existing files only when the corresponding source file is newer. Think of it this way: We run the script once, then realize that we made a mistake in a source file, so we change one source file and rerun the script. In that case, only the updated source file is rewritten to the target directories.

So, to be more precise, only write a source file into a target directory:

It would be nice — but is not required — for your script to print one line for each combination of a target directory and source file, indicating whether the target file was written or skipped.

Complication #2: Modifying File Contents

The files are not merely written into the target directories. Instead, they may be modified along the way, if the source file contains certain special strings. Thus, your script must read each source file into memory (one string or a list of strings), and try to make the following changes, then write the (possibly modified) file contents from memory to the target file.

Look at the files in the sample source directory for examples of how these replacements are used.

Testing

Now that you know about modules and functions, you *could* write Python code to test parts of your Python code. I have written a brief document to get you started, if you like. This is definitely a more advanced approach, but I am happy to answer questions about it.

Here are some specific tests to consider, for manual or automated testing:

Extra Challenge

You have learned how to create a separate module and use it from another script. So, move parts of your solution into a separate module file and have your main script use those parts via the module. Some questions to consider:

No one said that this is a great use of a separate module, but it is good practice nonetheless.

Reminders

Start your script the right way! Here is a suggestion:

#!/usr/bin/env python

"""Homework for CS 368-4 (2012 Fall)
Assigned on Day 06, 2012-11-08
Written by <Your Name>
"""

Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution not OK to use. Asking the instructor for help is OK, asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.

Hand In

All homework assignments must be turned in by email! Attach your Python script to the email as a text .py file. See the email page for more details about formatting, etc.