CS 368-4 (2011 Fall) — Day 6 Homework

Due Tuesday, November 15, at the start of class.

Goal

Write Python code to distribute a set of files to multiple directories, possibly modifying the contents of the files for each directory. This goal is one step toward a real system that one might actually want for running a set of CHTC jobs.

Use lots of standard Python modules and functions to make your life easier and to write a better script. The trick is to identify the right modules and functions. All of the useful ones (for this assignment) were mentioned in the slides and in class. So, look through the slides and try to find the modules and functions that are most relevant. Do not reinvent the wheel!

Tasks

The task is a bit complicated to describe, but essentially you start with one directory, the source directory, which contains a set of files. There are also one or more pre-existing target directories. At its simplest, your code must read each file from the source directory and write it into each of the target directories.

For example:

Before

homework_06/
    source/
        job.sub
        input_1.txt
        input_2.txt
        other.txt
    target_a_dir/
    target_b_dir/
    target_c_dir/

After

homework_06/
    source/
        job.sub
        input_1.txt
        input_2.txt
        other.txt
    target_a_dir/
        job.sub
        input_1.txt
        input_2.txt
        other.txt
    target_b_dir/
        job.sub
        input_1.txt
        input_2.txt
        other.txt
    target_c_dir/
        job.sub
        input_1.txt
        input_2.txt
        other.txt
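
The simplest form of the task, as pictured above, can be sketched roughly as follows. This is only an illustration, not a complete solution: the function name distribute and its parameters are made up for this example, it assumes the source files are plain text, and it ignores both complications described later.

```python
import os

def distribute(source_dir, target_dirs):
    """Write every regular file in source_dir into each directory in target_dirs.

    Hypothetical helper for illustration only; it always overwrites.
    """
    for target_dir in target_dirs:
        # Finish all files for one target directory before moving on.
        for name in sorted(os.listdir(source_dir)):
            source_path = os.path.join(source_dir, name)
            if not os.path.isfile(source_path):
                continue  # assumption: skip anything that is not a plain file
            with open(source_path) as source_file:
                contents = source_file.read()
            with open(os.path.join(target_dir, name), "w") as target_file:
                target_file.write(contents)
```

Calling distribute("homework_06/source", ["homework_06/target_a_dir"]) would then write a copy of each source file into target_a_dir.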

Test Files

I have created a set of source files and target directories for your script to run on. Instead of making you download each part separately, I have put everything into a single compressed archive file, homework-06-starter.tar.gz. Download the linked file to the place where you intend to run your Python script, and then run the following command in the same directory as the downloaded file:

tar xzf homework-06-starter.tar.gz

This will create an initial homework_06 directory as shown above. If your script makes changes to the homework_06 directory, and you want to return to the initial directory contents, you can delete the entire homework_06 directory and restore the initial one with the following commands (type the first one carefully, because it can remove lots of files!):

rm -r homework_06
tar xzf homework-06-starter.tar.gz

More Details

The order in which directories and files are processed matters: Process the target directories in (string) sorted order, and then write all files into one directory before moving on to the next. This order is necessary because of one of the requirements further below.
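
One possible way to get the target directories in sorted order is sketched below. The helper name find_target_dirs is hypothetical, and the rule it uses to recognize a target directory (every subdirectory of the base directory except the one named source) is an assumption made only for this example. The call to sorted() matters because os.listdir() returns names in arbitrary order.

```python
import os

def find_target_dirs(base_dir, source_name="source"):
    """Return paths of the target directories under base_dir, sorted by name.

    Assumption for this sketch: every subdirectory other than source_name
    counts as a target directory.
    """
    target_dirs = []
    for name in sorted(os.listdir(base_dir)):  # plain string sort
        path = os.path.join(base_dir, name)
        if name != source_name and os.path.isdir(path):
            target_dirs.append(path)
    return target_dirs
```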

Here are some assumptions that make the solution easier:

Complication #1: When to Write a File

Realistically, we may run our script more than once on the same source and target directories. In that case, we want to overwrite existing files only when the corresponding source file is newer. Think of it this way: We run the script once, then realize that we made a mistake in a source file, so we change one source file and rerun the script. In that case, only the updated source file is rewritten to the target directories.

So, to be more precise, only write a source file into a target directory:

It would be nice — but is not required — for your script to print one line for each combination of a target directory and source file, indicating whether the target file was written or skipped.
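
The "only when newer" check and the optional status line might look something like the sketch below. Both function names are made up for this example. Comparing modification times with os.path.getmtime() is one reasonable reading of "newer", and writing when the target file does not exist yet is an assumption of this sketch.

```python
import os

def needs_write(source_path, target_path):
    """Return True if the source file should be written to target_path."""
    if not os.path.exists(target_path):
        return True  # assumption: a missing target file is always written
    # Write only if the source was modified after the target.
    return os.path.getmtime(source_path) > os.path.getmtime(target_path)

def status_line(target_dir, name, written):
    """Build one line of output for a (target directory, source file) pair."""
    status = "written" if written else "skipped"
    return "%s: %s" % (status, os.path.join(target_dir, name))
```

A script could call needs_write() before each write and print the result of status_line() either way.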

Complication #2: Modifying File Contents

The files are not merely written into the target directories. Instead, they may be modified along the way, if the source file contains certain special strings. Thus, your script must read each source file into memory (as one string or a list of strings), try to make the following changes, and then write the (possibly modified) file contents from memory to the target file.

Look at the files in the sample source directory for examples of how these replacements are used.
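
The general read, modify, write pattern could be sketched as follows. The special string "@@EXAMPLE@@" used in the usage note is purely a placeholder invented for this illustration; it is not one of the special strings required by the assignment, and the function names are hypothetical as well.

```python
def apply_replacements(contents, replacements):
    """Return contents with each (old, new) pair in replacements applied in order."""
    for old, new in replacements:
        # str.replace leaves the string unchanged if old does not occur.
        contents = contents.replace(old, new)
    return contents

def write_modified(source_path, target_path, replacements):
    """Read a source file into memory, modify it, and write it to target_path."""
    with open(source_path) as source_file:
        contents = source_file.read()  # the whole file as one string
    contents = apply_replacements(contents, replacements)
    with open(target_path, "w") as target_file:
        target_file.write(contents)
```

For instance, write_modified(source_path, target_path, [("@@EXAMPLE@@", "target_a_dir")]) would replace every occurrence of the placeholder with the text target_a_dir in the copy, and leave the source file untouched.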

Extra Challenge

You have learned how to create a separate module and use it from another script. So, move parts of your solution into a separate module file and have your main script use those parts via the module. Some questions to consider:

No one said that this is a great use of a separate module, but it is good practice nonetheless.

Reminders

Start your script the right way! Here is a suggestion:

#!/usr/bin/env python

"""Homework for CS 368-4 (2011 Fall)
Assigned on Day 06, 2011-11-10
Written by <Your Name>
"""

Do the work yourself, consulting reasonable reference materials as needed. Any reference material that gives you a complete or nearly complete solution to this problem or a similar one is not OK to use. Asking the instructors for help is OK; asking other students for help is not.

Hand In

Hand in a printout of your code, on a single sheet of paper if possible (it may not be possible in this case). Be sure to put your own name in the initial comment block of the code. Identifying your work is important, or you may not receive appropriate credit.