CS 368-3 (2012 Summer) — Day 10 Homework

Due Monday, July 23, at the start of class.


Parse (with regular expressions) a file of test results, build a data structure to hold the data, and produce a nicely formatted report.


Every night, we run a series of automated tests of our software distribution. Each run produces a text file of results with a great deal of detailed output. Most of the time, though, we just want to see a summary of the results, not all of the details. Perl to the rescue!

A bit of background: The tests are written in Python and are divided up into modules. Roughly speaking, there is one module per functional unit of testing. Each module contains one or more individual tests, and each test produces a single line of output indicating whether the test succeeded, failed, or died because of a Python coding error. In our summary, we want to see the results tallied by module, not by individual test. Thus, you must write a script that extracts the test result data and tallies it in a useful data structure. Once that is done, writing out a summary report is trivial.

Input Files

Below are links to two separate input files. Your script will read and analyze only one file per run. I am giving you two input files just for variety! Your script must work with each file and any others like it. If you really want more data to play with, you can simply go to our test results page and download any of the linked test result files (do not take the ones labeled “Setup Failed”).

Input File Format

Your script must parse parts of the input file, filtering, extracting, and changing data until they are in the right form. Use regular expressions to do as many of these tasks as possible! Even if there are other ways to do it, it is best to keep practicing with regular expressions — plus, I think they are the most appropriate tool for this assignment.

The input files contain many different kinds of output lines. But we need only one kind of input line; your script should find these lines and ignore the rest. They are in a block together toward the start of each file. Here are a few sample lines (from input file #1):

test_17_squid (osgtest.tests.test_35_osg_configure.TestOSGConfigure) ... ok
test_18_storage (osgtest.tests.test_35_osg_configure.TestOSGConfigure) ... ok
test_19_utilities (osgtest.tests.test_35_osg_configure.TestOSGConfigure) ... FAIL

The basic format is as follows:

TEST-NAME (MODULE-NAME) ... RESULT

Note: There may be extra text between the three dots (...) and the RESULT, but the RESULT will always be the last text of the line. In some cases, that extra text may have newlines in it, in which case you may simply ignore the whole test — or better yet, figure out how to process test results that span multiple lines…

Your script must extract the RESULT and parts of the MODULE-NAME (described next).
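
As a rough illustration of the line format described above, here is a minimal sketch of a pattern that picks out the MODULE-NAME and the RESULT from a single line. It is written in Python purely for illustration (the assignment itself calls for Perl, where much of the same pattern syntax carries over). The names `RESULT_LINE` and `parse_line` are hypothetical, and the sketch assumes the result is one of ok, ERROR, or FAIL and that the whole test result sits on one line.

```python
import re

# Assumed line shape: TEST-NAME (MODULE-NAME) ... [extra text] RESULT
# RESULT_LINE and parse_line are illustrative names, not part of the assignment.
RESULT_LINE = re.compile(
    r"""^\S+\s+                         # test name, such as test_17_squid
        \((?P<module>[^)]+)\)           # dotted module name inside parentheses
        \s+\.\.\.\s+                    # the three literal dots
        (?:.*\s)?                       # optional extra text before the result
        (?P<result>ok|ERROR|FAIL)\s*$   # the result is the last text on the line
    """,
    re.VERBOSE,
)


def parse_line(line):
    """Return (module_name, result) for a test result line, else None."""
    match = RESULT_LINE.match(line)
    if match is None:
        return None  # some other kind of output line; ignore it
    return match.group("module"), match.group("result")


sample = ("test_19_utilities "
          "(osgtest.tests.test_35_osg_configure.TestOSGConfigure) ... FAIL")
print(parse_line(sample))
```

Lines that do not fit the pattern simply return None, which is one way to "find these lines and ignore the rest". Results that span multiple lines are not handled by this sketch.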

If we strip away all but the MODULE-NAME from the lines above, we get:

osgtest.tests.test_35_osg_configure.TestOSGConfigure

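You must extract two parts of the MODULE-NAME: the number in its next-to-last component (35 in the sample lines above), which is the sequence number of the test module, and its last component (TestOSGConfigure in the sample lines above), which is the module name itself. Note: Not all module names have a sequence number; your script should ignore the lines that do not have one.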

Notice that most (but not all) module names start with the word “Test”. If the module name starts that way, strip off the leading word “Test” to make it easier to read. See the sample report in the next subsection for examples of the trimmed module names.
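
The extraction and trimming just described might be sketched as follows, again in Python for illustration only. The names `MODULE_PARTS` and `split_module` are hypothetical, and the pattern assumes the sequence number always appears as `test_NN_...` in the next-to-last dotted component, as it does in the sample lines; real input files may need a looser pattern.

```python
import re

# Assumption: the sequence number sits in the next-to-last dotted component,
# written as test_<digits>_<words>, and the module name is the last component.
MODULE_PARTS = re.compile(r"\.test_(\d+)_\w+\.(\w+)$")


def split_module(module_name):
    """Return (sequence_number, trimmed_name), or None if there is no number."""
    match = MODULE_PARTS.search(module_name)
    if match is None:
        return None  # no sequence number: the caller should skip this line
    seq = int(match.group(1))                     # "05" becomes 5
    name = re.sub(r"^Test", "", match.group(2))   # drop a leading "Test" only
    return seq, name


print(split_module("osgtest.tests.test_35_osg_configure.TestOSGConfigure"))
```

Converting the sequence number to an integer makes it sort numerically later, so that module 5 comes before module 10.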

So, now you have the module sequence number, the (trimmed) module name, and the result of the test (“ok”, “ERROR”, or “FAIL”). For each module, tally the number of ok tests, error tests, and failed tests. You must also keep the sequence number, because it will be used to order the lines of the report.

Think hard about your data structure here. Look at the report format, below, and figure out what data you need to keep and how to organize it. Use a single variable to hold the entire data structure; that is, do not have parallel data structures.
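
One possible shape for such a structure, offered only as a sketch and not as the expected answer, is a single mapping keyed by sequence number whose values hold the module name and the three counts. The Python below illustrates the idea (in Perl, a hash of hashes would be the rough analogue). The function name `tally` and the record layout are hypothetical, and the sketch assumes each sequence number belongs to exactly one module.

```python
def tally(records):
    """Count results per module.

    records is an iterable of (sequence_number, module_name, result) tuples,
    where result is "ok", "ERROR", or "FAIL". Everything is kept in a single
    dictionary keyed by sequence number, so there are no parallel structures.
    """
    modules = {}
    for seq, name, result in records:
        # Create the entry for this module the first time it is seen.
        entry = modules.setdefault(
            seq, {"name": name, "ok": 0, "ERROR": 0, "FAIL": 0}
        )
        entry[result] += 1
    return modules


records = [
    (35, "OSGConfigure", "ok"),
    (35, "OSGConfigure", "ok"),
    (35, "OSGConfigure", "FAIL"),
    (5, "StartMySQL", "ok"),
]
print(tally(records))
```

Keying by sequence number keeps the name and the counts together and makes ordering the report a matter of sorting the keys.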

Report Format

For each module, print one line with the module name and the tallies of ok, error, and failed tests. Be sure to print the modules in order by sequence number. Line up the columns so that the report is an easy-to-read table. Here is an example of my output:

## Test              OK  Err Fail
-- --------------- ---- ---- ----
 5 StartMySQL         1    0    0
10 StartCondor        1    0    0
11 StartCondorCron    1    0    0
12 StartGatekeeper    3    0    0
13 StartGridFTP       1    0    0
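
The column layout in the example above can be reproduced with fixed-width formatting. The sketch below (Python, for illustration; Perl's `printf` offers the same kind of width control) assumes column widths of 2, 15, 4, 4, and 4 characters, read off the example report, and a hypothetical `modules` dictionary keyed by sequence number. Module names longer than 15 characters would push the later columns out of line.

```python
def format_report(modules):
    """Build the report text from a dict keyed by sequence number.

    Each value is assumed to be a dict with the keys "name", "ok",
    "ERROR", and "FAIL" (an illustrative layout, not a required one).
    """
    lines = [
        f"{'##':>2} {'Test':<15} {'OK':>4} {'Err':>4} {'Fail':>4}",
        f"{'-' * 2} {'-' * 15} {'-' * 4} {'-' * 4} {'-' * 4}",
    ]
    for seq in sorted(modules):  # numeric order by sequence number
        m = modules[seq]
        lines.append(
            f"{seq:>2} {m['name']:<15} {m['ok']:>4} {m['ERROR']:>4} {m['FAIL']:>4}"
        )
    return "\n".join(lines)


modules = {
    10: {"name": "StartCondor", "ok": 1, "ERROR": 0, "FAIL": 0},
    5: {"name": "StartMySQL", "ok": 1, "ERROR": 0, "FAIL": 0},
}
print(format_report(modules))
```

Sorting integer keys is what places module 5 ahead of module 10; sorting the numbers as strings would reverse them.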


How will you test your script? Can you write some unit tests to try out the trickiest parts? Here are some general ideas for testing this script:


Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.

Hand In

A printout of your code, ideally on a single sheet of paper. Be sure to put your own name in the initial comment block. Identifying your work is important, or you may not receive appropriate credit.