
CS 368-3 (2012 Summer) — Day 9 Homework

Due Thursday, July 19, at the start of class.

Goal

Practice writing regular expressions.

Tasks

You are not writing a script this time! Instead, I provide the script (similar to the one we used in class):

#!/usr/bin/perl
use strict;
use warnings;

open(INPUT, '<', $ARGV[0])
    or die "Could not open file: $!\n";
while (<INPUT>) {
    print if /cat/;   # PUT YOUR REGEXP IN PLACE OF /cat/
}
close INPUT;

Your assignment is to write regular expressions for the following patterns. Use the script to test your expressions. To help with testing, here are the two data files that I used in class:

For some of the patterns, you may wish to create your own input file(s). That is fine.

Most of the patterns below are narrowly defined, but some permit more freedom of interpretation. If you think that the description of a pattern is not clear, your answer must include a description of how you interpreted it; include example matches and non-matches to support your interpretation. If your pattern matches neither my interpretation nor another fairly obvious one, it will not count.

Patterns

You must get at least 11 of the 15 patterns below correct in order to get full credit for the assignment. Thus, if you do only 11 of the patterns and get one wrong, you will not get the full two points. Give yourself time, and be sure to check both matches and possible non-matches!

Think carefully about whether letter case matters in each pattern.
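As a reminder of how letter case affects matching, here is a minimal sketch that reuses the /cat/ placeholder from the script above. The sample strings are hypothetical and do not come from either data file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample strings, not taken from either data file.
my @samples = ('cat', 'Cat', 'CAT', 'dog');

# Case-sensitive: only the all-lowercase string matches.
my @exact = grep { /cat/ } @samples;

# Case-insensitive: the /i modifier ignores letter case.
my @any_case = grep { /cat/i } @samples;

print "without /i: ", join(', ', @exact), "\n";
print "with /i:    ", join(', ', @any_case), "\n";
```

Without /i only "cat" matches; with /i, "cat", "Cat", and "CAT" all match. Which behavior is right depends on the wording of each pattern.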

Some patterns are prefixed with “[words]” to indicate that the pattern should work on the words data file, and some are prefixed with “[Henry]” to indicate that the pattern should work on the King Henry V data file. Other patterns are for other data.

  1. [words] I am trying to solve a crossword puzzle clue; the answer has exactly five letters, the first letter is “A” and the middle (3rd) letter is “P”. What are the possible words that could go there?
  2. [words] Find all single words, of any length, that contain exactly one vowel letter (a, e, i, o, u).
  3. [words] Find all palindromes that are exactly seven letters long.
  4. [words] Match the word “hello” and nothing else.
  5. [words] Find all words that begin with the letter “o” and end with the letter “n”; case does NOT matter, so your pattern should match words with mixed case.
  6. [words] Find all words that contain “time”.
  7. [words] Find all words that contain the same letter three times without other letters in between. For example, the word “headmistressship” has the letter “s” three times in a row.
  8. [words] Find all words that begin with either “cat” or “bat” and end in either “ion” or “ian”.
  9. [Henry] Find the first line of each part spoken by the Chorus. Look at the text carefully to figure out how to identify such lines.
  10. [Henry] Find all lines from the text that have four consecutive punctuation characters. You may choose to treat the underscore “_” as punctuation or not; state your decision.
  11. [Henry] Find every line where “France” is the last word of a sentence; sentences end with “.” in this text.
  12. [Henry] Find all lines containing between 12 and 20 words. To make the problem a little simpler, only worry about the period (.) and comma (,) and ignore all other punctuation. Although this is a more complicated pattern, the resulting regular expression should fit on one 80-character line.
  13. [Henry] I am looking for the famous line from this play, but all I can remember is that it contains the words “band” and “brothers”. Please help me find that line!
  14. Search through a Perl script (use one of your own) and print out all lines that are comments. If a line contains real code and then a comment, do not print it — just whole lines that are a comment and nothing else. Hint: Do not forget about indented lines.
  15. Match valid 10-digit North American Numbering Plan telephone numbers. [You knew this was coming, didn’t you?] Do the best you can, and clearly state your assumptions.

Testing

Be sure to test each regular expression using the script above and some real input. I can tell when you have not tested!

A common problem is writing a regular expression that does match every string you want it to match, but also matches many strings you do not want. Watch out for the latter. Remember: A regular expression partitions all strings into matching and non-matching. Make sure you do not over-match.
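One way to see the partition is to run a pattern over a small list of strings you expect to match and strings you expect not to match, then inspect both groups. The sketch below assumes a hypothetical helper named partition and made-up test strings; it again uses the /cat/ placeholder rather than any assigned pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# partition($regex, @strings) is a hypothetical helper: it returns two array
# references, the strings that match and the strings that do not.
sub partition {
    my ($regex, @strings) = @_;
    my (@hits, @misses);
    for my $s (@strings) {
        if ($s =~ $regex) {
            push @hits, $s;
        }
        else {
            push @misses, $s;
        }
    }
    return (\@hits, \@misses);
}

# Hypothetical test strings: suppose only the word "cat" itself is wanted.
my @wanted   = ('cat', 'a cat sat');
my @unwanted = ('dog', 'concatenate');

my ($hits, $misses) = partition(qr/cat/, @wanted, @unwanted);
print "matched:     ", join(', ', @$hits), "\n";
print "not matched: ", join(', ', @$misses), "\n";
```

Here "concatenate" lands in the matched group even though it was listed as unwanted. That is the kind of over-match to look for when you test your own expressions.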

Reminders

Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.

Hand In

Hand in a printout of your regular expressions, clearly labeled, on a single sheet of paper. Provide any necessary qualifications for your expressions, including example matches and non-matches. Be sure to put your own name on the printout. Identifying your work is important, or you may not receive appropriate credit.