CS 368-4 (2011 Fall) — Day 7 Scripts

The standalone scripts from class today are copied below. Most of the code comes from the slides, with a few additions.

It is good to have sizeable input files to work with when practicing regular expressions. The two mentioned in class are a list of words (input-07-words.txt, read by the second script below) and the text of King Henry V.

Testing a Match at the Start of a String (Slide 23)

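import re

# Python 2 syntax: raw_input() and the print statement.
string = raw_input('Enter string: ')
regexp = raw_input('Enter regexp: ')
# re.match() succeeds only if the pattern matches at the start of the string.
if re.match(regexp, string):
    print 'Match!'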

Testing a Match Anywhere in a String (Slide 24)

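import re

regexp = raw_input('Enter regexp: ')
# The with statement closes the file when the loop finishes.
with open('input-07-words.txt') as wordfile:
    for line in wordfile:
        # re.search() looks for a match anywhere in the line.
        if re.search(regexp, line):
            # strip() removes the trailing newline before printing.
            print line.strip()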
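
The two scripts above differ mainly in which function they call: re.match only tries the pattern at the first character of the string, while re.search tries every position. The sketch below illustrates the difference. The sample string is made up for illustration and is not taken from the class data files, and the code is written to run under both Python 2 and Python 3.

```python
import re

# Sample string chosen for illustration only.
text = 'band of brothers'

# re.match only tries the pattern at the start of the string.
at_start = re.match('brothers', text)

# re.search tries each position until it finds a match.
anywhere = re.search('brothers', text)

# Anchoring the pattern with ^ restricts re.search to the start as well.
anchored = re.search('^brothers', text)

print(at_start is not None)   # False
print(anywhere is not None)   # True
print(anchored is not None)   # False
```

If a pattern should cover the whole string rather than just its start, it also needs a $ anchor at the end.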

For Practice

If you like, try writing and testing regular expressions for the following patterns.

Some patterns are prefixed with “[words]” to indicate that the pattern should work on the words data file, and some are prefixed with “[Henry]” to indicate that the pattern should work on the King Henry V data file. Other patterns are for other data.

  1. [words] I am trying to solve a crossword puzzle clue; the answer contains exactly five letters, the first letter is “A” and the middle (3rd) letter is “P”. What are the possible words that could go there?
  2. [words] Find all single words, of any length, that contain exactly one vowel letter (a, e, i, o, u).
  3. [words] Match the word “hello” and nothing else.
  4. [words] Find all words that begin with the letter “o” and end with the letter “n”; case does NOT matter.
  5. [words] Find all words that contain “time”.
  6. [Henry] Find the first line of each part spoken by the Chorus. Look at the text carefully to figure out how to identify such lines.
  7. [Henry] Find every line where “France” is the last word of a sentence; sentences end with “.” in this text.
  8. [Henry] I am looking for the famous line from this play, but all I can remember is that it contains the words “band” and “brothers”. Please help me find that line!
  9. Search through a Python script (use one of your own) and print out all lines that are comments. If a line contains real code and then a comment, do not print it — just whole lines that are a comment and nothing else. Hint: Do not forget about indented lines.
  10. Match valid 10-digit North American Numbering Plan telephone numbers. What pattern(s) are you trying to find? (There are many; pick one or a few.)
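
When working through the patterns above, it can help to check each attempt against a few strings that should match and a few that should not. The sketch below is one way to do that. The helper check_pattern and the sample strings are hypothetical, not from the class materials. The phone pattern is only one possible choice for pattern 10: it assumes the format NXX-NXX-XXXX with hyphens, where N is a digit from 2 to 9, and it ignores other formats and finer numbering rules. The code runs under both Python 2 and Python 3.

```python
import re

def check_pattern(regexp, should_match, should_not_match):
    """Return the strings for which the pattern gave the wrong answer."""
    failures = []
    for s in should_match:
        if not re.search(regexp, s):
            failures.append(s)
    for s in should_not_match:
        if re.search(regexp, s):
            failures.append(s)
    return failures

# Assumed format for illustration: NXX-NXX-XXXX, where N is 2-9.
phone = r'^[2-9]\d{2}-[2-9]\d{2}-\d{4}$'

# Made-up sample numbers.
good = ['608-555-0123', '212-555-0100']
bad = ['108-555-0123', '608-055-0123', '608-555-012', '6085550123']

# An empty list means every sample was classified as expected.
print(check_pattern(phone, good, bad))
```

Any string that comes back in the list points at a case the pattern gets wrong, which is usually a good hint about what to change.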