CS 368-4 (2011 Fall) — Day 7 Scripts
The standalone scripts from class today are copied below. Mostly, the code comes from the slides, with a few extra
bits.
It is good to have sizeable input files to work with when practicing regular
expressions. Here are the two mentioned in class:
Testing a Match at the Start of a String (Slide 23)
import re
string = raw_input('Enter string: ')
regexp = raw_input('Enter regexp: ')
# re.match succeeds only if the pattern matches at the start of the string
if re.match(regexp, string):
    print 'Match!'
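To see what "at the start" means without typing input each time, here is a small non-interactive sketch. The helper name `matches_at_start` and the sample strings are made up for illustration and do not come from the slides. The code avoids `raw_input` and the `print` statement so that it should run under both Python 2 and Python 3.

```python
import re

def matches_at_start(regexp, string):
    """Return True if regexp matches at the beginning of string."""
    # re.match only tries the pattern at position 0 of the string.
    return re.match(regexp, string) is not None

# Hypothetical sample data, chosen only to illustrate the behavior.
print(matches_at_start('cat', 'catalog'))  # 'cat' is at the start
print(matches_at_start('cat', 'concat'))   # 'cat' is present, but not at the start
print(matches_at_start('c.t', 'cutlery'))  # '.' matches any one character
```

Note that the second call fails even though the string contains "cat"; that is the case the next script handles with `re.search`.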
Testing a Match Anywhere in a String (Slide 24)
import re
regexp = raw_input('Enter regexp: ')
wordfile = open('input-07-words.txt')
# re.search succeeds if the pattern matches anywhere in the line
for line in wordfile:
    if re.search(regexp, line):
        print line.strip()
wordfile.close()
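The same search can be tried without a data file. The sketch below assumes a short in-memory list in place of the words file; the function name `matching_lines` and the sample lines are hypothetical. It is written to run under both Python 2 and Python 3.

```python
import re

def matching_lines(regexp, lines):
    """Return the stripped lines in which regexp matches anywhere."""
    pattern = re.compile(regexp)  # compile once, reuse for every line
    return [line.strip() for line in lines if pattern.search(line)]

# Hypothetical stand-in for the contents of a words file,
# including the trailing newlines that file iteration would yield.
sample_lines = ['concat\n', 'catalog\n', 'dog\n', 'scatter\n']

print(matching_lines('cat', sample_lines))   # match anywhere in the line
print(matching_lines('^cat', sample_lines))  # '^' restricts the match to the start
```

Returning a list rather than printing inside the loop makes it easier to check a pattern against a known set of lines before running it on a large file.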
For Practice
If you like, try writing and testing regular expressions for the following patterns.
Some patterns are prefixed with “[words]” to indicate that the pattern should work on the words data
file, and some are prefixed with “[Henry]” to indicate that the pattern should work on the King
Henry V data file. Other patterns are for other data.
- [words] I am trying to solve a crossword puzzle clue; it contains exactly five letters, the first letter is
  “A” and the middle (3rd) letter is “P”. What are the possible words that could go there?
- [words] Find all single words, of any length, that contain exactly one vowel letter (a, e, i, o, u).
- [words] Match the word “hello” and nothing else.
- [words] Find all words that begin with the letter “o” and end with the letter “n”; case
  does NOT matter.
- [words] Find all words that contain “time”.
- [Henry] Find the first line of each part spoken by the Chorus. Look at the text carefully to figure out how to
  identify such lines.
- [Henry] Find every line where “France” is the last word of a sentence; sentences end
  with “.” in this text.
- [Henry] I am looking for the famous line from this play, but all I can remember is that it contains the words
  “band” and “brothers”. Please help me find that line!
- Search through a Python script (use one of your own) and print out all lines that are comments. If a line
  contains real code and then a comment, do not print it; print only whole lines that are a
  comment and nothing else. Hint: Do not forget about indented lines.
- Match valid 10-digit North American Numbering Plan telephone numbers. What pattern(s) are you trying to
  find? (There are many, pick one or a few.)
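As one possible way to approach the first practice item (the crossword clue), here is a hedged sketch; it is not the official answer and other patterns would work too. It assumes each word sits on its own line, and it tests the pattern on a small made-up word list rather than the real words file. The names `CROSSWORD`, `crossword_candidates`, and `sample_words` are hypothetical.

```python
import re

# One candidate pattern: 'a', any letter, 'p', then exactly two more letters.
# '^' and '$' pin the pattern to the whole word; IGNORECASE accepts 'A' or 'a'.
CROSSWORD = re.compile(r'^a[a-z]p[a-z]{2}$', re.IGNORECASE)

def crossword_candidates(words):
    """Return the stripped words that fit the five-letter A?P?? shape."""
    stripped = [word.strip() for word in words]
    return [word for word in stripped if CROSSWORD.search(word)]

# Hypothetical sample data: some words fit, others are the wrong
# length or have the wrong letter in the first or third position.
sample_words = ['ample\n', 'Alpha\n', 'alps\n', 'adopt\n',
                'maple\n', 'amplify\n', 'apple\n']

print(crossword_candidates(sample_words))
```

Trying a pattern on a handful of words that should and should not match, as above, is a quick way to catch mistakes such as a missing `$` before searching the full data file.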