Prev: W8, Next: W10
Zoom: Link, TopHat: Link (936525), GoogleForm: Link, Piazza: Link, Feedback: Link, GitHub: Link, Sec1&2: Link
# Slides and Notes
📗 From sections 1 and 2:
# Regular Expressions
📗 Regular expressions (regex) are a small language for describing patterns to search for, and they are used in many different programming languages: Link.
➩ Python's re module has re.findall(regex, string) and re.sub(regex, replace, string) to find and replace the parts of string that match the pattern regex.
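📗 A minimal sketch of the two calls above. The sample sentence and the variable names text, numbers, and masked are made up for illustration.

```python
import re

# Hypothetical sample text, used only for this illustration.
text = "Order 66 shipped on 2024-03-05, order 7 on 2024-04-11."

# re.findall returns every non-overlapping match of the pattern as a list of strings.
numbers = re.findall(r"[0-9]+", text)
print(numbers)  # ['66', '2024', '03', '05', '7', '2024', '04', '11']

# re.sub returns a new string in which every match is replaced; text itself is unchanged.
masked = re.sub(r"[0-9]+", "#", text)
print(masked)   # Order # shipped on #-#-#, order # on #-#-#.
```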
# Raw Strings
📗 Python uses escape characters to represent special characters.
➩ Raw strings r"..." start with r and do not convert escape sequences into special characters.
➩ It is usually easier to specify regex with raw strings.
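| Code | Character | Note |
| --- | --- | --- |
| `\"` | double quote | - |
| `\'` | single quote | - |
| `\\` | backslash | `"\\\""` displays `\"` |
| `\n` | new line | - |
| `\r` | carriage return | (not used often) |
| `\t` | tab | - |
| `\b` | backspace | (similar to left arrow key) |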
📗 Raw String Examples
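➩ "\t" == r"<Tab>" is true, where <Tab> stands for one literal tab character typed between the quotes.
➩ "\\t" == r"\t" is true.
➩ "\\\t" == r"\<Tab>" is true: both sides are a backslash followed by one literal tab character.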
➩ "\\\\t" == r"\\t" is true.
➩ "A\\B\\C" == r"A\B\C" is true.
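📗 The comparisons above can be checked directly. This is a small sketch; the variable names tab, raw_tab, and pieces are chosen only for illustration.

```python
import re

tab = "\t"       # one character: a tab
raw_tab = r"\t"  # two characters: a backslash and the letter t

print(len(tab), len(raw_tab))   # 1 2
print("\\t" == r"\t")           # True
print("\\\\t" == r"\\t")        # True
print("A\\B\\C" == r"A\B\C")    # True
print("\\\t" == "\\" + "\t")    # True: one backslash followed by one tab

# In a regex, a literal backslash is written \\ and a raw string keeps that readable.
pieces = re.split(r"\\", r"A\B\C")
print(pieces)                   # ['A', 'B', 'C']
```

Without the raw string, the same split pattern would have to be written as "\\\\", which is why raw strings are usually easier for regex.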
# Meta Characters
📗 Some characters have special functions in a regex, and they are called meta characters.
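| Meta character | Meaning | Example | Example meaning |
| --- | --- | --- | --- |
| `.` | any character except for `\n` | - | - |
| `[ ]` | any character inside brackets | `[abc]` | a or b or c |
| `[^ ]` | any character not inside brackets | `[^abc]` | not one of a b c |
| `*` | zero or more of last symbol | `a*` | zero or more a |
| `+` | one or more of last symbol | `a+` | one or more a |
| `?` | zero or one of last symbol | `a?` | zero or one a |
| `{n}` | exact number of last symbol | `a{3}` | exactly aaa |
| `{m,n}` | number of last symbol in range (no space after the comma) | `a{1,3}` | a or aa or aaa |
| `\|` | either before or after bar | `ab\|bc` | either ab or bc |
| `\` | escapes next metacharacter | `\?` | literal ? |
| `^` | beginning of line (of the whole string unless re.MULTILINE is set) | `^a` | begins with a |
| `$` | end of line (of the whole string unless re.MULTILINE is set) | `a$` | ends with a |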
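📗 A short sketch that tries several of the meta characters on one made-up string. The sample text and the names patterns, results, starts, ends, and marks are illustrative only.

```python
import re

# Hypothetical sample text, used only for this illustration.
text = "cat cot cut caat ct"

patterns = [
    r"c.t",       # any single character between c and t
    r"c[au]t",    # a or u between c and t
    r"c[^au]t",   # anything except a or u between c and t
    r"ca*t",      # zero or more a
    r"ca+t",      # one or more a
    r"ca?t",      # zero or one a
    r"ca{2}t",    # exactly two a
    r"ca{1,2}t",  # one or two a
    r"cat|cut",   # either cat or cut
]
results = {p: re.findall(p, text) for p in patterns}
for p, matches in results.items():
    print(p, matches)

starts = re.findall(r"^cat", text)     # ['cat']: only the cat at the very beginning
ends = re.findall(r"ct$", text)        # ['ct']: only the ct at the very end
marks = re.findall(r"\?", "why? how?") # ['?', '?']: escaped, so ? is literal
print(starts, ends, marks)
```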
# Greedy, Lazy, Possessive
📗 Regex quantifiers have three "personalities":
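| Type | Example | Note |
| --- | --- | --- |
| Greedy | `*`, `+`, `?`, `{m,n}` | Match as much as possible, but allow backtracking |
| Lazy | `*?`, `+?`, `??`, `{m,n}?` | Match as little as possible |
| Possessive | `*+`, `++`, `?+`, `{m,n}+` | Match as much as possible, no backtracking (Python's re supports these from version 3.11) |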
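📗 A small sketch comparing the three quantifier types. The sample strings and variable names are made up for illustration, and the possessive part is skipped on Python versions older than 3.11, where the re module does not accept that syntax.

```python
import re
import sys

# Hypothetical sample text, used only for this illustration.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, so one match runs from the first < to the last >.
greedy = re.findall(r"<.+>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: .+? stops at the first > it can, so each tag is matched separately.
lazy = re.findall(r"<.+?>", html)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Greedy a* takes all three a's, then backtracks to give one back to the final a.
greedy_a = re.findall(r"a*a", "aaa")
print(greedy_a)  # ['aaa']

# Possessive a*+ also takes every a it can but never gives one back, so the final a cannot match.
if sys.version_info >= (3, 11):
    possessive_a = re.findall(r"a*+a", "aaa")
else:
    possessive_a = None  # possessive quantifiers are not available before Python 3.11
print(possessive_a)
```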
# Shorthand
📗 Some escape characters are used as shorthand.
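| Shorthand | Meaning | Bracket Form |
| --- | --- | --- |
| `\d` | digit | `[0-9]` |
| `\D` | not digit | `[^0-9]` |
| `\w` | alphanumeric character or underscore | `[a-zA-Z0-9_]` |
| `\W` | not alphanumeric character or underscore | `[^a-zA-Z0-9_]` |
| `\s` | white space | `[ \t\n\r\f\v]` |
| `\S` | not white space | `[^ \t\n\r\f\v]` |
| `\b` | word boundary | between `\w` and `\W` |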
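📗 A brief sketch of the shorthand classes on a made-up string. It also shows why \b needs a raw string: in a normal Python string "\b" is the backspace character, not a word boundary. All sample text and variable names here are illustrative.

```python
import re

# Hypothetical sample text, used only for this illustration.
text = "id_7 costs 42\tdollars"

digits = re.findall(r"\d+", text)      # ['7', '42']
words = re.findall(r"\w+", text)       # ['id_7', 'costs', '42', 'dollars'] (underscore counts)
spaces = re.findall(r"\s", text)       # [' ', ' ', '\t']
non_spaces = re.findall(r"\S+", text)  # ['id_7', 'costs', '42', 'dollars']
print(digits, words, spaces, non_spaces)

sentence = "cat concat cats cat."
# Raw string: \b is a word boundary, so only the standalone words match.
whole_words = re.findall(r"\bcat\b", sentence)
print(whole_words)  # ['cat', 'cat']

# Normal string: "\b" is a backspace character, so the pattern matches nothing here.
backspace = re.findall("\bcat\b", sentence)
print(backspace)    # []
```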
# Capture Group
📗 re.findall returns a list of substrings, or a list of tuples of substrings when the regex contains more than one capturing group (...). For example, the regex ...(x)...(y)...(z)... returns a list of tuples with three elements matching x, y, z. With exactly one group, it returns a list of the substrings matched by that group.
➩ re.sub replaces the matches with another string. The captured groups can be used in the replacement string as \g<1>, \g<2>, \g<3>, ...; for example, replacing ...(x)...(y)...(z)... with \g<2>\g<3>\g<1> puts yzx in place of each match.
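📗 A minimal sketch of capturing groups with both functions. The date strings and the names pattern, tuples, years, plain, and swapped are made up for illustration.

```python
import re

# Hypothetical sample text, used only for this illustration.
text = "2024-03-05 and 2025-12-31"
pattern = r"(\d{4})-(\d{2})-(\d{2})"  # three groups: year, month, day

# Three groups: each match becomes a tuple with one element per group.
tuples = re.findall(pattern, text)
print(tuples)   # [('2024', '03', '05'), ('2025', '12', '31')]

# One group: each match becomes just the text of that group.
years = re.findall(r"(\d{4})-\d{2}-\d{2}", text)
print(years)    # ['2024', '2025']

# No groups: each match becomes the whole matched substring.
plain = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(plain)    # ['2024-03-05', '2025-12-31']

# \g<n> in the replacement refers to group n, so the parts can be reordered.
swapped = re.sub(pattern, r"\g<2>/\g<3>/\g<1>", text)
print(swapped)  # 03/05/2024 and 12/31/2025
```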
# Practice Regex
📗 Useful references for writing regex: Link, PDF.
➩ Use Large Language Models to generate regex. Also see the prompt engineering guide: Link.
# Questions?
📗 Notes and code adapted from the course taught by Professors Gurmail Singh, Yiyin Shen, Tyler Caraza-Harter.
📗 If there is an issue with TopHat during the lectures, please submit your answers on paper (include your Wisc ID and answers) or through this Google form (Link) at the end of the lecture.
📗 Anonymous feedback can be submitted to: Form. Non-anonymous feedback and questions can be posted on Piazza: Link.
Last Updated: March 31, 2026 at 12:33 AM