# Lecture
📗 The lecture is in person, but you can join Zoom: 8:50-9:40 or 11:00-11:50. Zoom recordings can be viewed on Canvas -> Zoom -> Cloud Recordings. They will be moved to Kaltura over the weekends.
📗 The in-class (participation) quizzes should be submitted on TopHat (Code:741565), but you can submit your answers through Form at the end of the lectures too.
📗 The Python notebooks used during the lectures can also be found on: GitHub. They will be updated weekly.
# Lecture Notes
TopHat Game
➩ There will be 20 questions on the exam: 10 from past exams and quizzes, and 10 new questions (see Link for details). I will post \(n\) more questions next Monday that are identical or similar to \(n\) of the new questions on the exam.
➩ A: \(n = 0\)
➩ B: \(n = 1\) if more than 50 percent of you choose B.
➩ C: \(n = 2\) if more than 75 percent of you choose C.
➩ D: \(n = 3\) if more than 95 percent of you choose D.
➩ E: \(n = 0\)
📗 Regular Expressions
➩ Regular expressions (regex) are a small language for describing search patterns, and they are used in many different programming languages: Link.
➩ Python's `re` package has `re.findall(regex, string)` and `re.sub(regex, replace, string)` to find and replace the parts of `string` that match the pattern `regex`.
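As a minimal sketch of the two functions above (the sample string and the pattern are made up for illustration):

```python
import re

text = "cat bat rat"  # made-up sample string

# re.findall returns every non-overlapping match of the pattern as a list.
found = re.findall(r"[bc]at", text)
print(found)  # ['cat', 'bat']

# re.sub returns a new string in which every match is replaced.
replaced = re.sub(r"[bc]at", "dog", text)
print(replaced)  # dog dog rat
```

Neither call modifies `text`; both return new objects.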
Raw Strings
➩ Python uses escape sequences (a backslash followed by a character) to represent special characters.
➩ Raw strings `r"..."` start with `r` and do not convert escape sequences into special characters.
➩ It is usually easier to specify regex with raw strings.
| Code | Character | Note |
| --- | --- | --- |
| `\"` | double quote | - |
| `\'` | single quote | - |
| `\\` | backslash | `"\\\""` displays `\"` |
| `\n` | new line | - |
| `\r` | carriage return | (not used often) |
| `\t` | tab | - |
| `\b` | backspace | (similar to left arrow key) |
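The following sketch illustrates why raw strings are usually easier for regex. The `path` value is a made-up sample, and matching a literal backslash is just one convenient case to show the difference:

```python
import re

# A normal string turns \t into one tab character; a raw string keeps both characters.
normal_len = len("a\tb")   # a, tab, b
raw_len = len(r"a\tb")     # a, backslash, t, b
print(normal_len, raw_len)  # 3 4

# The regex that matches one literal backslash is \\ (two characters).
path = r"A\B\C"  # made-up sample: three letters separated by backslashes
with_normal = re.sub("\\\\", "/", path)  # a normal string needs four backslashes
with_raw = re.sub(r"\\", "/", path)      # a raw string needs only two
print(with_normal, with_raw)  # A/B/C A/B/C
```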
Raw String Examples
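➩ Which of the following are true?
➩ `"\t" == r"<tab>"` is true, where `<tab>` stands for a literal tab character typed inside the raw string.
➩ `"\\t" == r"\t"` is true.
➩ `"\\\t" == r"\<tab>"` is true, where `<tab>` again stands for a literal tab character.
➩ `"\\\\t" == r"\\t"` is true.
➩ `"A\\B\\C" == r"A\B\C"` is true.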
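The comparisons above can be checked directly. In this sketch, `TAB` and `BACKSLASH` are helper names introduced here (not from the lecture) so that the literal tab does not have to be typed inside a raw string:

```python
TAB = chr(9)         # the tab character, built without an escape sequence
BACKSLASH = chr(92)  # the backslash character

results = [
    "\t" == TAB,                # \t is one tab
    "\\t" == r"\t",             # backslash, then t
    "\\\t" == BACKSLASH + TAB,  # backslash, then one tab
    "\\\\t" == r"\\t",          # two backslashes, then t
    "A\\B\\C" == r"A\B\C",      # letters separated by single backslashes
]
print(results)  # [True, True, True, True, True]

# The lengths show how many characters each normal string really contains.
lengths = [len("\t"), len("\\t"), len("\\\t"), len("\\\\t"), len("A\\B\\C")]
print(lengths)  # [1, 2, 2, 3, 5]
```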
Meta Characters
➩ Some characters have special functions in a regex, and they are called meta characters.
| Meta character | Meaning | Example | Example meaning |
| --- | --- | --- | --- |
| `.` | any character except for `\n` | - | - |
| `[]` | any character inside brackets | `[abc]` | a or b or c |
| `[^ ]` | any character not inside brackets | `[^abc]` | not one of a b c |
| `*` | zero or more of last symbol | `a*` | zero or more a |
| `+` | one or more of last symbol | `a+` | one or more a |
| `?` | zero or one of last symbol | `a?` | zero or one a |
| `{ }` | exact number of last symbol | `a{3}` | exactly aaa |
| `{ , }` | number of last symbol in range | `a{1,3}` | a or aa or aaa |
| `\|` | either before or after bar | `ab\|bc` | either ab or bc |
| `\` | escapes next metacharacter | `\?` | literal ? |
| `^` | beginning of line | `^a` | begins with a |
| `$` | end of line | `a$` | ends with a |
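A short sketch that tries each meta character with `re.findall`. The sample strings are made up, and each pattern is paired with a small input chosen to make its behavior visible:

```python
import re

# (pattern, sample text) pairs; all sample texts are invented for illustration.
examples = [
    (r".", "a\nb"),          # any character except newline
    (r"[abc]", "a-b-c-d"),   # one of a, b, c
    (r"[^abc]", "abcd"),     # anything except a, b, c
    (r"ba*", "b ba baa"),    # b followed by zero or more a
    (r"ba+", "b ba baa"),    # b followed by one or more a
    (r"ba?", "b ba baa"),    # b followed by zero or one a
    (r"a{3}", "aa aaa"),     # exactly three a
    (r"a{1,3}", "aaaa"),     # one to three a, as many as possible first
    (r"ab|bc", "ab bc ac"),  # either ab or bc
    (r"\?", "why? how?"),    # a literal question mark
    (r"^a", "aba"),          # a at the beginning
    (r"a$", "aba"),          # a at the end
]

results = {pattern: re.findall(pattern, text) for pattern, text in examples}
for pattern, matches in results.items():
    print(pattern, matches)
```

For instance, `a{1,3}` on `aaaa` gives `['aaa', 'a']` because the quantifier takes as many characters as it can before the next match starts.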
Shorthand
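➩ Some escape sequences are used as shorthand for common character classes. The bracket forms below are the ASCII equivalents.
| Shorthand | Meaning | Bracket Form |
| --- | --- | --- |
| `\d` | digit | `[0-9]` |
| `\D` | not digit | `[^0-9]` |
| `\w` | alphanumeric character or underscore | `[a-zA-Z0-9_]` |
| `\W` | not alphanumeric character or underscore | `[^a-zA-Z0-9_]` |
| `\s` | white space | `[ \t\n\r\f\v]` |
| `\S` | not white space | `[^ \t\n\r\f\v]` |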
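A small sketch of the shorthand classes on a made-up string. It also shows that `\w` accepts the underscore and that `\s` matches both spaces and tabs:

```python
import re

text = "Room 320, Fall 2023\tCS_320"  # made-up sample with spaces and one tab

digits = re.findall(r"\d+", text)      # runs of digits
non_digits = re.findall(r"\D+", text)  # runs of non-digits
words = re.findall(r"\w+", text)       # runs of letters, digits, underscores
non_words = re.findall(r"\W+", text)   # runs of everything else
spaces = re.findall(r"\s", text)       # single whitespace characters
non_spaces = re.findall(r"\S+", text)  # runs of non-whitespace

print(digits)      # ['320', '2023', '320']
print(words)       # ['Room', '320', 'Fall', '2023', 'CS_320']
print(spaces)      # [' ', ' ', ' ', '\t']
print(non_spaces)  # ['Room', '320,', 'Fall', '2023', 'CS_320']
```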
Regex Find Examples
➩ Get the git log of the CS320-FA23 repo and find the commit numbers, emails and dates and times of the commits:
Link.
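The linked notebook is not reproduced here. As a rough sketch of the idea, the code below runs `re.findall` on a made-up excerpt in the default `git log` layout. The commit hashes, names, and emails are invented, and the patterns are one possible choice rather than the ones used in lecture:

```python
import re

# Invented sample in the default `git log` layout (not from the real repo).
log = """commit 3f2a1b4c5d6e7f8091a2b3c4d5e6f708192a3b4c
Author: Alice Example <alice@example.com>
Date:   Mon Sep 4 10:30:00 2023 -0500

    add lecture notebook

commit 0a1b2c3d4e5f60718293a4b5c6d7e8f901234567
Author: Bob Example <bob@example.org>
Date:   Tue Sep 12 09:05:42 2023 -0500

    fix typo
"""

# A commit hash is 40 lowercase hexadecimal characters.
hashes = re.findall(r"[0-9a-f]{40}", log)

# A simplified email pattern: word characters or dots around an @ sign.
emails = re.findall(r"[\w.]+@[\w.]+", log)

# Weekday, month, day, time, year, as printed by git by default.
dates = re.findall(r"\w{3} \w{3} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}", log)

print(hashes)
print(emails)  # ['alice@example.com', 'bob@example.org']
print(dates)   # ['Mon Sep 4 10:30:00 2023', 'Tue Sep 12 09:05:42 2023']
```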
📗 Capture Group
➩ `re.findall` can return a list of substrings or a list of tuples of substrings, using capture groups written inside `(...)`. For example, the regex `...(x)...(y)...(z)...` returns a list of tuples with three elements each, matching `x`, `y`, `z`.
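A minimal sketch of how the number of groups changes what `re.findall` returns (the dates in `text` are made up):

```python
import re

text = "2023-09-04 and 2023-09-12"  # made-up sample

# No groups: a list of the full matches.
no_groups = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(no_groups)  # ['2023-09-04', '2023-09-12']

# One group: a list of only the captured part.
one_group = re.findall(r"\d{4}-(\d{2})-\d{2}", text)
print(one_group)  # ['09', '09']

# Three groups: a list of tuples, one element per group.
three_groups = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)
print(three_groups)  # [('2023', '09', '04'), ('2023', '09', '12')]
```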
➩ `re.sub` replaces the matches with another string. The captured groups can be used in the replacement string as `\g<1>`, `\g<2>`, `\g<3>`, ...; for example, replacing `...(x)...(y)...(z)...` by `\g<2>\g<3>\g<1>` turns each match into `yzx`.
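A minimal sketch of reordering captured groups with `re.sub` (the date string is a made-up example):

```python
import re

text = "2023-09-04"  # made-up sample: year-month-day
pattern = r"(\d{4})-(\d{2})-(\d{2})"  # groups 1, 2, 3 = year, month, day

# Groups 2, 3, 1 in that order, with nothing between them.
reordered = re.sub(pattern, r"\g<2>\g<3>\g<1>", text)
print(reordered)  # 09042023

# Literal text can be mixed in between the group references.
day_first = re.sub(pattern, r"\g<3>/\g<2>/\g<1>", text)
print(day_first)  # 04/09/2023
```

The replacement is written as a raw string so that `\g` reaches `re.sub` unchanged.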
Regex Sub Examples
➩ Replace the dates in the git log using the day-month-year then time format.
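One possible way to do this, sketched on two made-up `Date:` lines in the default `git log` layout. The pattern and the exact output format (`day-month-year time`) are assumptions, not necessarily the ones used in lecture:

```python
import re

# Invented sample lines in the default `git log` date layout.
log = """Date:   Mon Sep 4 10:30:00 2023 -0500
Date:   Tue Sep 12 09:05:42 2023 -0500"""

# Groups: 1 = weekday, 2 = month, 3 = day, 4 = time, 5 = year.
pattern = r"(\w{3}) (\w{3}) (\d{1,2}) (\d{2}:\d{2}:\d{2}) (\d{4})"

# Keep day, month, year (joined by dashes), then the time; drop the weekday.
new_log = re.sub(pattern, r"\g<3>-\g<2>-\g<5> \g<4>", log)
print(new_log)
# Date:   4-Sep-2023 10:30:00 -0500
# Date:   12-Sep-2023 09:05:42 -0500
```

Text outside the match, such as the `Date:` prefix and the time zone offset, is left unchanged.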
Practice Regex
➩ Useful references for writing regex: Link, PDF.
➩ Use Large Language Models such as ChatGPT to generate regex: ChatGPT Link, Bard Link, Bing Link. Also see prompt engineering guide: Link.
Notes and code adapted from the course taught by Yiyin Shen Link and Tyler Caraza-Harter Link