
# Lecture

📗 The lecture is in person, but you can join Zoom: 8:50-9:40 or 11:00-11:50. Zoom recordings can be viewed on Canvas -> Zoom -> Cloud Recordings. They will be moved to Kaltura over the weekends.
📗 The in-class (participation) quizzes should be submitted on TopHat (Code:741565), but you can submit your answers through Form at the end of the lectures too.
📗 The Python notebooks used during the lectures can also be found on: GitHub. They will be updated weekly.


# Lecture Notes

TopHat Game ➭ There will be 20 questions on the exam: 10 of them from past exams and quizzes, and 10 of them new questions (see Link for details). I will post \(n\) more questions next Monday that are identical or similar to \(n\) of the new questions on the exam.
➭ A: \(n = 0\)
➭ B: \(n = 1\) if more than 50 percent of you choose B.
➭ C: \(n = 2\) if more than 75 percent of you choose C.
➭ D: \(n = 3\) if more than 95 percent of you choose D.
➭ E: \(n = 0\)

📗 Regular Expressions
➭ Regular expressions (regex) are a small language for describing patterns to search for, and they are used in many different programming languages: Link.
➭ Python's re module has re.findall(regex, string) to find every part of string that matches the pattern regex, and re.sub(regex, replace, string) to replace every such part with replace.
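As a minimal sketch of the two functions, the snippet below finds and then replaces the numbers in a made-up sentence. The sample text and variable names are only for illustration.

```python
import re

# Made-up sample text, used only for illustration.
text = "Lecture 26 covers regex; lecture 27 covers the web."

# re.findall returns every non-overlapping match of the pattern as a list.
numbers = re.findall(r"[0-9]+", text)
print(numbers)  # ['26', '27']

# re.sub returns a new string in which every match is replaced.
masked = re.sub(r"[0-9]+", "#", text)
print(masked)  # Lecture # covers regex; lecture # covers the web.
```

Neither call modifies text, since Python strings are immutable.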



📗 Raw Strings
➭ Python uses escape characters to represent special characters.
➭ Raw strings, written r"...", start with r and do not convert escape sequences into special characters.
➭ It is usually easier to specify regex with raw strings.

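| Code | Character | Note |
| --- | --- | --- |
| `\"` | double quote | - |
| `\'` | single quote | - |
| `\\` | backslash | `"\\\""` displays `\"` |
| `\n` | new line | - |
| `\r` | carriage return | (not used often) |
| `\t` | tab | - |
| `\b` | backspace | (similar to left arrow key) |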
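The difference between regular and raw strings can be checked directly by comparing them. The following is a small sketch, and the variable names are chosen only for this example.

```python
escaped = "\\t"   # regular string: backslash followed by t (2 characters)
raw = r"\t"       # raw string: the same 2 characters
tab = "\t"        # regular string: a single tab character

print(escaped == raw)          # True
print(len(tab), len(raw))      # 1 2
print("\\\\t" == r"\\t")       # True: two backslashes followed by t
print("A\\B\\C" == r"A\B\C")   # True
print("\\\t" == "\\" + tab)    # True: one backslash followed by a tab
```

In a regex, the pattern r"\d" and the pattern "\\d" are therefore the same two characters, which is why raw strings are usually easier to read.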


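Raw String Examples ➭ Which of the following are true?
"\t" == r" " is true, provided the character inside the raw string is a literal tab (not a space).
"\\t" == r"\t" is true.
"\\\t" == r"\ " is true, provided the character after the backslash in the raw string is a literal tab (not a space).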
"\\\\t" == r"\\t" is true.
"A\\B\\C" == r"A\B\C" is true.



📗 Meta Characters
➭ Some characters have special functions in a regex, and they are called meta characters.

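| Meta character | Meaning | Example | Example meaning |
| --- | --- | --- | --- |
| `.` | any character except for `\n` | - | - |
| `[]` | any character inside brackets | `[abc]` | a or b or c |
| `[^ ]` | any character not inside brackets | `[^abc]` | not one of a b c |
| `*` | zero or more of last symbol | `a*` | zero or more a |
| `+` | one or more of last symbol | `a+` | one or more a |
| `?` | zero or one of last symbol | `a?` | zero or one a |
| `{n}` | exact number of last symbol | `a{3}` | exactly aaa |
| `{m,n}` | number of last symbol in range (no space after the comma) | `a{1,3}` | a or aa or aaa |
| `\|` | either before or after bar | `ab\|bc` | either ab or bc |
| `\` | escapes next metacharacter | `\?` | literal ? |
| `^` | beginning of string (of each line with re.MULTILINE) | `^a` | begins with a |
| `$` | end of string (of each line with re.MULTILINE) | `a$` | ends with a |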
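The meta characters in the table can be tried one at a time with re.findall. The sketch below uses short made-up strings, and the variable names are chosen only for this example.

```python
import re

# Character classes: [abc] matches one listed character, [^abc] one unlisted character.
in_class = re.findall(r"[abc]", "a1b2c3")            # ['a', 'b', 'c']
not_in_class = re.findall(r"[^abc]", "a1b2c3")       # ['1', '2', '3']

# The dot matches any character except a newline.
dot = re.findall(r"a.c", "abc a\nc a-c")             # ['abc', 'a-c']

# Quantifiers apply to the symbol just before them and match as much as they can.
plus = re.findall(r"a+", "a aa aaa")                 # ['a', 'aa', 'aaa']
optional = re.findall(r"colou?r", "color colour")    # ['color', 'colour']
exact = re.findall(r"a{3}", "aaaa")                  # ['aaa']
ranged = re.findall(r"a{1,3}", "aaaa")               # ['aaa', 'a']

# With a space after the comma, Python treats the braces as literal text.
spaced = re.findall(r"a{1, 3}", "aaaa")              # []

# Alternation and escaping.
either = re.findall(r"ab|bc", "abc bc")              # ['ab', 'bc']
literal_q = re.findall(r"\?", "why? how?")           # ['?', '?']

# Anchors: ^ matches only at the start of the string unless re.MULTILINE is set.
start_default = re.findall(r"^a", "ab\nab")                    # ['a']
start_multiline = re.findall(r"^a", "ab\nab", re.MULTILINE)    # ['a', 'a']

print(ranged, either, start_multiline)
```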




📗 Shorthand
➭ Some escape characters are used as shorthand.

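| Shorthand | Meaning | Bracket Form (ASCII) |
| --- | --- | --- |
| `\d` | digit | `[0-9]` |
| `\D` | not digit | `[^0-9]` |
| `\w` | alphanumeric character or underscore | `[a-zA-Z0-9_]` |
| `\W` | not alphanumeric character or underscore | `[^a-zA-Z0-9_]` |
| `\s` | white space | `[ \t\n\r\f\v]` |
| `\S` | not white space | `[^ \t\n\r\f\v]` |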
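A short sketch of the shorthand classes on a made-up string follows. Note that for Python str patterns the shorthands also match non-ASCII digits, letters, and white space unless the re.ASCII flag is given, so the bracket forms describe the ASCII case.

```python
import re

# Made-up sample text containing spaces, a tab, an underscore, and digits.
text = "CS_320 at 8:50\tor 11:00"

digits = re.findall(r"\d+", text)       # runs of digits
words = re.findall(r"\w+", text)        # runs of letters, digits, and underscores
spaces = re.findall(r"\s", text)        # single white-space characters
non_spaces = re.findall(r"\S+", text)   # runs of non-white-space characters

print(digits)      # ['320', '8', '50', '11', '00']
print(words)       # ['CS_320', 'at', '8', '50', 'or', '11', '00']
print(spaces)      # [' ', ' ', '\t', ' ']
print(non_spaces)  # ['CS_320', 'at', '8:50', 'or', '11:00']
```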


Regex Find Examples ➭ Get the git log of the CS320-FA23 repo and find the commit numbers, emails and dates and times of the commits: Link.
➭ Code to make the plots: Notebook.
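One possible way to pull these fields out of a log is sketched below. The log text is a hypothetical sample in the default git log layout (it is not taken from the CS320-FA23 repo), and the three patterns are one reasonable choice, not necessarily the ones used in the notebook.

```python
import re

# Hypothetical sample in the default `git log` layout (not real commits).
log = """commit 0123456789abcdef0123456789abcdef01234567
Author: Example Student <student@example.edu>
Date:   Mon Apr 29 01:10:00 2024 -0500

    update lecture notes

commit 89abcdef0123456789abcdef0123456789abcdef
Author: Example Instructor <instructor@example.edu>
Date:   Fri Apr 26 23:05:42 2024 -0500

    add regex examples
"""

# A full commit id is 40 hexadecimal characters after the word "commit".
hashes = re.findall(r"commit ([0-9a-f]{40})", log)

# The email is the text between angle brackets that contains an @.
emails = re.findall(r"<(\S+@\S+)>", log)

# The date and time are the rest of the line after "Date:" and its padding.
dates = re.findall(r"Date:\s+(.+)", log)

print(hashes)
print(emails)  # ['student@example.edu', 'instructor@example.edu']
print(dates)   # ['Mon Apr 29 01:10:00 2024 -0500', 'Fri Apr 26 23:05:42 2024 -0500']
```

Each pattern uses one pair of parentheses so that re.findall returns only the part of interest instead of the whole match.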

📗 Capture Group
➭ re.findall returns a list of substrings, or a list of tuples of substrings when the regex contains capturing groups (...). For example, the regex ...(x)...(y)...(z)... returns a list of tuples with three elements, matching x, y, and z.
➭ re.sub replaces each match with another string. The captured groups can be used in the replacement string as \g<1>, \g<2>, \g<3>, .... For example, replacing ...(x)...(y)...(z)... with \g<2>\g<3>\g<1> replaces each match with yzx.
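A minimal sketch of capturing groups with both functions, using made-up strings:

```python
import re

text = "x=1, y=2"

# Without groups, findall returns the whole matches.
whole = re.findall(r"\w=\d", text)
print(whole)  # ['x=1', 'y=2']

# With two groups, findall returns one tuple per match.
pairs = re.findall(r"(\w)=(\d)", text)
print(pairs)  # [('x', '1'), ('y', '2')]

# In re.sub, \g<n> refers to the text captured by group n.
rotated = re.sub(r"(a)(b)(c)", r"\g<2>\g<3>\g<1>", "abc abc")
print(rotated)  # bca bca
```

The replacement is written as a raw string so that the backslash in \g<1> reaches the re module unchanged.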

Regex Sub Examples ➭ Replace the dates in the git log using the day-month-year then time format.
➭ Code to create the strings: Notebook.
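A possible version of this replacement is sketched below, assuming the dates look like the default git log output (weekday, month, day, time, year). The sample lines and the exact output format (day-month-year joined by hyphens) are assumptions for illustration and may differ from the notebook.

```python
import re

# Hypothetical date lines in the default `git log` layout.
log_dates = (
    "Date:   Mon Apr 29 01:10:00 2024 -0500\n"
    "Date:   Fri Apr 26 23:05:42 2024 -0500"
)

# Groups: 1 = month, 2 = day, 3 = time, 4 = year. The weekday is matched but not captured.
pattern = r"\w{3} (\w{3}) (\d{1,2}) (\d{2}:\d{2}:\d{2}) (\d{4})"

# Reorder the captured groups as day-month-year, then the time.
reordered = re.sub(pattern, r"\g<2>-\g<1>-\g<4> \g<3>", log_dates)

print(reordered)
# Date:   29-Apr-2024 01:10:00 -0500
# Date:   26-Apr-2024 23:05:42 -0500
```

Text outside the match, such as the "Date:" prefix and the time zone offset, is left unchanged by re.sub.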



📗 Practice Regex
➭ Useful references for writing regex: Link, PDF.
➭ Use large language models such as ChatGPT to generate regex: ChatGPT Link, Bard Link, Bing Link. Also see the prompt engineering guide: Link.
➭ Try things here: Link or Link.




📗 Notes and code adapted from the course taught by Yiyin Shen Link and Tyler Caraza-Harter Link






Last Updated: April 29, 2024 at 1:10 AM