
# Summary

📗 Monday lecture: 5:30 to 8:30, Zoom Link
📗 Office hours: 5:30 to 8:30 Wednesdays (Dune) and Thursdays (Zoom Link)
📗 Personal meeting room: always open, Zoom Link
📗 Quiz (log in with your wisc ID, without "@wisc.edu"): Socrative Link; Regrade request form: Google Form (select Q4).
📗 Math Homework: M4,
📗 Programming Homework: P2,
📗 Examples, Quizzes, Discussions: Q4,

# Lectures

📗 Slides (before lecture, usually updated on Saturday):
Blank Slides: Part 1, Part 2,
Blank Slides (with blank pages for quiz questions): Part 1, Part 2,
📗 Slides (after lecture, usually updated on Tuesday):
Blank Slides with Quiz Questions: Part 1, Part 2,
Annotated Slides: Part 1, Part 2,
📗 My handwriting is really bad; copy down your notes from the lecture videos instead of relying on these.

📗 Notes: N/A
(Cat image via me.me)

# Other Materials

📗 Pre-recorded Videos from 2020
Part 1 (Generative Models): Link
Part 2 (Natural Language): Link
Part 3 (Sampling): Link
Part 4 (Probability Distribution): Link
Part 5 (Bayesian Network): Link
Part 6 (Network Structure): Link
Part 7 (Naive Bayes): Link

📗 Relevant websites
Zipf's Law: Link
Markov Chain: Link
Google N-Gram: Link

Simple Bayes Net: Link, Link 2
ABNMS: Link, Pathfinder: Link


📗 YouTube videos from 2019 to 2021
How to find the HOG features? Link
How to count the number of weights for training for a convolutional neural network (LeNet)? Link
Example (Quiz): How to find the 2D convolution between two matrices? Link
Example (Homework): How to find a discrete approximate Gaussian filter? Link



# Keywords and Notations

📗 K-Nearest Neighbor:
Distance: (Euclidean) \(\rho\left(x, x'\right) = \left\|x - x'\right\|_{2} = \sqrt{\displaystyle\sum_{j=1}^{m} \left(x_{j} - x'_{j}\right)^{2}}\), (Manhattan) \(\rho\left(x, x'\right) = \left\|x - x'\right\|_{1} = \displaystyle\sum_{j=1}^{m} \left| x_{j} - x'_{j} \right|\), where \(x, x'\) are two instances.
K-Nearest Neighbor classifier: \(\hat{y}_{i} = \text{mode}\left\{y_{\left(1\right)}, y_{\left(2\right)}, ..., y_{\left(k\right)}\right\}\), where the mode is the majority label and \(y_{\left(t\right)}\) is the label of the \(t\)-th closest instance to instance \(i\) in the training set.
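
📗 A minimal Python sketch of these definitions (the function names, the choice of \(k\), and the toy data are illustrative assumptions, not course code):

```python
import numpy as np
from collections import Counter

def euclidean(x, xp):
    # L2 distance: square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((x - xp) ** 2))

def manhattan(x, xp):
    # L1 distance: sum of absolute coordinate differences
    return np.sum(np.abs(x - xp))

def knn_predict(x_train, y_train, x_new, k=3, dist=euclidean):
    # Distance from the new instance to every training instance
    d = [dist(x_new, x) for x in x_train]
    # Indices of the k closest training instances
    nearest = np.argsort(d)[:k]
    # Predicted label is the mode (majority label) of the k nearest labels
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Toy example: [1.5, 1.5] is closest to the two training points labeled 1
x_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y_train = [0, 1, 1]
print(knn_predict(x_train, y_train, np.array([1.5, 1.5]), k=2))  # prints 1
```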

📗 Natural Language Processing:
Unigram model: \(\mathbb{P}\left\{z_{1}, z_{2}, ..., z_{d}\right\} = \displaystyle\prod_{t=1}^{d} \mathbb{P}\left\{z_{t}\right\}\) where \(z_{t}\) is the \(t\)-th token in a training item, and \(d\) is the total number of tokens in the item.
Maximum likelihood estimator (unigram): \(\hat{\mathbb{P}}\left\{z_{t}\right\} = \dfrac{c_{z_{t}}}{\displaystyle\sum_{z=1}^{m} c_{z}}\), where \(c_{z}\) is the number of times the token \(z\) appears in the training set and \(m\) is the vocabulary size (number of unique tokens).
Maximum likelihood estimator (unigram, with Laplace smoothing): \(\hat{\mathbb{P}}\left\{z_{t}\right\} = \dfrac{c_{z_{t}} + 1}{\left(\displaystyle\sum_{z=1}^{m} c_{z}\right) + m}\).
Bigram model: \(\mathbb{P}\left\{z_{1}, z_{2}, ..., z_{d}\right\} = \mathbb{P}\left\{z_{1}\right\} \displaystyle\prod_{t=2}^{d} \mathbb{P}\left\{z_{t} | z_{t-1}\right\}\).
Maximum likelihood estimator (bigram): \(\hat{\mathbb{P}}\left\{z_{t} | z_{t-1}\right\} = \dfrac{c_{z_{t-1}, z_{t}}}{c_{z_{t-1}}}\).
Maximum likelihood estimator (bigram, with Laplace smoothing): \(\hat{\mathbb{P}}\left\{z_{t} | z_{t-1}\right\} = \dfrac{c_{z_{t-1}, z_{t}} + 1}{c_{z_{t-1}} + m}\).
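
📗 A minimal Python sketch of the unigram and bigram maximum likelihood estimators with Laplace smoothing (the function names and toy corpus are illustrative assumptions, not course code):

```python
from collections import Counter

def unigram_mle(tokens, vocab):
    # c_z: number of times each token z appears; Laplace smoothing adds 1 to each count
    c = Counter(tokens)
    n, m = len(tokens), len(vocab)   # total token count and vocabulary size
    return {z: (c[z] + 1) / (n + m) for z in vocab}

def bigram_mle(tokens, vocab):
    # c_{z', z}: counts of consecutive token pairs; c_{z'}: count of the first token
    pairs = Counter(zip(tokens, tokens[1:]))
    c = Counter(tokens)
    m = len(vocab)
    return {(zp, z): (pairs[(zp, z)] + 1) / (c[zp] + m) for zp in vocab for z in vocab}

def sequence_prob(seq, p_uni, p_bi):
    # Bigram model: P(z_1) times the product over t >= 2 of P(z_t | z_{t-1})
    p = p_uni[seq[0]]
    for zp, z in zip(seq, seq[1:]):
        p *= p_bi[(zp, z)]
    return p

# Toy corpus
tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = sorted(set(tokens))
p_uni = unigram_mle(tokens, vocab)
p_bi = bigram_mle(tokens, vocab)
print(sequence_prob(["the", "cat"], p_uni, p_bi))  # (2+1)/(6+5) * (1+1)/(2+5)
```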







Last Updated: April 29, 2024 at 1:11 AM