
# Summary

📗 Tuesday to Friday lectures: 1:00 to 2:15, Zoom Link
📗 Saturday review sessions: 5:30 to 8:30, Zoom Link
📗 Personal meeting room: always open, Zoom Link
📗 Quiz (log in with your wisc ID, without "@wisc.edu"): Socrative Link
📗 Math Homework:
M4, M5
📗 Programming Homework:
P2
📗 Examples and Quizzes:
Q5, Q6, Q7, Q8

# Lectures

📗 Slides (before lecture, usually updated on Sunday):
Blank Slides: Part 1, Part 2, Part 3, Part 4
Blank Slides (with blank pages for quiz questions): Part 1, Part 2, Part 3, Part 4
📗 Slides (after lecture, usually updated on Friday):
Blank Slides with Quiz Questions: Part 1, Part 2, Part 3, Part 4
Annotated Slides: Part 1, Part 2, Part 3, Part 4
📗 Review Session:
PDF.

📗 My handwriting is really bad, so you should copy down your own notes from the lecture videos instead of using these.

📗 Notes: "Train" (image by Vishal Arora via Medium)


# Other Materials

📗 Pre-recorded Videos from 2020
Lecture 5 Part 1 (Support Vector Machines): Link
Lecture 5 Part 2 (Subgradient Descent): Link
Lecture 5 Part 3 (Kernel Trick): Link
Lecture 6 Part 1 (Decision Tree): Link
Lecture 6 Part 2 (Random Forest): Link
Lecture 6 Part 3 (Nearest Neighbor): Link
Lecture 7 Part 1 (Convolution): Link
Lecture 7 Part 2 (Gradient Filters): Link
Lecture 7 Part 3 (Computer Vision): Link
Lecture 8 Part 1 (Computer Vision): Link
Lecture 8 Part 2 (Viola Jones): Link
Lecture 8 Part 3 (Convolutional Neural Net): Link

📗 Relevant websites
Support Vector Machine: Link
RBF Kernel SVM Demo: Link

Decision Tree: Link
Random Forest Demo: Link

K Nearest Neighbor: Link
Map of Manhattan: Link
Voronoi Diagram: Link
KD Tree: Link

Image Filter: Link
Canny Edge Detection: Link
SIFT: PDF
HOG: PDF
Conv Net on MNIST: Link
Conv Net Vis: Link
LeNet: PDF, Link
Google Inception Net: PDF
CNN Architectures: Link
Image to Image: Link
Image segmentation: Link
Image colorization: Link, Link 
Image Reconstruction: Link
Style Transfer: Link
Move Mirror: Link
Pose Estimation: Link
YOLO Attack: YouTube


📗 YouTube videos from 2019 and 2020
How to find the margin expression for SVM? Link
Why does the kernel trick work? Link
Example (Quiz): Compute SVM classifier Link
Example (Quiz): Kernel SVM for XOR operator Link
Example (Quiz): Kernel matrix to feature vector Link
Example (Quiz): Entropy computation Link
Example (Quiz): Decision tree for implication operator Link
Example (Quiz): Three nearest neighbor Link
How to find the HOG features? Link
How to count the number of weights for training for a convolutional neural network (LeNet)? Link
Example (Quiz): How to find the 2D convolution between two matrices? Link
Example (Homework): How to find a discrete approximate Gaussian filter? Link


# Kernel Demo

📗 Interactive demo with controls for data type, count, kernel type, kernel, and plane.

# Keywords and Notations

📗 Support Vector Machine
SVM classifier: $\hat{y}_i = \mathbb{1}\{w^\top x_i + b \geq 0\}$.
Hard margin, original max-margin formulation: $\max_w \frac{2}{\sqrt{w^\top w}}$ such that $w^\top x_i + b \leq -1$ if $y_i = 0$ and $w^\top x_i + b \geq 1$ if $y_i = 1$.
Hard margin, simplified formulation: $\min_w \frac{1}{2} w^\top w$ such that $(2 y_i - 1)(w^\top x_i + b) \geq 1$.
Soft margin, original max-margin formulation: $\min_w \frac{1}{2} w^\top w + \frac{1}{\lambda} \frac{1}{n} \sum_{i=1}^{n} \xi_i$ such that $(2 y_i - 1)(w^\top x_i + b) \geq 1 - \xi_i$, $\xi_i \geq 0$, where $\xi_i$ is the slack variable for instance $i$, and $\lambda$ is the regularization parameter.
Soft margin, simplified formulation: $\min_w \frac{\lambda}{2} w^\top w + \frac{1}{n} \sum_{i=1}^{n} \max\left\{0, 1 - (2 y_i - 1)(w^\top x_i + b)\right\}$.
Subgradient descent formula: $w = (1 - \alpha \lambda) w + \alpha (2 y_i - 1) \mathbb{1}\{(2 y_i - 1)(w^\top x_i + b) \leq 1\} x_i$, where $\alpha$ is the learning rate; the hinge term contributes only when the margin constraint is violated.
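
The following is a minimal NumPy sketch of the subgradient descent update above, assuming 0/1 labels as in the classifier definition; the toy data, learning rate $\alpha$, and regularization parameter $\lambda$ are made-up illustrative values, and the bias update is included for completeness.

```python
import numpy as np

def svm_subgradient_step(w, b, x_i, y_i, alpha=0.1, lam=0.01):
    """One stochastic subgradient step on the soft-margin objective
    (lambda / 2) w'w + hinge loss, for a single instance with label y_i in {0, 1}."""
    s = 2 * y_i - 1                               # map label {0, 1} -> {-1, +1}
    violated = s * (np.dot(w, x_i) + b) <= 1      # hinge term is active only here
    w = (1 - alpha * lam) * w + alpha * s * violated * x_i
    b = b + alpha * s * violated                  # bias update (not regularized)
    return w, b

# Made-up toy data: two 2D points, one from each class.
X = np.array([[1.0, 2.0], [-1.0, -1.0]])
y = np.array([1, 0])
w, b = np.zeros(2), 0.0
for epoch in range(100):
    for x_i, y_i in zip(X, y):
        w, b = svm_subgradient_step(w, b, x_i, y_i)
print(w, b, (X @ w + b >= 0).astype(int))         # weights, bias, predicted labels
```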

📗 Kernel Trick
Kernel SVM classifier: $\hat{y}_i = \mathbb{1}\{w^\top \varphi(x_i) + b \geq 0\}$, where $\varphi$ is the feature map.
Kernel Gram matrix: $K_{i i'} = \varphi(x_i)^\top \varphi(x_{i'})$.
Quadratic Kernel: $K_{i i'} = (x_i^\top x_{i'} + 1)^2$ has feature representation $\varphi(x_i) = \left(x_{i1}^2, x_{i2}^2, \sqrt{2} x_{i1} x_{i2}, \sqrt{2} x_{i1}, \sqrt{2} x_{i2}, 1\right)$.
Gaussian RBF Kernel: $K_{i i'} = \exp\left(-\frac{1}{2 \sigma^2} (x_i - x_{i'})^\top (x_i - x_{i'})\right)$ has an infinite-dimensional feature representation, where $\sigma^2$ is the variance parameter.
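
As a quick numerical check of the quadratic kernel identity above, the sketch below (with made-up 2D points) compares $(x_i^\top x_{i'} + 1)^2$ against the inner product of the explicit feature maps $\varphi$.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the quadratic kernel (x'x' + 1)^2 in 2D."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])    # made-up points
k_direct  = (np.dot(x, xp) + 1) ** 2                   # kernel evaluated directly
k_feature = np.dot(phi(x), phi(xp))                    # same value via feature vectors
print(k_direct, k_feature)                             # both print 4.0
```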

📗 Information Theory:
Entropy: $H(Y) = -\sum_{y=1}^{K} p_y \log_2(p_y)$, where $K$ is the number of classes (number of possible labels), and $p_y$ is the fraction of data points with label $y$.
Conditional entropy: $H(Y | X) = -\sum_{x=1}^{K_X} p_x \sum_{y=1}^{K} p_{y|x} \log_2(p_{y|x})$, where $K_X$ is the number of possible values of the feature, $p_x$ is the fraction of data points with feature value $x$, and $p_{y|x}$ is the fraction of data points with label $y$ among the ones with feature value $x$.
Information gain, for feature $j$: $I(Y | X_j) = H(Y) - H(Y | X_j)$.
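
A small sketch of the entropy, conditional entropy, and information gain formulas above, using NumPy and a made-up dataset with one binary feature and binary labels.

```python
import numpy as np

def entropy(labels):
    """H(Y) = -sum_y p_y log2(p_y) over the empirical label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def conditional_entropy(feature, labels):
    """H(Y|X) = -sum_x p_x sum_y p_{y|x} log2(p_{y|x})."""
    values, counts = np.unique(feature, return_counts=True)
    p_x = counts / counts.sum()
    return sum(px * entropy(labels[feature == x]) for x, px in zip(values, p_x))

# Made-up dataset: 8 points, one binary feature X, binary label Y.
X = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
gain = entropy(Y) - conditional_entropy(X, Y)    # I(Y | X) = H(Y) - H(Y | X)
print(entropy(Y), conditional_entropy(X, Y), gain)
```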

📗 Decision Tree:
Decision stump classifier: $\hat{y}_i = \mathbb{1}\{x_{ij} \geq t_j\}$, where $t_j$ is the threshold for feature $j$.
Feature selection: $j^\star = \underset{j}{\mathrm{argmax}} \; I(Y | X_j)$.
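
A minimal decision stump sketch in the same notation, built on the entropy and information gain functions from the previous sketch (repeated here so the block runs on its own); the dataset and the candidate thresholds $t_j$ are made up for illustration.

```python
import numpy as np

def entropy(labels):
    """H(Y) = -sum_y p_y log2(p_y) over the empirical label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(x_split, y):
    """I(Y|X) = H(Y) - H(Y|X) for one thresholded (binarized) feature."""
    h_cond = sum(np.mean(x_split == v) * entropy(y[x_split == v])
                 for v in np.unique(x_split))
    return entropy(y) - h_cond

# Made-up data: 6 points, 2 continuous features, binary labels.
X = np.array([[0.2, 3.0], [0.4, 2.5], [0.9, 0.5],
              [1.5, 2.8], [1.7, 0.3], [2.0, 0.1]])
y = np.array([0, 0, 0, 1, 1, 1])
t = [1.0, 1.0]                                    # assumed thresholds t_j, one per feature

gains = [info_gain(X[:, j] >= t[j], y) for j in range(X.shape[1])]
j_star = int(np.argmax(gains))                    # feature selection: argmax_j I(Y | X_j)
y_hat = (X[:, j_star] >= t[j_star]).astype(int)   # decision stump: 1{x_ij >= t_j}
print(gains, j_star, y_hat)
```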

📗 Convolution
Convolution (1D): $a = x \star w$, $a_j = \sum_{t=-k}^{k} w_t x_{j-t}$, where $w$ is the filter, and $k$ is half of the width of the filter.
Convolution (2D): $A = X \star W$, $A_{j j'} = \sum_{s=-k}^{k} \sum_{t=-k}^{k} W_{s,t} X_{j-s, j'-t}$, where $W$ is the filter, and $k$ is half of the width of the filter.
Sobel filter: $W_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$ and $W_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$.
Image gradient: $\nabla_x X = W_x \star X$, $\nabla_y X = W_y \star X$, with gradient magnitude $G = \sqrt{\nabla_x^2 + \nabla_y^2}$ and gradient direction $\Theta = \arctan\left(\frac{\nabla_y}{\nabla_x}\right)$.
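
A small NumPy sketch of the 2D convolution and Sobel gradient formulas above, applied to a made-up 5×5 image; only the valid region (where the filter fits entirely inside the image) is computed.

```python
import numpy as np

def conv2d(X, W):
    """A_{jj'} = sum_s sum_t W_{s,t} X_{j-s, j'-t}; valid region only (no padding)."""
    k = W.shape[0] // 2                       # half of the filter width
    H, Wd = X.shape
    A = np.zeros((H - 2 * k, Wd - 2 * k))
    W_flipped = W[::-1, ::-1]                 # convolution flips the filter
    for j in range(A.shape[0]):
        for jp in range(A.shape[1]):
            A[j, jp] = np.sum(W_flipped * X[j:j + 2 * k + 1, jp:jp + 2 * k + 1])
    return A

Wx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # Sobel filters
Wy = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

X = np.arange(25, dtype=float).reshape(5, 5)          # made-up 5x5 "image"
Gx, Gy = conv2d(X, Wx), conv2d(X, Wy)
G = np.sqrt(Gx**2 + Gy**2)                            # gradient magnitude
Theta = np.arctan2(Gy, Gx)                            # arctan(Gy / Gx), robust to Gx = 0
print(G, Theta, sep="\n")
```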

📗 Convolutional Neural Network
Fully connected layer: $a = g(w^\top x + b)$, where $a$ is the activation unit, $g$ is the activation function.
Convolution layer: $A = g(W \star X + b)$, where $A$ is the activation map.
Pooling layer: (max-pooling) $a = \max\{x_1, \ldots, x_m\}$, (average-pooling) $a = \frac{1}{m} \sum_{j=1}^{m} x_j$.
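
A minimal NumPy sketch of the three layer types above, with made-up shapes; here $g$ is taken to be ReLU just for illustration, and the convolution layer applies the filter without flipping it (cross-correlation), as is common in CNN implementations.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)                 # one possible activation function g

def fully_connected(x, w, b):
    """a = g(w'x + b) for a single activation unit."""
    return relu(np.dot(w, x) + b)

def conv_layer(X, W, b):
    """A = g(W * X + b): valid cross-correlation plus bias, then activation."""
    f = W.shape[0]                            # filter width
    H, Wd = X.shape
    A = np.zeros((H - f + 1, Wd - f + 1))
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A[i, j] = np.sum(W * X[i:i + f, j:j + f]) + b
    return relu(A)

def max_pool(A, m=2):
    """Non-overlapping m x m max-pooling."""
    H, Wd = A.shape
    return A[:H - H % m, :Wd - Wd % m].reshape(H // m, m, Wd // m, m).max(axis=(1, 3))

X = np.random.rand(6, 6)                      # made-up 6x6 input "image"
W = np.random.rand(3, 3)                      # one made-up 3x3 filter
A = conv_layer(X, W, b=0.1)                   # 4x4 activation map
P = max_pool(A)                               # 2x2 map after pooling
a = fully_connected(P.flatten(), np.random.rand(4), b=0.0)
print(A.shape, P.shape, a)
```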






Last Updated: April 09, 2025 at 11:28 PM