
# Summary

📗 Monday lecture: 5:30 to 8:30, Zoom Link
📗 Office hours: 5:30 to 8:30 Wednesdays (Dune) and Thursdays (Zoom Link)
📗 Personal meeting room: always open, Zoom Link
📗 Quiz (log in with your wisc ID, without "@wisc.edu"): Socrative Link, Regrade request form: Google Form (select Q2).
📗 Math Homework:
M1, M2,
📗 Programming Homework:
P1,
📗 Examples, Quizzes, Discussions:
Q2,

# Lectures

📗 Slides (before lecture, usually updated on Saturday):
Blank Slides: Part 1, Part 2,
Blank Slides (with blank pages for quiz questions): Part 1, Part 2,
📗 Slides (after lecture, usually updated on Tuesday):
Blank Slides with Quiz Questions: Part 1, Part 2,
Annotated Slides: Part 1, Part 2,
📗 My handwriting is really bad; you should copy down your own notes from the lecture videos instead of relying on these.

📗 Notes
Train (image by Vishal Arora via Medium)
N/A

# Other Materials

📗 Pre-recorded Videos from 2020
Part 1 (Neural Network): Link
Part 2 (Backpropagation): Link
Part 3 (Multi-Layer Network): Link
Part 4 (Stochastic Gradient): Link
Part 5 (Multi-Class Classification): Link
Part 6 (Regularization): Link

📗 Relevant websites
Neural Network: Link
Another Neural Network Demo: Link
Neural Network Videos by Grant Sanderson: Playlist
MNIST Neural Network Visualization: Link
Neural Network Simulator: Link
Overfitting: Link
Neural Network Snake: Video
Neural Network Car: Video
Neural Network Flappy Bird: Video
Neural Network Mario: Video
MyScript: algorithm Link demo Link
Maple Calculator: Link


📗 YouTube videos from 2019 to 2021
How to construct an XOR network? Link
How to derive the 2-layer neural network gradient descent step? Link
How to derive the multi-layer neural network gradient descent induction step? Link
Comparison between L1 and L2 regularization. Link
Example (Quiz): Cross validation accuracy Link



# Keywords and Notations

📗 Neural Network:
Neural network classifier for a two-layer network with logistic activation: $\hat{y}_i = \mathbb{1}_{\left\{a_i^{(2)} \geq 0.5\right\}}$.
$a_{ij'}^{(1)} = \dfrac{1}{1 + \exp\left(-\left(\left(\sum_{j=1}^{m} x_{ij} w_{jj'}^{(1)}\right) + b_{j'}^{(1)}\right)\right)}$, where $m$ is the number of features (or input units), $w_{jj'}^{(1)}$ is the layer 1 weight from input unit $j$ to hidden layer unit $j'$, $b_{j'}^{(1)}$ is the bias for hidden layer unit $j'$, and $a_{ij'}^{(1)}$ is the layer 1 activation of instance $i$ at hidden unit $j'$.
$a_i^{(2)} = \dfrac{1}{1 + \exp\left(-\left(\left(\sum_{j=1}^{h} a_{ij}^{(1)} w_{j}^{(2)}\right) + b^{(2)}\right)\right)}$, where $h$ is the number of hidden units, $w_{j}^{(2)}$ is the layer 2 weight from hidden layer unit $j$, $b^{(2)}$ is the bias for the output unit, and $a_i^{(2)}$ is the layer 2 activation of instance $i$.
Stochastic gradient descent step for a two-layer network with squared loss and logistic activation:
$w_{jj'}^{(1)} \leftarrow w_{jj'}^{(1)} - \alpha \left(a_i^{(2)} - y_i\right) a_i^{(2)} \left(1 - a_i^{(2)}\right) w_{j'}^{(2)} a_{ij'}^{(1)} \left(1 - a_{ij'}^{(1)}\right) x_{ij}$.
$b_{j'}^{(1)} \leftarrow b_{j'}^{(1)} - \alpha \left(a_i^{(2)} - y_i\right) a_i^{(2)} \left(1 - a_i^{(2)}\right) w_{j'}^{(2)} a_{ij'}^{(1)} \left(1 - a_{ij'}^{(1)}\right)$.
$w_{j}^{(2)} \leftarrow w_{j}^{(2)} - \alpha \left(a_i^{(2)} - y_i\right) a_i^{(2)} \left(1 - a_i^{(2)}\right) a_{ij}^{(1)}$.
$b^{(2)} \leftarrow b^{(2)} - \alpha \left(a_i^{(2)} - y_i\right) a_i^{(2)} \left(1 - a_i^{(2)}\right)$.
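
Below is a minimal NumPy sketch of these forward and update equations for a single training instance. It is only an illustration of the formulas above; the variable names (W1, b1, w2, b2, alpha) and array shapes are assumptions for this sketch, not the interface required by the programming homework.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(x_i, W1, b1, w2, b2):
    # Layer 1: a1[j'] = sigmoid(sum_j x_i[j] * W1[j, j'] + b1[j'])
    a1 = sigmoid(x_i @ W1 + b1)        # shape (h,)
    # Layer 2 (output): a2 = sigmoid(sum_j a1[j] * w2[j] + b2)
    a2 = sigmoid(a1 @ w2 + b2)         # scalar
    return a1, a2

def sgd_step(x_i, y_i, W1, b1, w2, b2, alpha):
    # One stochastic gradient descent step on instance (x_i, y_i) with squared loss
    a1, a2 = forward(x_i, W1, b1, w2, b2)
    # Shared factor (a2 - y) * a2 * (1 - a2) from squared loss and logistic output
    d2 = (a2 - y_i) * a2 * (1.0 - a2)
    # Layer 1: gradient w.r.t. W1[j, j'] is d2 * w2[j'] * a1[j'] * (1 - a1[j']) * x_i[j]
    d1 = d2 * w2 * a1 * (1.0 - a1)     # shape (h,)
    W1 = W1 - alpha * np.outer(x_i, d1)
    b1 = b1 - alpha * d1
    # Layer 2: gradient w.r.t. w2[j] is d2 * a1[j]
    w2 = w2 - alpha * d2 * a1
    b2 = b2 - alpha * d2
    return W1, b1, w2, b2

# Example: m = 2 features, h = 3 hidden units (toy initialization)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3)); b1 = np.zeros(3)
w2 = rng.normal(size=3); b2 = 0.0
W1, b1, w2, b2 = sgd_step(np.array([1.0, 0.0]), 1.0, W1, b1, w2, b2, alpha=0.1)
```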

📗 Multiple Classes:
Softmax activation for one-layer networks: $a_{ik} = \dfrac{\exp\left(w_k^\top x_i + b_k\right)}{\sum_{k'=1}^{K} \exp\left(w_{k'}^\top x_i + b_{k'}\right)}$, where $K$ is the number of classes (number of possible labels), $a_{ik}$ is the activation of output unit $k$ for instance $i$, and $y_{ik}$ is component $k$ of the one-hot encoding of the label for instance $i$.
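
A small NumPy sketch of this softmax activation for one instance, with the usual max-subtraction added for numerical stability (it does not change the result). The names (W, b, x_i) are illustrative assumptions for the sketch.

```python
import numpy as np

def softmax_activation(x_i, W, b):
    # One row w_k of W and one bias b_k per class; z[k] = w_k . x_i + b_k
    z = W @ x_i + b                # shape (K,)
    z = z - np.max(z)              # numerical stability; softmax is unchanged
    e = np.exp(z)
    return e / np.sum(e)           # a_i[k] = exp(z[k]) / sum_k' exp(z[k']), sums to 1

# Example: K = 3 classes, 2 features
W = np.array([[1.0, -1.0],
              [0.5,  0.5],
              [-1.0, 1.0]])
b = np.zeros(3)
a_i = softmax_activation(np.array([2.0, 1.0]), W, b)   # class probabilities for instance i
```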

📗 Regularization:
L1 regularization (squared loss): $\sum_{i=1}^{n} \left(a_i - y_i\right)^2 + \lambda \left(\sum_{j=1}^{m} \left|w_j\right| + |b|\right)$, where $\lambda$ is the regularization parameter.
L2 regularization (squared loss): $\sum_{i=1}^{n} \left(a_i - y_i\right)^2 + \lambda \left(\sum_{j=1}^{m} \left(w_j\right)^2 + b^2\right)$.
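
A short NumPy sketch computing the regularized squared loss as defined above. The function name and the `penalty` argument are assumptions made for this sketch.

```python
import numpy as np

def regularized_squared_loss(a, y, w, b, lam, penalty="l2"):
    # Squared loss term: sum_i (a_i - y_i)^2
    loss = np.sum((a - y) ** 2)
    if penalty == "l1":
        # L1 penalty: lambda * (sum_j |w_j| + |b|)
        loss += lam * (np.sum(np.abs(w)) + abs(b))
    else:
        # L2 penalty: lambda * (sum_j w_j^2 + b^2)
        loss += lam * (np.sum(w ** 2) + b ** 2)
    return loss
```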







Last Updated: April 09, 2025 at 11:28 PM