# Other Materials
📗 Pre-recorded Videos from 2020
Part 1 (Supervised learning): Link
Part 2 (Perceptron learning): Link
Part 3 (Loss functions): Link
Part 4 (Logistic regression): Link
Part 5 (Convexity): Link
📗 Relevant websites
Which face is real? Link
This X does not exist: Link
Turtle or Rifle: Link
Art or garbage game: Link
Guess two-thirds of the average? Link
Gradient Descent: Link
Optimization: Link
Neural Network: Link
Generative Adversarial Net: Link
📗 YouTube videos from 2019 to 2021
Why does the (batch) perceptron algorithm work? Link
Why can't linear regression be used for binary classification? Link
Why does gradient descent work? Link
How to derive the logistic regression gradient descent step formula? Link
Example (Quiz): Perceptron update formula: Link
Example (Quiz): Gradient descent for logistic activation with squared error: Link
Example (Quiz): Computation of the Hessian of a quadratic form: Link
Example (Quiz): Computation of eigenvalues: Link
Example (Homework): Gradient descent for linear regression: Link
Video going through M1: Link
📗 Math and Statistics Review
Checklist: Link, "math crib sheet": Link
Multivariate Calculus: Textbook, Chapter 16, and/or (Economics) Tutorials, Chapters 2 and 3.
Linear Algebra: Textbook, Chapters on Determinant and Eigenvalue.
Probability and Statistics: Textbook, Chapters 3, 4, 5.
# Keywords and Notations
📗 Supervised Learning:
Training item: \(\left(x_{i}, y_{i}\right)\), where \(i \in \left\{1, 2, ..., n\right\}\) is the instance index, \(x_{ij}\) is feature \(j\) of instance \(i\), \(j \in \left\{1, 2, ..., m\right\}\) is the feature index, \(x_{i} = \left(x_{i1}, x_{i2}, ..., x_{im}\right)\) is the feature vector of instance \(i\), and \(y_{i}\) is the true label of instance \(i\).
Test item: \(\left(x', y'\right)\), where \(x'_{j}\) is feature \(j\) of the test item, \(j \in \left\{1, 2, ..., m\right\}\) is the feature index, and \(y'\) is the true label.
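For concreteness, a minimal NumPy sketch of how this data could be stored (the array sizes and values below are illustrative, not from the notes):
```python
import numpy as np

# n = 4 instances, m = 2 features: row i of X is the feature vector x_i
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.0, 0.0]])
y = np.array([1, 0, 1, 0])      # y[i] is the true label y_i of instance i
x_test = np.array([0.5, 0.5])   # a test item's feature vector x'
n, m = X.shape
```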
📗 Linear Threshold Unit, Linear Perceptron:
LTU Classifier: \(\hat{y}_{i} = 1_{\left\{w^\top x_{i} + b \geq 0\right\}}\), where \(w = \left(w_{1}, w_{2}, ..., w_{m}\right)\) is the weights, \(b\) is the bias, \(x_{i} = \left(x_{i1}, x_{i2}, ..., x_{im}\right)\) is the feature vector of instance \(i\), and \(\hat{y}_{i}\) is the predicted label of instance \(i\).
Perceptron algorithm update step: \(w = w - \alpha \left(a_{i} - y_{i}\right) x_{i}\), \(b = b - \alpha \left(a_{i} - y_{i}\right)\), \(a_{i} = 1_{\left\{w^\top x_{i} + b \geq 0\right\}}\), where \(\alpha\) is the learning rate and \(a_{i}\) is the activation value of instance \(i\).
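As a sketch (assuming NumPy; the toy AND data, learning rate, and number of epochs are illustrative choices), the LTU classifier and one epoch of the perceptron update could look like:
```python
import numpy as np

def ltu_predict(w, b, x):
    # LTU classifier: y_hat = 1 if w^T x + b >= 0, else 0
    return 1.0 if w @ x + b >= 0 else 0.0

def perceptron_epoch(w, b, X, y, alpha=1.0):
    # one pass over the training set using the update
    # w = w - alpha (a_i - y_i) x_i,  b = b - alpha (a_i - y_i)
    for x_i, y_i in zip(X, y):
        a_i = ltu_predict(w, b, x_i)       # activation of instance i
        w = w - alpha * (a_i - y_i) * x_i
        b = b - alpha * (a_i - y_i)
    return w, b

# toy usage: learn the AND function (linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)
w, b = np.zeros(2), 0.0
for _ in range(10):
    w, b = perceptron_epoch(w, b, X, y)
print(w, b, [ltu_predict(w, b, x) for x in X])  # predicts 0, 0, 0, 1
```
Note that the weights change only when the activation \(a_{i}\) disagrees with the label \(y_{i}\), since \(a_{i} - y_{i} = 0\) otherwise.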
📗 Loss Function:
Zero-one loss minimization: \(\hat{f} = \mathop{\mathrm{argmin}}_{f \in \mathcal{H}} \displaystyle\sum_{i=1}^{n} 1_{\left\{f\left(x_{i}\right) \neq y_{i}\right\}}\), where \(\hat{f}\) is the optimal classifier, \(\mathcal{H}\) is the hypothesis space (set of functions to choose from).
Squared loss minimization of perceptrons: \(\left(\hat{w}, \hat{b}\right) = \mathop{\mathrm{argmin}}_{w, b} \dfrac{1}{2} \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)^{2}\), \(a_{i} = g\left(w^\top x_{i} + b\right)\), where \(\hat{w}\) is the optimal weights, \(\hat{b}\) is the optimal bias, \(g\) is the activation function.
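A short NumPy sketch of the two empirical objectives (using the sigmoid for \(g\) is an assumption here; any activation function works):
```python
import numpy as np

def zero_one_loss(f, X, y):
    # sum of 1{f(x_i) != y_i} over the training set
    return sum(f(x_i) != y_i for x_i, y_i in zip(X, y))

def squared_loss(w, b, X, y, g=lambda z: 1.0 / (1.0 + np.exp(-z))):
    # 1/2 * sum_i (a_i - y_i)^2 with a_i = g(w^T x_i + b)
    a = g(X @ w + b)
    return 0.5 * np.sum((a - y) ** 2)
```
The zero-one loss is piecewise constant in \(w\) and \(b\), so it cannot be minimized by gradient descent; the smooth squared (or cross-entropy) loss is minimized instead.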
📗 Logistic Regression:
Logistic regression classifier: \(\hat{y}_{i} = 1_{\left\{a_{i} \geq 0.5\right\}}\), \(a_{i} = \dfrac{1}{1 + \exp\left(- \left(w^\top x_{i} + b\right)\right)}\).
Loss minimization problem: \(\left(\hat{w}, \hat{b}\right) = \mathop{\mathrm{argmin}}_{w, b} -\displaystyle\sum_{i=1}^{n} \left(y_{i} \log\left(a_{i}\right) + \left(1 - y_{i}\right) \log\left(1 - a_{i}\right)\right)\), \(a_{i} = \dfrac{1}{1 + \exp\left(- \left(w^\top x_{i} + b\right)\right)}\).
Batch gradient descent step: \(w = w - \alpha \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right) x_{i}\), \(b = b - \alpha \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)\), \(a_{i} = \dfrac{1}{1 + \exp\left(- \left(w^\top x_{i} + b\right)\right)}\), where \(\alpha\) is the learning rate.
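Putting the pieces together, a minimal NumPy sketch of batch gradient descent for logistic regression (the OR data, learning rate, and iteration count are illustrative assumptions, not values from the notes):
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_gd(X, y, alpha=0.1, iters=1000):
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    for _ in range(iters):
        a = sigmoid(X @ w + b)           # activations a_i for all instances
        w = w - alpha * (X.T @ (a - y))  # w = w - alpha * sum_i (a_i - y_i) x_i
        b = b - alpha * np.sum(a - y)    # b = b - alpha * sum_i (a_i - y_i)
    return w, b

# toy usage: the OR function, which is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)
w, b = logistic_regression_gd(X, y)
print((sigmoid(X @ w + b) >= 0.5).astype(int))  # predicted labels
```
Each iteration uses all \(n\) training items to compute the gradient, which is what makes the step "batch" rather than stochastic.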