
# Lecture

📗 The lecture is in person, but you can join Zoom: 8:50-9:40 or 11:00-11:50. Zoom recordings can be viewed on Canvas -> Zoom -> Cloud Recordings. They will be moved to Kaltura over the weekends.
📗 The in-class (participation) quizzes should be submitted on TopHat (Code:741565), but you can submit your answers through Form at the end of the lectures too.
📗 The Python notebooks used during the lectures can also be found on: GitHub. They will be updated weekly.


# Lecture Notes

📗 Nonlinear Classifiers
➩ Non-linear classifiers are classifiers with non-linear decision boundaries.
➩ Non-linear models are difficult to estimate directly in general.
➩ Two ways of creating non-linear classifiers are:
(1) Non-linear transformations of the features (for example, kernel support vector machine) 
(2) Combining multiple copies of linear classifiers (for example, neural network, decision tree)

📗 Sklearn Pipeline
➩ New features can be constructed manually, or by using transformers provided by sklearn in a sklearn.pipeline.Pipeline.
➩ A categorical column can be converted to multiple columns using sklearn.preprocessing.OneHotEncoder: Doc.
➩ A numerical column can be normalized to center at 0 with variance 1 using sklearn.preprocessing.StandardScaler: Doc.
➩ Additional columns including powers of one column can be added using sklearn.preprocessing.PolynomialFeatures: Doc.
➩ Bag-of-words features and TF-IDF features can be added using sklearn.feature_extraction.text.CountVectorizer and sklearn.feature_extraction.text.TfidfVectorizer.
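
➩ The transformers listed above can be chained as in the following minimal sketch. The table, its column names (age, hours, job), and the labels are made up for illustration and are not the census data or the course notebook; LogisticRegression is just one possible final classifier.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Tiny made-up table (hypothetical columns, not the census dataset).
data = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 60, 44],
    "hours": [40, 50, 60, 45, 35, 20, 55, 40],
    "job": ["tech", "sales", "tech", "admin", "sales", "admin", "tech", "sales"],
})
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])

# Numerical columns: standardize, then add degree-2 polynomial features.
numeric = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
])

# Apply a different transformer to each group of columns.
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "hours"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["job"]),
])

# Full pipeline: feature construction followed by a linear classifier.
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(data, y)

# 5 polynomial columns (x1, x2, x1^2, x1*x2, x2^2) plus 3 one-hot columns.
n_features = model.named_steps["prep"].transform(data).shape[1]
predictions = model.predict(data)
print(n_features)
```

➩ Because the classifier is linear in the new columns, the decision boundary is non-linear in the original columns.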

Pipeline Example
➩ Predict whether income exceeds 50K per year based on census data: Link.
➩ Code for creating the pipeline: Notebook.

📗 Kernel Trick
➩ sklearn.svm.SVC can be used to train kernel SVMs, which implicitly use a large (possibly infinite) number of new features, efficiently through dual optimization (more detail about this in the Linear Programming lecture): Doc
➩ Available kernel functions include: linear (no new features), polynomial (degree d polynomial features), rbf (Radial Basis Function, infinite number of new features).
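
➩ As a rough illustration of the kernel options, the sketch below fits SVC with each kernel on a synthetic two-circle dataset. The dataset (make_circles) and its parameters are assumptions chosen for illustration; they do not come from the lecture.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not separable by a line in the original 2D features.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ["linear", "poly", "rbf"]:
    # degree is only used by the polynomial kernel and ignored by the others.
    clf = SVC(kernel=kernel, degree=2)
    clf.fit(X, y)
    scores[kernel] = clf.score(X, y)  # training accuracy
    print(kernel, scores[kernel])
```

➩ On data like this, the linear kernel is expected to do poorly, while the rbf kernel can separate the inner circle from the outer one.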

Kernel Trick Example


📗 Neural Network
➩ Neural networks (also called multilayer perceptrons) can be viewed as multiple layers of logistic regressions (or perceptrons with other activation functions).
➩ The outputs of the previous layers are used as the inputs in the next layer.
➩ The layers in between the inputs \(x\) and output \(y\) are hidden layers and can be viewed as additional internal features generated by the neural network.

📗 Sklearn vs PyTorch
➩ sklearn.neural_network.MLPClassifier can be used to train fully connected neural networks without convolutional layers or transformer modules. The activation functions logistic, tanh, and relu can be used: Doc
➩ PyTorch is a popular package for training more general neural networks with special layers and modules, and with custom activation functions: Link.
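
➩ A minimal sketch of MLPClassifier with the three activation functions follows. The dataset (make_moons), the architecture (two hidden layers of 8 units), and the other settings are assumptions for illustration only.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# Synthetic 2D dataset with a non-linear class boundary.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

train_acc = {}
weight_shapes = {}
for activation in ["logistic", "tanh", "relu"]:
    mlp = MLPClassifier(
        hidden_layer_sizes=(8, 8),  # two hidden layers with 8 units each
        activation=activation,
        max_iter=2000,
        random_state=0,
    )
    mlp.fit(X, y)
    train_acc[activation] = mlp.score(X, y)
    # coefs_[i] is the weight matrix between layer i and layer i + 1.
    weight_shapes[activation] = [w.shape for w in mlp.coefs_]
    print(activation, train_acc[activation], weight_shapes[activation])
```

➩ The weight matrices have one row per input of a layer and one column per unit, which matches the view of each unit as a logistic regression on the previous layer's outputs.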

TopHat Activity
➩ Compare neural networks with different architectures (number of hidden layers, units), and different activation functions (ReLU, tanh, Sigmoid (logistic)) here: Link.
➩ Discuss how they behave differently on different datasets, for example, training speed, decision boundary, number of non-zero weights, etc. 

📗 Model Selection
➩ Many non-linear classifiers can overfit the training data perfectly: Link.
➩ Comparing prediction accuracy of these classifiers on the training set is not meaningful.
➩ Cross validation can be used to compare and select classifiers with different parameters, for example, the neural network architecture, activation functions, or other training parameters.
➩ The dataset is split into K subsets, called K folds, and each fold is used in turn as the test set while the remaining K - 1 folds are used for training.
➩ The average accuracy from the K folds can be used as an estimate of the classification accuracy on new data that was not used for training.
➩ If \(K = n\), then there is only one item in each fold, and the cross validation procedure in this case is called Leave-One-Out Cross Validation (LOOCV).
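
➩ The splitting procedure can be sketched with sklearn.model_selection.KFold and LeaveOneOut. The 10-item toy array and K = 5 are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Toy dataset with n = 10 items (values are placeholders).
X = np.arange(10).reshape(-1, 1)

# K = 5 folds: each split trains on 8 items and tests on the other 2.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = []
tested = []
for train_idx, test_idx in kf.split(X):
    fold_sizes.append((len(train_idx), len(test_idx)))
    tested.extend(test_idx.tolist())

# LOOCV is the K = n case: one split per item.
loo_splits = LeaveOneOut().get_n_splits(X)

print(fold_sizes)
print(sorted(tested))
print(loo_splits)
```

➩ Every item appears in the test set exactly once across the K splits.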

Cross Validation Example
➩ Compare a neural network with two hidden layers and an RBF kernel SVM on a simple 2D dataset using cross validation accuracy.
➩ Code for cross validation: Notebook.
➩ Higher mean (average) CV score with lower variance or standard deviation is preferred.
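
➩ A comparison of this kind could look like the sketch below. It is not the course notebook: the dataset (make_moons), the hidden layer sizes, and the number of folds are assumptions for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Simple synthetic 2D dataset.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

models = {
    "mlp": MLPClassifier(hidden_layer_sizes=(8, 8), max_iter=2000, random_state=0),
    "svm_rbf": SVC(kernel="rbf"),
}

results = {}
for name, model in models.items():
    # One accuracy score per fold; the model is refit on each training split.
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = (scores.mean(), scores.std())
    print(name, results[name])
```

➩ The mean summarizes accuracy on held-out folds, and the standard deviation indicates how much that accuracy varies from fold to fold.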

Additional Examples
➩ In a neural network with 4 input features, 3 units in the first hidden layer, 2 units in the second hidden layer, and 1 unit for binary classification in the output layer, how many weights and biases does the network have?
➩ Suppose the activation functions are logistic (other activation functions do not change the answer to this question), then:
(1) In the first layer, there are 3 logistic regressions with 4 features, meaning there are 12 weights and 3 biases.
(2) In the second layer, there are 2 logistic regressions with 3 features (3 units from the previous layer), meaning there are 6 weights and 2 biases.
(3) In the last layer, there is 1 logistic regression with 2 features (2 units from the previous layer), meaning there are 2 weights and 1 bias.
➩ Therefore, there are 12 + 6 + 2 = 20 weights and 3 + 2 + 1 = 6 biases in the network.
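
➩ The same count can be checked with a short helper. The function name count_parameters is hypothetical and only for illustration; it assumes a fully connected network described by its layer sizes.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, inputs first.
    """
    # Each unit has one weight per unit (or input) in the previous layer.
    weights = sum(
        n_in * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )
    # Each non-input unit has one bias.
    biases = sum(layer_sizes[1:])
    return weights, biases


# 4 inputs, hidden layers with 3 and 2 units, 1 output unit.
weights, biases = count_parameters([4, 3, 2, 1])
print(weights, biases)  # 20 6
```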


➩ Transform the points (using the kernel) and move the plane such that the plane separates the two classes.



 Notes and code adapted from the course taught by Yiyin Shen Link and Tyler Caraza-Harter Link






Last Updated: November 30, 2024 at 4:34 AM