
# Lecture

📗 The lecture is in person, but you can join Zoom: 8:50-9:40 or 11:00-11:50. Zoom recordings can be viewed on Canvas -> Zoom -> Cloud Recordings. They will be moved to Kaltura over the weekend.
📗 The in-class (participation) quizzes should be submitted on TopHat (Code: 741565), but you can also submit your answers through the form at the end of the lectures.
📗 The Python notebooks used during the lectures can also be found on: GitHub. They will be updated weekly.


# Lecture Notes

📗 Nonlinear Classifiers
➭ Non-linear classifiers are classifiers with non-linear decision boundaries.
➭ Non-linear models are difficult to estimate directly in general.
➭ Two ways of creating non-linear classifiers (a minimal sketch of both follows this list) are:
(1) Non-linear transformations of the features (for example, kernel support vector machines)
(2) Combining multiple copies of linear classifiers (for example, neural networks, decision trees)
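➭ A minimal sketch of both approaches (not from the lecture notebooks; the toy dataset from sklearn's make_moons is an assumption made here): an RBF kernel SVM and a small neural network are fit to the same non-linearly separable data.

```python
# A minimal sketch: both ways of creating a non-linear classifier
# fit the same non-linearly separable toy dataset (make_moons).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# (1) non-linear transformation of the features: RBF kernel SVM
svm = SVC(kernel="rbf").fit(X_train, y_train)

# (2) combining multiple copies of linear classifiers: a small neural network
net = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

print("kernel SVM test accuracy:", svm.score(X_test, y_test))
print("neural net test accuracy:", net.score(X_test, y_test))
```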



📗 Sklearn Pipeline
➭ New features can be constructed manually, or by using the transformers provided by sklearn inside a sklearn.pipeline.Pipeline (a minimal sketch follows this list).
➭ A categorical column can be converted to multiple columns using sklearn.preprocessing.OneHotEncoder: Doc.
➭ A numerical column can be normalized to center at 0 with variance 1 using sklearn.preprocessing.StandardScaler: Doc.
➭ Additional columns including powers of one column can be added using sklearn.preprocessing.PolynomialFeatures: Doc.
➭ Bag of words and TF-IDF features can be added using sklearn.feature_extraction.text.CountVectorizer and sklearn.feature_extraction.text.TfidfVectorizer.
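➭ A minimal sketch of such a pipeline (the toy DataFrame and column names are hypothetical, not the census data): the transformers above are combined in a ColumnTransformer inside a Pipeline that ends with a classifier.

```python
# A minimal sketch with hypothetical columns (not the census data):
# the transformers above are combined in a ColumnTransformer inside a Pipeline.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler, PolynomialFeatures
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({"color": ["red", "blue", "red", "green", "blue", "red"],
                   "size": [1.0, 2.5, 3.0, 0.5, 2.0, 1.5],
                   "label": [0, 1, 1, 0, 1, 0]})

preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(), ["color"]),            # categorical -> indicator columns
    ("scale", StandardScaler(), ["size"]),             # center at 0, variance 1
    ("poly", PolynomialFeatures(degree=2), ["size"]),  # add powers of the column
])

model = Pipeline([("preprocess", preprocess),
                  ("classify", LogisticRegression())])
model.fit(df[["color", "size"]], df["label"])
print(model.predict(df[["color", "size"]]))
```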

Pipeline Example ➭ Predict whether income exceeds 50K per year based on census data: Link.
➭ Code for creating the pipeline: Notebook.

📗 Kernel Trick
➭ sklearn.svm.SVC can be used to train kernel SVMs, possibly with an infinite number of new features, efficiently through dual optimization (more details about this in the Linear Programming lecture): Doc
➭ Available kernel functions include: linear (no new features), polynomial (degree d polynomial features), rbf (Radial Basis Function, infinite number of new features).
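➭ A minimal sketch (the toy dataset from sklearn's make_circles is an assumption made here): fitting SVC with each of the three kernels listed above.

```python
# A minimal sketch: fit sklearn.svm.SVC with each kernel listed above
# on a toy dataset where the two classes are concentric circles.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel, degree=3)  # degree is only used by the "poly" kernel
    clf.fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
```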

Kernel Trick Example ➭ Interactive demo: transform the points (using the kernel) and move the plane so that the plane separates the two classes.




📗 Neural Network
➭ Neural networks (also called multilayer perceptrons) can be viewed as multiple layers of logistic regressions (or perceptrons with other activation functions).
➭ The outputs of the previous layers are used as the inputs in the next layer.
➭ The layers in between the inputs \(x\) and output \(y\) are hidden layers and can be viewed as additional internal features generated by the neural network.

📗 Sklearn vs PyTorch
➭ sklearn.neural_network.MLPClassifier can be used to train fully connected neural networks without convolutional layers or transformer modules. The activation functions logistic, tanh, and relu can be used: Doc
➭ PyTorch is a popular package for training more general neural networks with special layers and modules, and with custom activation functions: Link.
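➭ A minimal sketch (the dataset, layer sizes, and activation are assumptions made here): the same fully connected architecture defined with MLPClassifier and, for comparison, with PyTorch (training loop omitted).

```python
# A minimal sketch: a fully connected network defined with sklearn's
# MLPClassifier and, for comparison, the same architecture in PyTorch.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
import torch.nn as nn

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

# two hidden layers with 16 and 8 units, ReLU activation
clf = MLPClassifier(hidden_layer_sizes=(16, 8), activation="relu",
                    max_iter=2000, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))

# the corresponding architecture in PyTorch (training loop omitted)
net = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),  # one output unit for binary classification
)
print(net)
```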

TopHat Activity ➭ Compare neural networks with different architectures (number of hidden layers, units) and different activation functions (ReLU, tanh, Sigmoid (logistic)) here: Link.
➭ Discuss how they behave differently on different datasets, for example, training speed, decision boundary, number of non-zero weights, etc. 

📗 Model Selection
➭ Many non-linear classifiers can overfit the training data perfectly: Link.
➭ Comparing prediction accuracy of these classifiers on the training set is not meaningful.
➭ Cross validation can be used to compare and select classifiers with different parameters, for example, the neural network architecture, activation functions, or other training parameters.
➭ The dataset is split into \(K\) subsets, called folds, and each fold is used in turn as the test set while the remaining \(K - 1\) folds are used to train.
➭ The average accuracy over the \(K\) folds can be used as an estimate of the classification accuracy on new data.
➭ If \(K = n\), then there is only one item in each fold, and the cross validation procedure in this case is called Leave-One-Out Cross Validation (LOOCV).

Cross Validation Example ➭ Compare a neural network with two hidden layers and an RBF kernel SVM on a simple 2D dataset using cross validation accuracy.
➭ Code for cross validation: Notebook.
➭ A higher mean (average) CV score with a lower variance or standard deviation is preferred.
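➭ A minimal sketch of this comparison (not the linked notebook; the toy 2D dataset is an assumption made here) using sklearn.model_selection.cross_val_score with 5 folds.

```python
# A minimal sketch of the comparison above on a toy 2D dataset:
# cross_val_score returns one accuracy per fold (K = 5 here).
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
svm = SVC(kernel="rbf")

net_scores = cross_val_score(net, X, y, cv=5)
svm_scores = cross_val_score(svm, X, y, cv=5)

print("neural net CV accuracy: mean", net_scores.mean(), "std", net_scores.std())
print("RBF SVM   CV accuracy: mean", svm_scores.mean(), "std", svm_scores.std())
```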

Additional Examples ➭ In a neural network with 4 input features, 3 units in the first hidden layer, 2 units in the second hidden layer, and 1 unit for binary classification in the output layer, how many weights and biases does the network have?
➭ Suppose the activation functions are logistic (other activation functions do not change the answer to this question), then:
(1) In the first layer, there are 3 logistic regressions with 4 features, meaning there are 12 weights and 3 biases.
(2) In the second layer, there are 2 logistic regressions with 3 features (3 units from the previous layer), meaning there are 6 weights and 2 biases.
(3) In the last layer, there is 1 logistic regression with 2 features (2 units from the previous layer), meaning there are 2 weights and 1 bias.
➭ Therefore, there are 12 + 6 + 2 = 20 weights and 3 + 2 + 1 = 6 biases in the network.
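➭ The count can be checked by building the same 4-3-2-1 architecture in PyTorch and summing the parameter sizes (a minimal sketch, assuming PyTorch is available).

```python
# A minimal sketch: build the 4-3-2-1 network in PyTorch and count parameters.
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 3), nn.Sigmoid(),
                    nn.Linear(3, 2), nn.Sigmoid(),
                    nn.Linear(2, 1), nn.Sigmoid())

weights = sum(p.numel() for name, p in net.named_parameters() if "weight" in name)
biases = sum(p.numel() for name, p in net.named_parameters() if "bias" in name)
print(weights, biases)  # 20 weights, 6 biases
```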






📗 Notes and code adapted from the course taught by Yiyin Shen Link and Tyler Caraza-Harter Link






Last Updated: April 29, 2024 at 1:10 AM