
# Lecture Notes

📗 Slides
Lecture 3: Slides, With Quiz
Lecture 4: Slides, With Quiz
Annotated Lecture 3 Section 1: Slides
Annotated Lecture 4 Section 1: Slides
Annotated Week 1 Section 2: Part I, Part II

📗 Websites
(from week 1) Gradient Descent. Link
Neural Network: Link
Neural Network Videos by Grant Sanderson: Playlist (Thanks Dan Drake for the recommendation)
Stochastic Gradient Descent: Link
Overfitting: Link

📗 YouTube videos
How to construct an XOR network? Link
How to derive the 2-layer neural network gradient descent step? Link
How to derive the multi-layer neural network gradient descent induction step? Link
Comparison between L1 and L2 regularization. Link
Example (Quiz): Cross validation accuracy Link

# Written (Math) Problems

Submit on Canvas: PDF
Please also submit a file named "comments.txt" whose first line contains a numerical grade (1, 1.5, or 2) for your whole homework (not for individual questions).
An example of Q3 (induction) is worked out in the "multi-layer neural network" video under Lectures -> YouTube videos. Try to work out the general case for an arbitrary w^(l)_ij.
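For Q3, the induction step is the standard backpropagation recursion. A sketch, assuming sigmoid activations and the convention that w^(l)_ij connects unit i in layer l - 1 to unit j in layer l (so z^(l)_j = sum_i w^(l)_ij a^(l-1)_i); check that this matches the index convention in your slides before using it:

```latex
% Backward recursion (sketch): delta at layer l from delta at layer l+1,
% using sigmoid'(z) = a (1 - a).
\delta^{(l)}_j \;=\; \Bigl(\sum_k w^{(l+1)}_{jk}\,\delta^{(l+1)}_k\Bigr)\, a^{(l)}_j\bigl(1 - a^{(l)}_j\bigr),
\qquad
\frac{\partial C}{\partial w^{(l)}_{ij}} \;=\; \delta^{(l)}_j \, a^{(l-1)}_i .
```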

# Programming Problem

📗 Short Instruction
(0) (Optional) Start by building a two-layer neural network (one hidden layer, with the number of hidden units equal to the number of input units) on the training data from week 1 (handwritten digits; you can select a smaller subset of around 400 images to speed up training), and test it on the test sets to make sure your neural network works.
(1) Download the FEI Face Database from FEI. Download one of the aligned or normalized sets of images (both part 1 and part 2); any of the three versions is okay. There are 400 images in total.
(2) Resize the images (36 x 26 should be enough; you can use larger or smaller images), compute the pixel intensities, and store them in a vector (one row of the matrix x). The vector y records whether the facial expression is not happy (#a.jpg) or happy (#b.jpg).
(3) Split the dataset into a training set and a test set based on your wisc ID: enter your ID on the course page to see which 40 images form your test set; your training set contains the remaining 360 images.
(4) Train a two-layer neural network (with a single hidden layer) to classify whether the facial expression in the image is happy or not. The number of hidden units should be equal to the number of input units. You can use any cost function and activation function. The ones used in the lecture slides and the hint file are the squared error cost function with the sigmoid (logistic) activation function. You can use batch, mini-batch, or stochastic gradient descent, whichever is faster.
(*) You are not allowed to use machine learning packages such as scikit-learn, OpenCV, Keras, PyTorch, etc.
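Step (2) above could be sketched along these lines. This is only an illustration, assuming grayscale intensities in [0, 255] and the a/b filename convention for labels (with a -> 0 and b -> 1, matching the "0, 1, 0, 1, ..." perfect output described below); the function names are made up:

```python
import numpy as np

def to_feature_row(pixels, out_h=36, out_w=26):
    """Downsample a grayscale image (2-D array of intensities in [0, 255])
    to out_h x out_w by nearest-neighbor sampling, scale to [0, 1], and
    flatten into one row of the data matrix x."""
    img = np.asarray(pixels, dtype=float)
    h, w = img.shape
    # pick evenly spaced source rows/columns (nearest-neighbor downsample)
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    small = img[np.ix_(rows, cols)]
    return (small / 255.0).ravel()

def label_from_name(filename):
    """'#a.jpg' (not happy) -> 0, '#b.jpg' (happy) -> 1 (assumed convention)."""
    return 1 if filename.endswith("b.jpg") else 0
```

Any sensible resizing scheme (e.g. block averaging) works equally well here; the only requirement is that every image maps to a fixed-length feature vector.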
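A minimal sketch of the training step (4), assuming the squared error cost and sigmoid activation from the slides with plain stochastic gradient descent; the class and method names are illustrative, not required:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TwoLayerNet:
    """One hidden layer, sigmoid activations, squared-error cost:
    C = (o - y)^2 / 2, with sigmoid'(z) = a (1 - a)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def forward(self, x):
        self.h = sigmoid(self.W1 @ x + self.b1)       # hidden activations
        self.o = sigmoid(self.w2 @ self.h + self.b2)  # scalar output in (0, 1)
        return self.o

    def sgd_step(self, x, y, lr=0.5):
        """One stochastic gradient descent update on a single example."""
        o = self.forward(x)
        delta_out = (o - y) * o * (1.0 - o)                  # output-layer delta
        delta_hidden = delta_out * self.w2 * self.h * (1.0 - self.h)
        self.w2 -= lr * delta_out * self.h
        self.b2 -= lr * delta_out
        self.W1 -= lr * np.outer(delta_hidden, x)
        self.b1 -= lr * delta_hidden
        return 0.5 * (o - y) ** 2                            # loss before update
```

Looping `sgd_step` over shuffled training examples for a few epochs gives stochastic gradient descent; summing the gradients over a batch before updating gives batch or mini-batch versions.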

📗 Files to submit
(1) hidden.png or hidden.jpg (etc.) showing the hidden layer activations for the first image in your test set. (The course page has a widget: enter the image size, height x width, and the activation values, comma separated, each number between 0 and 1, then right click to save the png image file.)
(2) output.txt contains the classifications of the expressions in your test set: 40 lines of 0s and 1s, one number per line. If your classification is perfect, the output should be 0, 1, 0, 1, ..., one per line.
(3) comments.txt contains information on how to run your program; in particular, the names of the data files are required.
(4) code.
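The two text outputs above could be produced along these lines (a sketch only; the plain-text PGM format is used here just as a package-free intermediate that any image tool can convert to png for the hidden-layer submission):

```python
def save_outputs(predictions, path="output.txt"):
    """Write one 0 or 1 per line, thresholding the network outputs at 0.5."""
    with open(path, "w") as f:
        for p in predictions:
            f.write(f"{int(p >= 0.5)}\n")

def save_hidden_pgm(acts, height, width, path="hidden.pgm"):
    """Write hidden activations (values in [0, 1]) as a plain-text PGM image;
    convert to png/jpg with any image tool before submitting."""
    vals = [min(255, max(0, int(a * 255))) for a in acts]
    with open(path, "w") as f:
        f.write(f"P2\n{width} {height}\n255\n")
        for i in range(height):
            row = vals[i * width:(i + 1) * width]
            f.write(" ".join(str(v) for v in row) + "\n")
```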

📗 Things to try
(1) Experiment with different hyperparameters.
(2) Repeat the experiments with different (random) initial weights. 
(3) (Not required) Try L1 or L2 regularizers.
(4) Find and look at the images that are classified incorrectly.
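For item (3), L1 and L2 regularization only change the gradient update. A hedged sketch, where `lam` is an illustrative name for the regularization strength:

```python
import numpy as np

def l2_step(W, grad, lr=0.1, lam=1e-3):
    """Gradient step minimizing C + (lam/2) * ||W||^2 (weight decay)."""
    return W - lr * (grad + lam * W)

def l1_step(W, grad, lr=0.1, lam=1e-3):
    """Gradient step minimizing C + lam * ||W||_1, using the
    subgradient lam * sign(W); pushes small weights toward exactly 0."""
    return W - lr * (grad + lam * np.sign(W))
```

Comparing the two on this assignment should reproduce the behavior discussed in the L1 vs. L2 video: L2 shrinks all weights proportionally, while L1 tends to zero some weights out entirely.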

📗 Longer Instruction
More (nonessential) details and hints: PDF.
Shang and Erik posted their code to convert images to csv files on Piazza: in Python written by Shang and in Java written by Erik. It's okay to use this code to preprocess the images, but make sure you mention in the comments file that you are using preprocessed data. Also, please do NOT copy and submit their code!

📗 TAs' Solution
(1) Java: Link written by Tan
(2) Python: Link written by Dandi
The expressions for the activation and cost functions are removed from both solutions since you can choose any you like; see the formulas in the hints file PDF if you want the squared error cost with logistic activation.
Important note: You are not allowed to copy any code from the solution. MOSS will be used to check for code similarity: changing just variable names, spacing, etc. is still considered cheating. You can read the solution to learn what it is doing, but you MUST write all code yourself. The deadline for resubmission without the 50 percent penalty is June 30.





Last Updated: November 09, 2021 at 12:05 AM