Young Wu's Homepage

Official Due Date: June 7

# Programming Problem Instruction

📗 Enter your ID (your wisc email ID without @wisc.edu) in the box here and click the button to generate your parameters.

📗 The same ID should generate the same set of parameters. Your answers are not saved when you close the browser. You can either paste your console output into the text boxes or write your output to text files (.txt) and load them using the button above each text box.

📗 Please report any bugs on Piazza.

# Warning: please enter your ID before you start!

📗 (Introduction) In this project, you will build a logistic regression model and a neural network to classify hand-written digits. Your models should take pixel intensities of images as inputs and output which digits the images display.

📗 (Part 1) Download and read the training set images and labels from MNIST or the CSV Files (easier to read), or get the same dataset in another format from elsewhere.

📗 (Part 1) Extract the training set data of the digits (label it 0) and (label it 1). If there are \(n\) images in your training set, create an \(n \times 784\) feature matrix \(x\) and an \(n \times 1\) vector of labels \(y\). Rescale the features so that every entry lies between 0 and 1, for example by dividing all the pixel values by 255.

(Hint: each training image contains \(28 \times 28 = 784\) pixels, and each pixel corresponds to an input unit.)
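
A minimal sketch of this extraction step, assuming the CSV version of MNIST in which each row is a label followed by 784 pixel values from 0 to 255. The function name `extract_two_digits` and the parameters `digit0` and `digit1` are hypothetical; substitute the two digits assigned to your ID.

```python
import numpy as np

def extract_two_digits(data, digit0, digit1):
    """Build the n x 784 feature matrix x and the n x 1 label vector y.

    Assumes each row of `data` is [label, pixel_1, ..., pixel_784] with
    pixel values in 0..255 (the layout of the CSV version of MNIST).
    """
    labels = data[:, 0]
    keep = (labels == digit0) | (labels == digit1)     # rows showing either digit
    x = data[keep, 1:].astype(float) / 255.0            # rescale pixels to [0, 1]
    y = (labels[keep] == digit1).astype(int).reshape(-1, 1)  # digit0 -> 0, digit1 -> 1
    return x, y
```

For example, `data = np.loadtxt("mnist_train.csv", delimiter=",", skiprows=1)` would load such a file (the file name is hypothetical, and `skiprows=1` is only needed if your copy has a header row), after which `x, y = extract_two_digits(data, digit0, digit1)` gives the arrays described above.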

📗 (Part 1) Train a logistic regression model on the training set and plot the weights as a 28 by 28 grid.
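
One way to train this model is batch gradient descent on the cross-entropy loss, whose gradient with respect to the weights is \(\sum_{i} (a_i - y_i) x_i\), where \(a_i\) is the activation for image \(i\). The sketch below is illustrative, not the course's reference solution: the function names, the zero initialization, the learning rate `alpha`, the epoch count, and averaging the gradient over the \(n\) images (which only rescales the learning rate compared with summing) are all assumptions you should tune.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(x, y, alpha=0.1, epochs=1000):
    """Batch gradient descent for logistic regression.

    x: n x 784 features in [0, 1]; y: n x 1 labels in {0, 1}.
    Returns the weight vector w (one weight per pixel) and the bias b.
    """
    n, m = x.shape
    y = y.ravel()
    w = np.zeros(m)
    b = 0.0
    for _ in range(epochs):
        a = sigmoid(x @ w + b)          # activations for all n images at once
        err = a - y                     # derivative of cross-entropy w.r.t. z
        w -= alpha * (x.T @ err) / n    # gradient averaged over the training set
        b -= alpha * err.mean()
    return w, b
```

To plot the weights, reshape them into the image layout, for example `plt.imshow(w.reshape(28, 28), cmap="gray")` followed by `plt.show()` with `matplotlib.pyplot` imported as `plt`.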

📗 (Part 1) Predict the labels of the new images in the following test set. Each prediction should be either 0 or 1.

Note: this field may take a few seconds to load. If you downloaded the test set before May 21, please download it again: line 100 of the old file contains "/n" in place of "\n".
You can either use the button to download a text file, or copy and paste from the text box into Excel or a csv file. Please do not change the content of the text box.
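
A sketch of loading the test set and producing the logistic regression outputs for Questions 3 and 4. It assumes the downloaded text has one image per line with 784 comma-separated pixel values on the same 0 to 255 scale as the training images; if your copy is already rescaled, drop the division by 255. The function names are hypothetical, and `w` and `b` stand for the weights and bias from your own training code.

```python
import numpy as np

def parse_test_set(text):
    """Turn the test-set text (one image per line, comma separated) into an array."""
    rows = [[float(v) for v in line.split(",")]
            for line in text.strip().splitlines() if line.strip()]
    return np.array(rows) / 255.0        # same rescaling as the training set

def logistic_outputs(x, w, b):
    """Return activations in (0, 1) and 0/1 predictions thresholded at 0.5."""
    a = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    pred = (a >= 0.5).astype(int)
    return a, pred

def to_line(values, decimals):
    """Format numbers as one comma-separated line with fixed rounding."""
    return ",".join(f"{v:.{decimals}f}" for v in values)
```

With the file read into a string `text`, `a, pred = logistic_outputs(parse_test_set(text), w, b)` followed by `print(to_line(a, 2))` produces one line of activations rounded to 2 decimal places.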

📗 (Part 2) Train a neural network with one hidden layer. The number of hidden units should be half the number of input units (here there are 784 input units, so there should be 392 hidden units). Use the logistic activation function in both layers.
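
A minimal stochastic gradient descent sketch of this network. It assumes cross-entropy loss, so the output error is \(a_2 - y\); if your slides use squared error instead, that error gains an extra factor \(a_2(1 - a_2)\). The uniform initialization in \([-0.1, 0.1]\), the learning rate, the epoch count, and the function names are assumptions to tune. The weight layout `w1[i, j]` (input unit i to hidden unit j) is chosen to match the format asked for in Question 5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_nn(x, y, hidden=392, alpha=0.1, epochs=10, seed=0):
    """One hidden layer, logistic activations in both layers, trained by SGD."""
    rng = np.random.default_rng(seed)
    n, m = x.shape
    y = y.ravel()
    w1 = rng.uniform(-0.1, 0.1, size=(m, hidden))  # w1[i, j]: input i -> hidden j
    b1 = rng.uniform(-0.1, 0.1, size=hidden)
    w2 = rng.uniform(-0.1, 0.1, size=hidden)       # hidden j -> output
    b2 = rng.uniform(-0.1, 0.1)
    for _ in range(epochs):
        for i in rng.permutation(n):               # visit images in random order
            a1 = sigmoid(x[i] @ w1 + b1)           # hidden activations, shape (hidden,)
            a2 = sigmoid(a1 @ w2 + b2)             # output activation, a scalar
            d2 = a2 - y[i]                         # output error under cross-entropy
            d1 = d2 * w2 * a1 * (1 - a1)           # hidden errors, using w2 before its update
            w2 -= alpha * d2 * a1
            b2 -= alpha * d2
            w1 -= alpha * np.outer(x[i], d1)
            b1 -= alpha * d1
    return w1, b1, w2, b2

def predict_nn(x, w1, b1, w2, b2):
    """Second layer activations and 0/1 predictions for every row of x."""
    a2 = sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)
    return a2, (a2 >= 0.5).astype(int)
```

Note that `d1` is computed before `w2` is updated, so both layers use the gradients of the same forward pass.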

📗 (Part 2) Predict the labels of the images in the same test set. Each prediction should be either 0 or 1.

# Question 1 [1 point]

📗 (training) Enter the feature vector of any one training image (784 numbers, rounded to 2 decimal places, in one line, comma separated):

Plot the image to make sure you entered the vector correctly:

# Question 2 [1 point]

📗 (log_weights) Enter the logistic regression weights and biases (784 + 1 numbers, rounded to 4 decimal places, in one line, comma separated), the bias term should be the last number:

(Note: please do not normalize the weights; ignore this instruction from an earlier, incorrect version.)
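
A sketch of producing this line, assuming `w` is your length-784 weight vector and `b` your bias (both names are hypothetical). `np.append` places the bias last, as required.

```python
import numpy as np

def weights_line(w, b):
    """784 weights followed by the bias, rounded to 4 decimals, comma separated."""
    return ",".join(f"{v:.4f}" for v in np.append(w, b))
```

Calling `print(weights_line(w, b))` then gives the single line to paste.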

# Question 3 [10 points]

📗 (log_act) Enter the activation values on the test set (200 numbers between 0 and 1, rounded to 2 decimal places, in one line, comma separated):

# Question 4 [10 points]

📗 (log_pred) Enter the predicted values on the test set (200 integers, each 0 or 1, in one line):

# Question 5 [1 point]

📗 (in_weights) Enter the first layer weights and biases (784 + 1 lines, each line containing 392 numbers, rounded to 4 decimal places, comma separated). The bias terms should be on the last line:

(Hint: for the first 784 lines, line i element j represents the weight from input unit i to hidden unit j, and for the last line, element j represents the bias term for the hidden unit j.)
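
A sketch of writing the first layer in this layout, assuming your weights are stored as a 784 by 392 array `w1` with `w1[i, j]` the weight from input unit i to hidden unit j, and your biases as a length-392 vector `b1` (names hypothetical; if you stored the matrix the other way round, pass `w1.T`).

```python
import numpy as np

def first_layer_lines(w1, b1):
    """785 lines: line i holds the weights from input i to every hidden unit; biases last."""
    table = np.vstack([w1, b1])          # shape (784 + 1, 392)
    return "\n".join(",".join(f"{v:.4f}" for v in row) for row in table)
```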

# Question 6 [1 point]

📗 (out_weights) Enter the second layer weights and bias (392 + 1 numbers, rounded to 4 decimal places, in one line, comma separated). The bias term should be the last number:

# Question 7 [10 points]

📗 (nn_act) Enter the second layer activation values on the test set (200 numbers between 0 and 1, rounded to 2 decimal places, in one line, comma separated):

# Question 8 [10 points]

📗 (nn_pred) Enter the predicted values on the test set (200 integers, each 0 or 1, in one line):

# Question 9 [1 point]

📗 (incorrect) Enter the feature vector of one test image that your network labels incorrectly (784 numbers in one line, rounded to 2 decimal places, comma separated). If none of the test set images are labelled incorrectly, you are probably overfitting or training on the test set; in that case, enter the feature vector your network is most uncertain of (the image whose second layer activation is closest to 0.5).

Plot the image:
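
A sketch of choosing the image for this question. The hypothetical `labels` argument stands for whatever 0/1 labels you have for the test images (for example, ones you determined by plotting them); without it, or when no image is misclassified, the function falls back to the image whose second layer activation is closest to 0.5. The function name is also hypothetical.

```python
import numpy as np

def pick_image(x_test, act, labels=None):
    """Return a misclassified test image if labels reveal one, else the most uncertain one."""
    act = np.ravel(act)
    if labels is not None:
        wrong = np.flatnonzero((act >= 0.5).astype(int) != np.ravel(labels))
        if wrong.size > 0:
            return x_test[wrong[0]]              # first misclassified image
    return x_test[np.argmin(np.abs(act - 0.5))]  # activation closest to 0.5
```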

# Question 10 [1 point]

📗 Please enter any comments and suggestions, including possible mistakes and bugs in the questions and the auto-grading, and any material relevant to solving the questions that you think was not covered well in the lectures. If you have no comments, please enter "None"; do not leave it blank.

# Grade

* * * * *

* * * * *

📗 Warning: grading may take around 10 to 20 seconds. Please be patient and do not click "Grade" multiple times.

📗 Please copy and paste the text between the *s (not including the *s) and submit it on Canvas, P1.

📗 Please submit your code and outputs on Canvas, P1S.

📗 You can also save your outputs as a single text file using the button and submit it to P1S together with your code.

📗 Warning: the load button does not work properly for all questions, so please recheck everything after loading. You can load your answers from the text field using the button:

📗 Saving and loading may take around 10 to 20 seconds. Please be patient and do not click the buttons multiple times.

# Hints and Solutions (frequently updated)

📗 Please do not normalize the weights. An earlier version of the instructions was incorrect.

📗 Use stochastic gradient descent if your algorithm does not converge within 5 minutes.
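
A sketch of the stochastic version for logistic regression: instead of one update per pass over all \(n\) images, the weights move after every single image, using that image's gradient \((a_i - y_i) x_i\). The function name, learning rate, epoch count, and per-epoch shuffling are assumptions to tune.

```python
import numpy as np

def train_logistic_sgd(x, y, alpha=0.01, epochs=5, seed=0):
    """Logistic regression trained one image at a time (stochastic gradient descent)."""
    rng = np.random.default_rng(seed)
    n, m = x.shape
    y = y.ravel()
    w = np.zeros(m)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):                 # shuffle the order every epoch
            a = 1.0 / (1.0 + np.exp(-(x[i] @ w + b)))
            w -= alpha * (a - y[i]) * x[i]           # gradient of one image's loss
            b -= alpha * (a - y[i])
    return w, b
```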

📗 The algorithms are outlined in the lecture slides: for Part 1, see Slides, pages 29 and 30; for Part 2, Slides, pages 30, 31, 32, and 33 contain the algorithm, but pages 26 and 27 are more appropriate for this homework. There is no need to implement the delta (\(\delta\)) version of the algorithm.

📗 To speed up for loops in Python, see this guide on vectorization: Link.
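
As an illustration of what vectorization means here: the two functions below compute the same logistic regression activations, but the second replaces both Python loops with one matrix-vector product handled by NumPy's compiled routines, which is typically much faster on 784-pixel images. The function names are hypothetical.

```python
import numpy as np

def activations_loop(x, w, b):
    """One image and one pixel at a time, with explicit Python loops."""
    out = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        z = b
        for j in range(x.shape[1]):
            z += x[i, j] * w[j]
        out[i] = 1.0 / (1.0 + np.exp(-z))
    return out

def activations_vectorized(x, w, b):
    """All images at once: one matrix-vector product."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))
```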

📗 The main purpose of the programming homework is to practice implementing mathematical algorithms from their formulas. Using packages and libraries to read and preprocess the data is okay, but you should not use packages or libraries for the logistic regression and the neural network themselves.

📗 I recorded a video about P1 and how it is graded: Link. In case you are curious, it explains how the auto-grading scripts (JavaScript) grade P1 and M1. You do not have to watch it to solve P1.

📗 You can also look at last year's P1 and P2 hints: P1, P2. The questions and requirements are different, so use them with caution.

📗 A sample solution in Java and Python is posted below.

Important notes:
(1) The neural network solution uses ReLU activations: you need to change them to logistic activations and change the gradient descent formulas too! For the logistic regression, the activation function and gradient descent steps are correct.
(2) You need to adjust the learning rate according to the training set you are given! Not all learning rates work for all problems, especially for neural networks.
(3) You need to figure out which variables to output yourself. The outputs from the solution are used for debugging purposes only.
(4) You are allowed to copy and use parts of the TA's solution without attribution. You are allowed to use code from other people and from the Internet, but you must state in the comments clearly where they come from!
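
For note (1), in the hidden layer the change amounts to swapping the activation and its derivative: the ReLU derivative (1 if \(z > 0\), else 0) becomes the logistic derivative \(\sigma(z)(1 - \sigma(z))\). The sketch below shows both hidden-layer error terms for one image, where `z1` is the hidden pre-activation, `w2` the second layer weights and `d2` the output error; these names are hypothetical and not taken from the posted solution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_error_relu(z1, w2, d2):
    """Hidden-layer error with ReLU: derivative is 1 where z > 0, else 0."""
    return d2 * w2 * (z1 > 0).astype(float)

def hidden_error_logistic(z1, w2, d2):
    """Hidden-layer error with logistic activation: derivative is s(z) * (1 - s(z))."""
    a1 = sigmoid(z1)
    return d2 * w2 * a1 * (1.0 - a1)
```

The forward pass changes the same way: the hidden activations become `sigmoid(z1)` instead of `np.maximum(z1, 0)`.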

Java and Python code by Hugh Liu: Link.

Last Updated: July 14, 2024 at 8:37 PM