

# M3 Written (Math) Problems

📗 Enter your ID (the wisc email ID without @wisc.edu) and hit the "Enter" key.
📗 The official deadline is June 13. Late submissions within a week will be accepted without penalty, but please submit a regrade request form: Link.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You can print the page, solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 Please report any bugs on Piazza: Link

# Warning: please enter your ID before you start!


# Question 1

📗 [4 points] The following neural network classifies all the training instances correctly. What are the labels (0 or 1) of the training data? The activation functions are LTU for all units: \(1_{\left\{z \geq 0\right\}}\). The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias .

| \(x_{i1}\) | \(x_{i2}\) | \(y_{i}\) or \(a^{\left(2\right)}_{1}\) |
|---|---|---|
| 0 | 0 | ? |
| 0 | 1 | ? |
| 1 | 0 | ? |
| 1 | 1 | ? |


📗 Note: if the weights are not shown clearly, you can move the nodes around with mouse or touch.
Hint: See Fall 2010 Final Q17. First compute the hidden layer units: \(h_{j} = 1_{\left\{w^{\left(1\right)}_{1j} x_{1} + w^{\left(1\right)}_{2j} x_{2} + b_{j} \geq 0\right\}}\), then compute the outputs (which are equal to the training data labels): \(y = o_{1} = 1_{\left\{w^{\left(2\right)}_{1} h_{1} + w^{\left(2\right)}_{2} h_{2} + b \geq 0\right\}}\). Repeat the computations for \(\left(x_{1}, x_{2}\right) = \left(0, 0\right), \left(0, 1\right), \left(1, 0\right), \left(1, 1\right)\).
📗 Answer (comma separated vector): .
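The two-step computation in the hint can be sketched in a few lines of Python. The weights below are hypothetical (the ones in your question depend on your ID and are not reproduced here); this particular choice happens to compute XOR.

```python
def ltu(z):
    """Linear threshold unit: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def forward(x, W1, b1, w2, b2):
    """Forward pass of an LTU network with one hidden layer.

    W1[i][j] is the weight from input i to hidden unit j.
    """
    hidden = [ltu(sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j])
              for j in range(len(b1))]
    return ltu(sum(w2[j] * hidden[j] for j in range(len(hidden))) + b2)

# Hypothetical weights, chosen for illustration only:
# h1 acts as OR, h2 acts as NAND, and the output unit acts as AND.
W1 = [[1, -1],
      [1, -1]]
b1 = [-0.5, 1.5]
w2 = [1, 1]
b2 = -1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [forward(x, W1, b1, w2, b2) for x in inputs]
print(labels)  # [0, 1, 1, 0]
```

Substituting the weights shown in your own diagram for `W1`, `b1`, `w2`, and `b2` gives the label vector in the same row order as the table.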
# Question 2

📗 [4 points] The following neural network classifies all the training items correctly. What are the labels (0 or 1) of the training data? The activation functions are LTU for all units: \(1_{\left\{z \geq 0\right\}}\). The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias .

| \(x_{i1}\) | \(x_{i2}\) | \(y_{i}\) or \(a^{\left(2\right)}_{1}\) |
|---|---|---|
| 0 | 0 | ? |
| 0 | 1 | ? |
| 1 | 0 | ? |
| 1 | 1 | ? |


Hint: See Fall 2010 Final Q17. First compute the hidden layer units: \(h_{j} = 1_{\left\{w^{\left(1\right)}_{1j} x_{1} + w^{\left(1\right)}_{2j} x_{2} + b_{j} \geq 0\right\}}\), then compute the outputs (which are equal to the training data labels): \(y = o_{1} = 1_{\left\{w^{\left(2\right)}_{1} h_{1} + w^{\left(2\right)}_{2} h_{2} + b \geq 0\right\}}\). Repeat the computations for \(\left(x_{1}, x_{2}\right) = \left(0, 0\right), \left(0, 1\right), \left(1, 0\right), \left(1, 1\right)\).
📗 Answer (comma separated vector): .
# Question 3

📗 [4 points] Fill in the missing weight below so that the network computes the following function. All inputs take values 0 or 1, and the perceptrons are linear threshold units. The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias .

| \(x_{1}\) | \(x_{2}\) | \(y\) or \(a^{\left(2\right)}_{1}\) |
|---|---|---|
| 0 | 0 | |
| 0 | 1 | |
| 1 | 0 | |
| 1 | 1 | |


📗 Note: if the weights are not shown clearly, you can move the nodes around with mouse or touch.
Hint: See Fall 2010 Final Q17. There are many possible answers: the weights should not be computed using gradient descent because all other weights are fixed. One approach is to first figure out the hidden unit values (either 0 or 1 in this case) using the given weights. Then solve the inequality: if a hidden unit \(j\) is 0, \(w^{\left(1\right)}_{1j} x_{1} + w^{\left(1\right)}_{2j} x_{2} + b_{j} < 0\), and if the hidden unit is 1, \(w^{\left(1\right)}_{1j} x_{1} + w^{\left(1\right)}_{2j} x_{2} + b_{j} \geq 0\); or if the output is 0, \(w^{\left(2\right)}_{1} h_{1} + w^{\left(2\right)}_{2} h_{2} + b < 0\), and if the output is 1, \(w^{\left(2\right)}_{1} h_{1} + w^{\left(2\right)}_{2} h_{2} + b \geq 0\).
📗 Answer: .
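As a rough illustration of the inequality approach in the hint, the sketch below tries a grid of candidate values for one missing weight and keeps those that reproduce a target truth table. Everything concrete here is an assumption for illustration: the fixed weights, the target function (XOR), the choice of which weight is missing (the second entry of the second layer weight vector), and the candidate grid.

```python
def ltu(z):
    """Linear threshold unit: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def forward(x, W1, b1, w2, b2):
    """Forward pass; W1[i][j] is the weight from input i to hidden unit j."""
    hidden = [ltu(sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j])
              for j in range(len(b1))]
    return ltu(sum(w2[j] * hidden[j] for j in range(len(hidden))) + b2)

# Hypothetical fixed weights; the second entry of w2 is the "missing" one.
W1 = [[1, -1],
      [1, -1]]
b1 = [-0.5, 1.5]
b2 = -1.5
w2_first = 1

# Hypothetical target function (XOR), one label per input row.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
target = [0, 1, 1, 0]

# Try candidates -3.0, -2.5, ..., 3.0 and keep the ones that fit every row.
candidates = [k / 2 for k in range(-6, 7)]
valid = [w for w in candidates
         if [forward(x, W1, b1, [w2_first, w], b2) for x in inputs] == target]
print(valid)  # [0.5, 1.0]
```

Solving the inequalities by hand for this example gives the same range: the rows with hidden values (0, 1) and (1, 1) require the missing weight to be below 1.5 and at least 0.5, respectively.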
# Question 4

📗 [4 points] Fill in the missing weight below so that the network computes the following function. All inputs take values 0 or 1, and the perceptrons are linear threshold units. The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias .

| \(x_{1}\) | \(x_{2}\) | \(y\) or \(a^{\left(2\right)}_{1}\) |
|---|---|---|
| 0 | 0 | |
| 0 | 1 | |
| 1 | 0 | |
| 1 | 1 | |


📗 Note: if the weights are not shown clearly, you can move the nodes around with mouse or touch.
Hint: See Fall 2010 Final Q17. There are many possible answers: the weights should not be computed using gradient descent because all other weights are fixed. One approach is to first figure out the hidden unit values (either 0 or 1 in this case) using the given weights. Then solve the inequality: if a hidden unit \(j\) is 0, \(w^{\left(1\right)}_{1j} x_{1} + w^{\left(1\right)}_{2j} x_{2} + b_{j} < 0\), and if the hidden unit is 1, \(w^{\left(1\right)}_{1j} x_{1} + w^{\left(1\right)}_{2j} x_{2} + b_{j} \geq 0\); or if the output is 0, \(w^{\left(2\right)}_{1} h_{1} + w^{\left(2\right)}_{2} h_{2} + b < 0\), and if the output is 1, \(w^{\left(2\right)}_{1} h_{1} + w^{\left(2\right)}_{2} h_{2} + b \geq 0\).
📗 Answer: .
# Question 5

📗 [2 points] Let the input \(x \in \mathbb{R}\), so the input layer has a single \(x\) input. The network has 5 hidden layers. Each hidden layer has 10 units. The output layer has a single unit and outputs \(y \in \mathbb{R}\). Between layers, the network is fully connected. All units in the network have a bias input. All units are linear units, namely the activation function is the identity function \(a = g\left(z\right) = z\), while \(z = w^\top x + b\) is a linear combination of all inputs to that unit (including the bias). Which functions can this network compute?
Hint: See Fall 2017 Final Q19, Spring 2017 Final Q4. A combination of linear units can still only compute linear functions. We need non-linear activation functions in order for neural networks to approximate any continuous function.
📗 Choices:





None of the above
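The claim in the hint can be checked numerically. The sketch below builds a network with the stated shape (1 input, 5 hidden layers of 10 identity-activation units, 1 output) using random weights, which are an assumption for illustration, and compares its output with a single line \(y = \text{slope} \cdot x + \text{intercept}\).

```python
import random

def make_layer(n_in, n_out, rng):
    """Random weights W[j][i] and biases b[j] for a fully connected layer."""
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1, 1) for _ in range(n_out)]
    return W, b

def apply_layer(layer, inputs):
    """Identity activation: each unit outputs w . inputs + b."""
    W, b = layer
    return [sum(w * v for w, v in zip(row, inputs)) + bias
            for row, bias in zip(W, b)]

def network(layers, x):
    """Feed a scalar x through all layers and return the scalar output."""
    values = [x]
    for layer in layers:
        values = apply_layer(layer, values)
    return values[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
sizes = [1, 10, 10, 10, 10, 10, 1]
layers = [make_layer(sizes[k], sizes[k + 1], rng) for k in range(len(sizes) - 1)]

# Read off the line from two evaluations, then test it at other points.
intercept = network(layers, 0.0)
slope = network(layers, 1.0) - intercept
errors = [abs(network(layers, x) - (slope * x + intercept))
          for x in [-2.0, 0.5, 3.0]]
print(max(errors) < 1e-9)  # True
```

Whatever the weights are, the whole stack collapses to one slope and one intercept, because composing affine maps yields another affine map.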
# Question 6

📗 [2 points] In a three-layer (fully connected) neural network, the first hidden layer contains sigmoid units, the second hidden layer contains units, and the output layer contains units. The input is dimensional. How many weights plus biases does this neural network have? Enter one number.

📗 The diagram above shows the network; the nodes labelled "1" are the bias units.
Hint: See Fall 2019 Final Q14, Fall 2013 Final Q8, Fall 2006 Final Q17, Fall 2005 Final Q17. Three-layer neural networks have one input layer (same number of units as the input dimension), two hidden layers, and one output layer (usually the same number of units as the number of classes (labels), but if there are only two classes, the number of units can be 1). We follow the convention of calling neural networks with four layers "three-layer neural networks" because only three layers have weights and biases (so we don't count the input layer). The number of weights between two consecutive layers (\(m_{1}\) units in the previous layer, \(m_{2}\) units in the next layer) is \(m_{1} \cdot m_{2}\), and the number of biases is \(m_{2}\).
📗 Answer: .
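The counting rule in the hint (\(m_{1} \cdot m_{2}\) weights plus \(m_{2}\) biases per pair of consecutive layers) can be written as a short helper. The layer sizes in the example are hypothetical, not the ones from your question.

```python
def count_parameters(layer_sizes):
    """Total weights plus biases of a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    total = 0
    for m1, m2 in zip(layer_sizes, layer_sizes[1:]):
        total += m1 * m2 + m2  # weights into the next layer, plus its biases
    return total

# Hypothetical sizes: 3-dimensional input, hidden layers of 4 and 2 units, 1 output.
example_sizes = [3, 4, 2, 1]
print(count_parameters(example_sizes))  # (3*4 + 4) + (4*2 + 2) + (2*1 + 1) = 29
```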
# Question 7

📗 [3 points] The sigmoid function in a neural network is defined as \(g\left(x\right) = \dfrac{1}{1 + e^{-x}}\). There is another activation function defined as \(h\left(x\right)\) = . If \(h\left(x\right) = a \cdot g\left(b \cdot x\right) + c\), write down the values of \(a, b, c\) (constants, they cannot be functions of \(x\)). In the diagram, the green line is \(h\left(x\right)\) and the red line is \(a \cdot g\left(b \cdot x\right) + c\) with the \(a, b, c\) you selected.
Hint: See Fall 2017 Final Q23. Some relations that may be useful: \(1 - \dfrac{1}{1 + e^{-x}} = \dfrac{e^{-x}}{1 + e^{-x}}\) and \(\dfrac{e^{x}}{e^{x} + e^{-x}} = \dfrac{1}{1 + e^{-2 x}}\).

📗 Answers:
\(a\) = 0
\(b\) = 0
\(c\) = 0
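A candidate triple \(a, b, c\) can be sanity-checked numerically before entering it. The sketch below assumes, purely for illustration, that \(h\) is the hyperbolic tangent (the \(h\) in your question may differ), for which \(\tanh\left(x\right) = 2 \cdot g\left(2 x\right) - 1\).

```python
import math

def g(x):
    """Sigmoid function as defined in the question."""
    return 1.0 / (1.0 + math.exp(-x))

def h(x):
    """Hypothetical target activation (an assumption): tanh."""
    return math.tanh(x)

# Candidate constants for the hypothetical h above.
a, b, c = 2.0, 2.0, -1.0

# Largest gap between h(x) and a * g(b * x) + c on a grid from -5 to 5.
grid = [k / 10 for k in range(-50, 51)]
max_gap = max(abs(h(x) - (a * g(b * x) + c)) for x in grid)
print(max_gap < 1e-12)  # True
```

A gap that is not close to zero means the two curves in the diagram would not overlap either.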

# Question 8

📗 [1 point] You want to design a neural network with sigmoid units to predict a person's academic role from their webpage. Possible roles are "professor" (label 0), "student" (label 1), "staff" (label 2). Suppose each person can take on only one of these roles at a time. The neural network uses one-hot encoding: label 0 is encoded by \(\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\), label 1 is encoded by \(\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\), and label 2 is encoded by \(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\). What is the role (enter a label, not a string) if the output is ?
Hint: See Fall 2011 Midterm Q12. It is the label corresponding to the largest output value.
📗 Answer: .
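The rule in the hint is an argmax over the output units. A minimal sketch, using a hypothetical output vector rather than the one from your question:

```python
def predicted_label(outputs):
    """Index of the largest output value (the first one if there is a tie)."""
    return max(range(len(outputs)), key=lambda k: outputs[k])

# Hypothetical network output for illustration.
example_output = [0.2, 0.7, 0.1]
print(predicted_label(example_output))  # 1, i.e. "student"
```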
# Question 9

📗 [1 point] A binary classifier is trained on a training set, and the resulting classifier is: \(\hat{y} = 1\) if \(a x_{1} + b x_{2} + c \geq 0\) and \(\hat{y} = 0\) otherwise. Its performance is then tested on a separate test set. The accuracy of the classifier is . What is the accuracy if the flipped classifier (\(\hat{y} = 1\) if \(a x_{1} + b x_{2} + c < 0\) and \(\hat{y} = 0\) otherwise) is used?
Hint: See Fall 2017 Final Q21. Accuracy is the number of correct predictions divided by the total number of test instances.
📗 Enter a fraction to represent the accuracy, for example, enter 0.5 if the accuracy is 50 percent and enter 1 if the accuracy is 100 percent.
📗 Answer: .
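With binary labels, flipping every prediction turns each correct prediction into a mistake and each mistake into a correct prediction. The sketch below illustrates this on a small made-up test set (the labels and predictions are assumptions, not course data).

```python
def accuracy(predictions, labels):
    """Number of correct predictions divided by the number of test instances."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Hypothetical test labels and classifier predictions.
labels      = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
predictions = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# The flipped classifier predicts the opposite label on every instance.
flipped = [1 - p for p in predictions]

original_accuracy = accuracy(predictions, labels)
flipped_accuracy = accuracy(flipped, labels)
print(original_accuracy, flipped_accuracy)  # 0.7 0.3
```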
# Question 10

📗 [2 points] A test set \(\left(x_{1}, y_{1}\right), ..., \left(x_{100}, y_{100}\right)\) contains labels \(y_{i}\) = for \(i = 1, ..., 100\). A classifier simply predicts all the time (the labels are +1 and -1). What is this classifier's test accuracy?
Hint: See Fall 2014 Final Q4, Fall 2010 Final Q1. Write down the first few labels, say \(i = 1, 2, 3, 4\), to see the pattern.
📗 Enter a fraction to represent the accuracy, for example, enter 0.5 if the accuracy is 50 percent and enter 1 if the accuracy is 100 percent.
📗 Answer: .
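For a constant classifier, the accuracy is simply the fraction of test labels equal to the constant prediction. The label pattern below is a made-up example (+1 when \(i\) is divisible by 3, -1 otherwise); substitute the pattern from your own question.

```python
# Hypothetical label pattern for i = 1, ..., 100 (an assumption for illustration).
labels = [1 if i % 3 == 0 else -1 for i in range(1, 101)]

# The classifier predicts the same label for every instance.
constant_prediction = 1

matches = sum(1 for y in labels if y == constant_prediction)
test_accuracy = matches / len(labels)
print(test_accuracy)  # 0.33
```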
# Question 11

📗 [1 point] Please enter any comments and suggestions, including possible mistakes and bugs in the questions and the auto-grading, and materials relevant to solving the questions that you think were not covered well during the lectures. If you have no comments, please enter "None": do not leave it blank.
📗 Answer: .

# Grade



# Submission


📗 Please do not modify the content in the above text field: use the "Grade" button to update.


📗 Please wait for the message "Successful submission." to appear after the "Submit" button. If there is an error message or no message appears after 10 seconds, please save the text in the above text box to a file, either with the save button or by copying and pasting it yourself, and submit it to Canvas Assignment M3. You can submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 You can load your answers from the text (or txt file) in the text box below using the load button. The first two lines should be "##m: 3" and "##id: your id", and the remaining lines should have the format "##1: your answer to question 1", "##2: your answer to question 2", etc., one per line. Please make sure that your answers are loaded correctly before submitting them.
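If you keep your answers in a script, the text format described above can be produced with a small helper. This is only a sketch of the stated line format; the ID and answers shown are placeholders, and the helper name `format_submission` is hypothetical.

```python
def format_submission(module, student_id, answers):
    """Build the answer text: "##m", "##id", then one "##k: answer" line per question.

    answers maps question numbers (int) to answer strings.
    """
    lines = [f"##m: {module}", f"##id: {student_id}"]
    for number in sorted(answers):
        lines.append(f"##{number}: {answers[number]}")
    return "\n".join(lines)

# Placeholder ID and answers for illustration only.
text = format_submission(3, "your_id", {1: "0,1,1,0", 2: "1,0,0,1", 11: "None"})
print(text)
```

After loading the text into the page, check that every answer appears in the right question before submitting.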




# Solutions

📗 Some of the past exams referenced in the Hints can be found on Professor Zhu, Professor Liang and Professor Dyer's websites: Link, and Link.
📗 Some of the questions are from last year, and I recorded videos going through them. The links are at the bottom of the Week 1 to Week 14 pages, for example: W8 and W14.
📗 The links to the solutions the students volunteered to share on Piazza will be collected in this post around the official due date: Link.





Last Updated: January 16, 2025 at 6:07 PM