# Midterm M1B

📗 Enter your ID (your wisc email ID, without @wisc.edu) here: and click the button.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved. You can save and load your answers (only fill-in-the-blank questions) using the buttons at the bottom of the page.

# Warning: please enter your ID before you start!



📗 [4 points] Consider a linear model \(a_{i} = w^\top x_{i} + b\), with the cross entropy cost function \(C\) = . The initial weight is \(\begin{bmatrix} w \\ b \end{bmatrix}\) = . What are the updated weight and bias after one (stochastic) gradient descent step if the chosen training data point is \(x\) = , \(y\) = ? The learning rate is .
Hint The derivative of the cost function with respect to the weights given one training data point \(i\) can be computed as \(\dfrac{\partial C}{\partial w_{j}} = \dfrac{\partial C}{\partial a_{i}} \dfrac{\partial a_{i}}{\partial w_{j}}\), where \(\dfrac{\partial C}{\partial a_{i}}\) depends on the function given in the question and \(\dfrac{\partial a_{i}}{\partial w_{j}}\) is \(x_{i j}\) since the activation function is linear. The updated weight \(j\) can be found using the gradient descent formula \(w_{j} \leftarrow w_{j} - \alpha \dfrac{\partial C}{\partial w_{j}}\). The derivative and update for \(b\) can be computed similarly.
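The chain-rule update in the hint can be sketched in code. This is a minimal sketch with made-up numbers (the real values are generated per student), and it assumes a hypothetical cost whose derivative is \(\partial C / \partial a = a - y\) (what e.g. the squared-error cost \((a - y)^2 / 2\) gives); substitute the derivative of the cost function given in your question.

```python
def sgd_step(w, b, x, y, dC_da, lr):
    # One stochastic gradient descent step on a linear unit a = w . x + b.
    a = sum(wj * xj for wj, xj in zip(w, x)) + b
    g = dC_da(a, y)  # dC/da, supplied by the cost function in the question
    # Chain rule: dC/dw_j = (dC/da) * x_j, and dC/db = dC/da.
    w_new = [wj - lr * g * xj for wj, xj in zip(w, x)]
    b_new = b - lr * g
    return w_new, b_new

# Hypothetical example: w = [1, -1], b = 0.5, x = [2, 1], y = 1, lr = 0.1,
# with the assumed derivative dC/da = a - y.
w1, b1 = sgd_step([1.0, -1.0], 0.5, [2.0, 1.0], 1.0, lambda a, y: a - y, 0.1)
```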
📗 Answer (comma separated vector): .
📗 [2 points] In a three-layer (fully connected) neural network, the first layer contains sigmoid units, the second layer contains units, and the output layer contains units. The input is dimensional. How many weights plus biases does this neural network have? Enter one number.

📗 The above is a diagram of the network, the nodes labelled "1" are the bias units.
Hint See Fall 2019 Final Q14, Fall 2013 Final Q8, Fall 2006 Final Q17, Fall 2005 Final Q17. Three-layer neural networks have one input layer (same number of units as the input dimension), two hidden layers, and one output layer (usually the same number of units as the number of classes (labels), but in case there are only two classes, the number of units can be 1). We are using the convention of calling neural networks with four layers "three-layer neural networks" because there are only three layers with weights and biases (so we don't count the input layer). The number of weights between two consecutive layers (\(m_{1}\) units in the previous layer, \(m_{2}\) units in the next layer) is \(m_{1} \cdot m_{2}\), and the number of biases is \(m_{2}\).
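The counting rule in the hint can be checked mechanically: sum \(m_{1} \cdot m_{2} + m_{2}\) over consecutive layer pairs, skipping nothing but the input layer. A small sketch with hypothetical layer sizes (the real sizes are generated per student):

```python
def num_params(layer_sizes):
    # layer_sizes = [input_dim, hidden_1, hidden_2, output].
    # Between consecutive layers of sizes m1 and m2 there are
    # m1 * m2 weights and m2 biases.
    return sum(m1 * m2 + m2 for m1, m2 in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical example: 4-dimensional input, hidden layers of 5 and 3 units,
# 1 output unit: (4*5 + 5) + (5*3 + 3) + (3*1 + 1) = 25 + 18 + 4 = 47.
total = num_params([4, 5, 3, 1])
```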
📗 Answer: .
📗 [4 points] Given the number of instances in each class summarized in the following table, how many instances are used to train a one-vs-one SVM (Support Vector Machine) for class vs ?
| \(y_{i}\) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Count | | | | | |

Hint
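The hint is blank here, so as a sketch: a one-vs-one SVM for a pair of classes is trained only on the instances whose label is one of those two classes, so the answer is the sum of the two class counts. The counts below are hypothetical (the real ones come from the table):

```python
def one_vs_one_count(counts, i, j):
    # A one-vs-one SVM for class i vs class j is trained only on the
    # instances labelled i or j, so the training-set size is the sum
    # of the two class counts.
    return counts[i] + counts[j]

# Hypothetical class counts for y_i in {0, ..., 4}:
counts = {0: 10, 1: 20, 2: 30, 3: 40, 4: 50}
n = one_vs_one_count(counts, 1, 3)  # class 1 vs class 3
```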
📗 Answer: .
📗 [3 points] A bag contains \(n\) = different colored balls. Randomly draw a ball from the bag with equal probability. What is the entropy of the outcome? Reminder: the log base 2 of \(x\) can be computed as log(x) / log(2) or log2(x).
Hint See Fall 2014 Midterm Q10. The entropy formula is \(H = -\displaystyle\sum_{i=1}^{n} p_{i} \log_{2}\left(p_{i}\right)\). Here, since the probability of drawing each of the \(n\) balls is the same, \(p_{i} = \dfrac{1}{n}\) for each \(i\).
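The uniform case in the hint simplifies to \(H = \log_{2}\left(n\right)\), since all \(n\) terms of the sum are \(-\dfrac{1}{n} \log_{2} \dfrac{1}{n}\). A minimal sketch (the value of \(n\) in your question is generated per student):

```python
import math

def uniform_entropy(n):
    # Each of the n outcomes has probability 1/n, so
    # H = -sum_i p_i log2(p_i) = -n * (1/n) * log2(1/n) = log2(n).
    p = 1.0 / n
    return -sum(p * math.log2(p) for _ in range(n))

# e.g. n = 8 equally likely colored balls gives an entropy of 3 bits
h = uniform_entropy(8)
```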
📗 Answer: .
📗 [4 points] Say we have a training set consisting of positive examples and negative examples, where each example is a point in a two-dimensional, real-valued feature space. What will the classification accuracy be on the training set with NN (Nearest Neighbor)?
Hint See Spring 2017 Midterm Q6, Fall 2014 Final Q19.
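As a sketch of the key observation: when 1NN is evaluated on the training set itself and all points are distinct, each point's nearest neighbor (at distance 0) is itself, so every point is classified correctly. The points and labels below are hypothetical:

```python
def nn_train_accuracy(points, labels):
    # 1NN accuracy evaluated on the training set itself. The nearest
    # training point to any training point is itself (distance 0),
    # so distinct points are always classified correctly.
    correct = 0
    for i, (px, py) in enumerate(points):
        # nearest training point by squared Euclidean distance
        j = min(range(len(points)),
                key=lambda k: (points[k][0] - px) ** 2 + (points[k][1] - py) ** 2)
        correct += labels[j] == labels[i]
    return correct / len(points)

# Hypothetical 2D training set with distinct points:
points = [(0.0, 0.0), (1.0, 0.5), (2.0, 2.0), (0.5, 1.5)]
labels = [1, 0, 1, 0]
acc = nn_train_accuracy(points, labels)  # distinct points -> accuracy 1.0
```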
📗 Answer: .
📗 [4 points] Consider the following neural network, which classifies all the training instances correctly. What are the labels (0 or 1) of the training data? The activation functions are LTU for all units: \(1_{\left\{z \geq 0\right\}}\). The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias .
| \(x_{i1}\) | \(x_{i2}\) | \(y_{i}\) or \(o_{1}\) |
|---|---|---|
| 0 | 0 | ? |
| 0 | 1 | ? |
| 1 | 0 | ? |
| 1 | 1 | ? |


Note: if the weights are not shown clearly, you could move the nodes around with mouse or touch.
Hint See Fall 2010 Final Q17. First compute the hidden layer units: \(h_{j} = 1_{\left\{w^{\left(1\right)}_{1j} x_{1} + w^{\left(1\right)}_{2j} x_{2} + b_{j} \geq 0\right\}}\), then compute the outputs (which are equal to the training data labels): \(y = o_{1} = 1_{\left\{w^{\left(2\right)}_{1} h_{1} + w^{\left(2\right)}_{2} h_{2} + b \geq 0\right\}}\). Repeat the computations for \(\left(x_{1}, x_{2}\right) = \left(0, 0\right), \left(0, 1\right), \left(1, 0\right), \left(1, 1\right)\).
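The forward pass in the hint can be sketched directly. The weights below are hypothetical (they happen to implement XOR; your network's weights are generated per student), but the computation is exactly the two-layer LTU evaluation the hint describes:

```python
def ltu(z):
    # LTU activation: 1 if z >= 0, else 0.
    return 1 if z >= 0 else 0

def forward(x1, x2, W1, b1, w2, b2):
    # Hidden units: h_j = 1{w1_{1j} x1 + w1_{2j} x2 + b_j >= 0}.
    h = [ltu(W1[0][j] * x1 + W1[1][j] * x2 + b1[j]) for j in range(2)]
    # Output (equal to the training label): o = 1{w2 . h + b >= 0}.
    return ltu(w2[0] * h[0] + w2[1] * h[1] + b2)

# Hypothetical weights (columns of W1 are the two hidden units):
W1 = [[1, 1], [1, 1]]
b1 = [-0.5, -1.5]        # h1 acts like OR, h2 like AND
w2, b2 = [1, -2], -0.5   # o = h1 AND NOT h2
outputs = [forward(x1, x2, W1, b1, w2, b2)
           for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```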
📗 Answer (comma separated vector): .
📗 [3 points] There are two biased coins in my pocket: coin A has \(\mathbb{P}\left\{H | A\right\}\) = , coin B has \(\mathbb{P}\left\{H | B\right\}\) = . I took a coin from the pocket at random; the probability of taking A is . I flipped it twice, and the outcome was . What is the probability that the coin was ?
Hint See Spring 2018 Final Q22 Q23, Fall 2018 Midterm Q11, Fall 2017 Final Q20, Spring 2017 Final Q6, Fall 2010 Final Q18. For example, the Bayes Rule for the probability that the coin is \(A\) given the outcome \(H T H\) is \(\mathbb{P}\left\{A | H T H\right\} = \dfrac{\mathbb{P}\left\{H T H, A\right\}}{\mathbb{P}\left\{H T H\right\}}\) = \(\dfrac{\mathbb{P}\left\{H T H | A\right\} \mathbb{P}\left\{A\right\}}{\mathbb{P}\left\{H T H | A\right\} \mathbb{P}\left\{A\right\} + \mathbb{P}\left\{H T H | B\right\} \mathbb{P}\left\{B\right\}}\) = \(\dfrac{\mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{T | A\right\} \mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{A\right\}}{\mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{T | A\right\} \mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{A\right\} + \mathbb{P}\left\{H | B\right\} \mathbb{P}\left\{T | B\right\} \mathbb{P}\left\{H | B\right\} \mathbb{P}\left\{B\right\}}\). Note that \(\mathbb{P}\left\{H T H | A\right\}\) can be split into three probabilities because the flips are independent given the coin.
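The Bayes-rule computation in the hint can be sketched for any outcome string. The numbers below are hypothetical placeholders for the per-student values:

```python
def posterior_A(p_H_A, p_H_B, p_A, outcome):
    # Likelihood of the flip sequence under each coin: flips are
    # independent given the coin, so multiply per-flip probabilities.
    like_A = like_B = 1.0
    for o in outcome:
        like_A *= p_H_A if o == "H" else 1 - p_H_A
        like_B *= p_H_B if o == "H" else 1 - p_H_B
    # Bayes rule: P(A | outcome) = P(outcome | A) P(A) / P(outcome).
    return like_A * p_A / (like_A * p_A + like_B * (1 - p_A))

# Hypothetical numbers: P(H|A) = 0.8, P(H|B) = 0.3, P(A) = 0.5, outcome HH.
p = posterior_A(0.8, 0.3, 0.5, "HH")
```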
📗 Answer: .
📗 [4 points] Given the counts, find the maximum likelihood estimate of \(\mathbb{P}\left\{A = 1|B + C = s\right\}\), for \(s\) = .
| A | B | C | counts |
|---|---|---|---|
| 0 | 0 | 0 | |
| 0 | 0 | 1 | |
| 0 | 1 | 0 | |
| 0 | 1 | 1 | |
| 1 | 0 | 0 | |
| 1 | 0 | 1 | |
| 1 | 1 | 0 | |
| 1 | 1 | 1 | |

Hint
📗 Answer: .
📗 [3 points] Consider the following directed graphical model over binary variables: \(A \leftarrow B \to  C\). Given the CPTs (Conditional Probability Table):
| Variable | Probability | Variable | Probability |
|---|---|---|---|
| \(\mathbb{P}\left\{B = 1\right\}\) | | | |
| \(\mathbb{P}\left\{C = 1 \mid B = 1\right\}\) | | \(\mathbb{P}\left\{C = 1 \mid B = 0\right\}\) | |
| \(\mathbb{P}\left\{A = 1 \mid B = 1\right\}\) | | \(\mathbb{P}\left\{A = 1 \mid B = 0\right\}\) | |

What is \(\mathbb{P}\left\{ A = , B = , C = \right\}\)?
Hint See Fall 2019 Final Q22 Q23 Q24 Q25, Spring 2018 Final Q24 Q25, Fall 2014 Final Q9, Fall 2006 Final Q20, Fall 2005 Final Q20. For any Bayes net, the joint probability can always be computed as the product of the conditional probabilities (conditioned on the parent node variable). For a causal chain \(A \to  B \to  C\), \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a\right\} \mathbb{P}\left\{B = b | A = a\right\} \mathbb{P}\left\{C = c | B = b\right\}\). For a common cause \(A \leftarrow B \to  C\), \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a | B = b\right\} \mathbb{P}\left\{B = b\right\} \mathbb{P}\left\{C = c | B = b\right\}\). For a common effect \(A \to  B \leftarrow C\), \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a\right\} \mathbb{P}\left\{B = b | A = a, C = c\right\} \mathbb{P}\left\{C = c\right\}\).
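For the common-cause structure of this question, the factorization in the hint can be sketched directly. The CPT values below are hypothetical placeholders for the per-student numbers:

```python
def joint_common_cause(a, b, c, pB1, pA1_given_B, pC1_given_B):
    # A <- B -> C: P(A=a, B=b, C=c) = P(A=a|B=b) * P(B=b) * P(C=c|B=b).
    pB = pB1 if b == 1 else 1 - pB1
    pA = pA1_given_B[b] if a == 1 else 1 - pA1_given_B[b]
    pC = pC1_given_B[b] if c == 1 else 1 - pC1_given_B[b]
    return pA * pB * pC

# Hypothetical CPT values:
pB1 = 0.6                        # P(B = 1)
pA1_given_B = {0: 0.2, 1: 0.9}   # P(A = 1 | B = b)
pC1_given_B = {0: 0.5, 1: 0.7}   # P(C = 1 | B = b)
# P(A=1, B=1, C=0) = 0.9 * 0.6 * (1 - 0.7) = 0.162
p = joint_common_cause(1, 1, 0, pB1, pA1_given_B, pC1_given_B)
```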
📗 Answer: .

# Grade


 ***** ***** ***** ***** ***** 


 ***** ***** ***** ***** ***** 
 

# Warning: remember to submit this on Canvas!


📗 Please copy and paste the text between the *****s (not including the *****s) and submit it on Canvas, M1B.
📗 Please save a copy as a text file using the button, or just copy and paste it into a text file.
📗 You can load your answers from the text field using the button:







Last Updated: June 25, 2021 at 3:39 AM