# Midterm M1A

📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You can print the page, solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved. You can save and load your answers (only fill-in-the-blank questions) using the buttons at the bottom of the page.

# Warning: please enter your ID before you start!


# Question 1



# Question 2



# Question 3



# Question 4



# Question 5



# Question 6



# Question 7



# Question 8



# Question 9



# Question 10



📗 [4 points] Consider a Linear Threshold Unit (LTU) perceptron with initial weights \(w\) = and bias \(b\) = trained using the Perceptron Algorithm. Given a new input \(x\) = and \(y\) = . Let the learning rate be \(\alpha\) = , compute the updated weights, \(w', b'\) = :
Hint See Spring 2018 Final Q7, Spring 2017 Final Q3. The perceptron learning formula using the notation in this question is: \(w' = w - \alpha \left(a - y\right) x\) and \(b' = b - \alpha \left(a - y\right)\) where \(a = 1_{\left\{w^\top x + b \geq 0\right\}}\). Note that this is not a gradient descent procedure: it just happens to use a similar formula.
📗 Answer (comma separated vector): .
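The update rule in the hint can be sketched in a few lines of Python. This is only an illustration: the function name `perceptron_update` and the numbers passed to it are made up for the example and are not the values generated for this question.

```python
def perceptron_update(w, b, x, y, alpha):
    """Apply one step of the perceptron rule quoted in the hint."""
    # LTU activation: a = 1 if w^T x + b >= 0, else 0
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    a = 1 if z >= 0 else 0
    # w' = w - alpha * (a - y) * x and b' = b - alpha * (a - y)
    w_new = [wi - alpha * (a - y) * xi for wi, xi in zip(w, x)]
    b_new = b - alpha * (a - y)
    return w_new, b_new


# Hypothetical values, chosen only to show a misclassified point
w_new, b_new = perceptron_update(w=[1, -1], b=0, x=[2, 1], y=0, alpha=1)
print(w_new, b_new)
```

When the prediction already matches the label, \(a - y = 0\) and neither the weights nor the bias change.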
📗 [2 points] In a three-layer (fully connected) neural network, the first hidden layer contains sigmoid units, the second hidden layer contains units, and the output layer contains units. The input is dimensional. How many weights plus biases does this neural network have? Enter one number.

📗 The above is a diagram of the network, the nodes labelled "1" are the bias units.
Hint See Fall 2019 Final Q14, Fall 2013 Final Q8, Fall 2006 Final Q17, Fall 2005 Final Q17. Three-layer neural networks have one input layer (same number of units as the input dimension), two hidden layers, and one output layer (usually the same number of units as the number of classes (labels), but in case there are only two classes, the number of units can be 1). We are using the convention of calling neural networks with four layers "three-layer neural networks" because there are only three layers with weights and biases (so we don't count the input layer). The number of weights between two consecutive layers (\(m_{1}\) units in the previous layer, \(m_{2}\) units in the next layer) is \(m_{1} \cdot m_{2}\), and the number of biases is \(m_{2}\).
📗 Answer: .
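The counting rule in the hint (\(m_{1} \cdot m_{2}\) weights plus \(m_{2}\) biases per pair of consecutive layers) can be checked with a short sketch. The helper name `count_parameters` and the layer sizes below are assumptions for illustration, not the sizes generated for this question.

```python
def count_parameters(layer_sizes):
    """Count weights plus biases of a fully connected network.

    layer_sizes lists the number of units per layer, starting with the
    input dimension and ending with the output layer.
    """
    total = 0
    for m1, m2 in zip(layer_sizes, layer_sizes[1:]):
        total += m1 * m2  # weights between the two layers
        total += m2       # one bias per unit in the next layer
    return total


# Hypothetical sizes: 3 inputs, hidden layers of 4 and 2 units, 1 output
total = count_parameters([3, 4, 2, 1])
print(total)
```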
📗 [2 points] Let \(w\) = and \(b\) = . For the point \(x\) = , \(y\) = , what is the smallest slack value \(\xi\) for it to satisfy the margin constraint?
Hint See Fall 2011 Midterm Q8, Fall 2009 Final Q1. There are two inequality constraints for the slack variable: (1) \(\left(2 y - 1\right)\left(w^\top x + b\right) \geq 1 - \xi\) and (2) \(\xi \geq 0\). Combine the two inequalities to get the smallest slack variable value.
📗 Answer: .
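Combining the two constraints in the hint gives \(\xi = \max\left(0, 1 - \left(2 y - 1\right)\left(w^\top x + b\right)\right)\). A minimal sketch of that computation follows; the function name `smallest_slack` and the sample values are hypothetical.

```python
def smallest_slack(w, b, x, y):
    """Smallest xi with (2y - 1)(w^T x + b) >= 1 - xi and xi >= 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    margin = (2 * y - 1) * score  # maps label 0/1 to -1/+1
    return max(0, 1 - margin)


# Hypothetical values: a point on the decision boundary needs slack 1
print(smallest_slack(w=[1, 2], b=-1, x=[1, 0], y=1))
```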
📗 [3 points] Statistically, cats are often hungry around 6:00 am (I am making this up). At that time, a cat is hungry of the time (C = 1), and not hungry of the time (C = 0). What is the entropy of the binary random variable C? Reminder: the base-2 logarithm of x can be computed as log(x) / log(2) or log2(x).
Hint See Fall 2014 Midterm Q10, Fall 2006 Final Q11, Fall 2005 Final Q11. The entropy formula is \(H = -p_{1} \log_{2}\left(p_{1}\right) - p_{2} \log_{2}\left(p_{2}\right)\).
📗 Answer: .
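As a sketch of the entropy formula in the hint, the snippet below evaluates \(H\) for a binary variable. The function name `binary_entropy` and the probability used are illustrative assumptions, not the values generated for this question.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary variable with P(C = 1) = p."""
    if p in (0, 1):
        return 0.0  # convention: 0 * log2(0) is treated as 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


# Hypothetical probability: a fair split gives the maximum of 1 bit
print(binary_entropy(0.5))
```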
📗 [4 points] You have a data set with positive items and negative items. You perform a "leave-one-out" procedure: for each item i, learn a separate kNN (k Nearest Neighbor) classifier on all items except item i, and compute that kNN's accuracy in predicting item i. The leave-one-out accuracy is defined to be the average of the accuracy for each item. What is the leave-one-out accuracy when k = ?
Hint See Fall 2011 Final Q20.
📗 Answer: .
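The leave-one-out procedure described in the question can be sketched as below. Everything specific here is an assumption made for illustration: the items are given hypothetical one-dimensional positions, distance is absolute difference, and a tied vote is resolved in favor of label 0.

```python
from collections import Counter


def loo_knn_accuracy(points, labels, k):
    """Leave-one-out accuracy of kNN on 1D points with 0/1 labels."""
    n = len(points)
    correct = 0
    for i in range(n):
        # Train on every item except item i
        others = [(abs(points[j] - points[i]), labels[j])
                  for j in range(n) if j != i]
        others.sort(key=lambda pair: pair[0])
        votes = Counter(label for _, label in others[:k])
        # Assumption: ties are broken in favor of label 0
        predicted = 1 if votes[1] > votes[0] else 0
        correct += int(predicted == labels[i])
    return correct / n


# Hypothetical data: 3 positive items and 3 negative items
points = [1, 2, 3, 10, 11, 12]
labels = [1, 1, 1, 0, 0, 0]
print(loo_knn_accuracy(points, labels, k=1))
print(loo_knn_accuracy(points, labels, k=5))
```

With k equal to the number of remaining items, every held-out item is outvoted by the other class in this balanced example, so the positions no longer matter and only the class counts do.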
📗 [4 points] The following neural network classifies all the training instances correctly. What are the labels (0 or 1) of the training data? The activation functions are LTU for all units: \(1_{\left\{z \geq 0\right\}}\). The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias

| \(x_{i1}\) | \(x_{i2}\) | \(y_{i}\) or \(a^{\left(2\right)}_{1}\) |
|---|---|---|
| 0 | 0 | ? |
| 0 | 1 | ? |
| 1 | 0 | ? |
| 1 | 1 | ? |


Note: if the weights are not shown clearly, you could move the nodes around with mouse or touch.
Hint See Fall 2010 Final Q17. First compute the hidden layer units: \(h_{j} = 1_{\left\{w^{\left(1\right)}_{1j} x_{1} + w^{\left(1\right)}_{2j} x_{2} + b_{j} \geq 0\right\}}\), then compute the outputs (which are equal to the training data labels): \(y = o_{1} = 1_{\left\{w^{\left(2\right)}_{1} h_{1} + w^{\left(2\right)}_{2} h_{2} + b \geq 0\right\}}\). Repeat the computations for \(\left(x_{1}, x_{2}\right) = \left(0, 0\right), \left(0, 1\right), \left(1, 0\right), \left(1, 1\right)\).
📗 Answer (comma separated vector): .
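The two-step computation in the hint (hidden units first, then the output) can be sketched as a forward pass. The weights below are hypothetical and happen to implement XOR; they are not the weights generated for this question. The indexing `W1[i][j]` (input \(i\) to hidden unit \(j\)) follows the hint's \(w^{\left(1\right)}_{ij}\).

```python
def ltu(z):
    """Linear threshold unit: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def forward(x, W1, b1, w2, b2):
    """Forward pass of a network with one hidden LTU layer."""
    hidden = [
        ltu(sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j])
        for j in range(len(b1))
    ]
    return ltu(sum(w2[j] * hidden[j] for j in range(len(hidden))) + b2)


# Hypothetical weights: h1 acts as OR, h2 as NAND, output as AND
W1 = [[1, -1], [1, -1]]
b1 = [-1, 1]
w2 = [1, 1]
b2 = -2
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
predicted_labels = [forward(x, W1, b1, w2, b2) for x in inputs]
print(predicted_labels)
```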
📗 [3 points] There are two biased coins in my pocket: coin A has \(\mathbb{P}\left\{H | A\right\}\) = , coin B has \(\mathbb{P}\left\{H | B\right\}\) = . I took a coin out of the pocket at random, where the probability of picking A is . I flipped it twice, and the outcome is . What is the probability that the coin was ?
Hint See Spring 2018 Final Q22 Q23, Fall 2018 Midterm Q11, Fall 2017 Final Q20, Spring 2017 Final Q6, Fall 2010 Final Q18. For example, the Bayes Rule for the probability that the coin is \(A\) given the outcome is \(H T H\) is \(\mathbb{P}\left\{A | H T H\right\} = \dfrac{\mathbb{P}\left\{H T H, A\right\}}{\mathbb{P}\left\{H T H\right\}}\) = \(\dfrac{\mathbb{P}\left\{H T H | A\right\} \mathbb{P}\left\{A\right\}}{\mathbb{P}\left\{H T H | A\right\} \mathbb{P}\left\{A\right\} + \mathbb{P}\left\{H T H | B\right\} \mathbb{P}\left\{B\right\}}\) = \(\dfrac{\mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{T | A\right\} \mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{A\right\}}{\mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{T | A\right\} \mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{A\right\} + \mathbb{P}\left\{H | B\right\} \mathbb{P}\left\{T | B\right\} \mathbb{P}\left\{H | B\right\} \mathbb{P}\left\{B\right\}}\). Note that \(\mathbb{P}\left\{H T H | A\right\}\) can be split into three probabilities because the flips are independent given the coin.
📗 Answer: .
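The Bayes rule computation in the hint can be sketched as follows. The function name `posterior_a` and all probabilities are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def posterior_a(p_h_a, p_h_b, prior_a, outcome):
    """P(coin is A | outcome), where outcome is a string of 'H' and 'T'."""

    def likelihood(p_h):
        # Flips are independent given the coin, so multiply per flip
        result = 1.0
        for flip in outcome:
            result *= p_h if flip == "H" else 1 - p_h
        return result

    numerator = likelihood(p_h_a) * prior_a
    denominator = numerator + likelihood(p_h_b) * (1 - prior_a)
    return numerator / denominator


# Hypothetical values: P(H|A) = 0.5, P(H|B) = 0.25, P(A) = 0.5, outcome HH
p_a = posterior_a(0.5, 0.25, 0.5, "HH")
print(p_a, 1 - p_a)  # posterior of A, then posterior of B
```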
📗 [2 points] You have a vocabulary with word types. You want to estimate the unigram probability \(p_{w}\) for each word type \(w\) in the vocabulary. In your corpus the total word token count \(\displaystyle\sum_{w} c_{w}\) is , and \(c_{\text{tenet}}\) = . Using add-one smoothing \(\delta\) = (Laplace smoothing), compute \(p_{\text{tenet}}\).
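Hint See Fall 2018 Midterm Q12. The add-\(\delta\) smoothed estimate of \(p_{w}\) is \(\dfrac{c_{w} + \delta}{\displaystyle\sum_{w'} c_{w'} + n \delta}\), where \(n\) is the number of word types in the vocabulary. With \(\delta = 0\) it reduces to the maximum likelihood estimate.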
📗 Answer: .
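A minimal sketch of the smoothed estimate from the hint is shown below. The function name `smoothed_unigram` and the counts are hypothetical, not the values generated for this question.

```python
def smoothed_unigram(count_w, total_tokens, vocab_size, delta=1):
    """Add-delta estimate (c_w + delta) / (sum of counts + n * delta)."""
    return (count_w + delta) / (total_tokens + vocab_size * delta)


# Hypothetical corpus: 90 tokens, 10 word types, the word appears 9 times
print(smoothed_unigram(count_w=9, total_tokens=90, vocab_size=10))
```

A word that never appears in the corpus still gets a nonzero probability, which is the purpose of the smoothing.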
📗 [3 points] Consider the following directed graphical model over binary variables: \(A \to  B \leftarrow C\). Given the CPTs (Conditional Probability Table):
Variable Probability Variable Probability
\(\mathbb{P}\left\{A = 1\right\}\)
\(\mathbb{P}\left\{C = 1\right\}\)
\(\mathbb{P}\left\{B = 1 | A = C = 1\right\}\) \(\mathbb{P}\left\{B = 1 | A = 0, C = 1\right\}\)
\(\mathbb{P}\left\{B = 1 | A = 1, C = 0\right\}\) \(\mathbb{P}\left\{B = 1 | A = C = 0\right\}\)

What is the probability \(\mathbb{P}\){ \(A\) = , \(B\) = , \(C\) = }?
Hint See Fall 2019 Final Q22 Q23 Q24 Q25, Spring 2018 Final Q24 Q25, Fall 2014 Final Q9, Fall 2006 Final Q20, Fall 2005 Final Q20. For any Bayes net, the joint probability can always be computed as the product of the conditional probabilities of each variable given its parent nodes. For a causal chain \(A \to  B \to  C\), \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a\right\} \mathbb{P}\left\{B = b | A = a\right\} \mathbb{P}\left\{C = c | B = b\right\}\). For a common cause \(A \leftarrow B \to  C\), \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a | B = b\right\} \mathbb{P}\left\{B = b\right\} \mathbb{P}\left\{C = c | B = b\right\}\). For a common effect \(A \to  B \leftarrow C\), \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a\right\} \mathbb{P}\left\{B = b | A = a, C = c\right\} \mathbb{P}\left\{C = c\right\}\).
📗 Answer: .
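The common-effect factorization in the hint can be sketched as below. The function name `joint_common_effect` and the CPT entries are hypothetical; the dictionary `p_b1_given` maps a pair \(\left(a, c\right)\) to \(\mathbb{P}\left\{B = 1 | A = a, C = c\right\}\).

```python
def joint_common_effect(a, b, c, p_a1, p_c1, p_b1_given):
    """P(A=a, B=b, C=c) for the network A -> B <- C."""
    pa = p_a1 if a == 1 else 1 - p_a1
    pc = p_c1 if c == 1 else 1 - p_c1
    pb1 = p_b1_given[(a, c)]
    pb = pb1 if b == 1 else 1 - pb1
    # P(A = a) * P(B = b | A = a, C = c) * P(C = c)
    return pa * pb * pc


# Hypothetical CPT entries, not the values generated for this question
p_b1_given = {(1, 1): 0.75, (0, 1): 0.25, (1, 0): 0.5, (0, 0): 0.125}
print(joint_common_effect(1, 0, 0, p_a1=0.5, p_c1=0.25, p_b1_given=p_b1_given))
```

Whenever the question asks for a value of 0, the complement \(1 - \mathbb{P}\left\{\cdot = 1\right\}\) is used for that factor, as in the code.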
📗 [1 point] Please enter any comments including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.
📗 Answer: .

# Grade


 * * * *


 * * * *
 

# Warning: remember to submit this on Canvas!


📗 Please copy and paste the text between the *s (not including the *s) and submit it on Canvas, M1A.
📗 Please save a copy as text file using the button or just copy and paste it into a text file.
📗 You could load your answers using the button from the text field:







Last Updated: July 14, 2024 at 8:38 PM