Young Wu's Homepage

# M1A Midterm Part 1

📗 Enter your ID (the wisc email ID without @wisc.edu) and click the button (or press the Enter key).

📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.

📗 In case the questions are not generated correctly, try (1) refresh the page, (2) clear the browser cache, Ctrl+F5 or Ctrl+Shift+R or Shift+Command+R, (3) switch to incognito/private browsing mode, (4) switch to another browser, (5) use a different ID. If none of these work, message me on Zoom.

📗 Join Zoom if you have questions: Zoom Link

📗 Please do not refresh the page (after you start): your answers will not be saved.

# Warning: please enter your ID before you start!

# Question 1

# Question 2

# Question 3

# Question 4

# Question 5

# Question 6

# Question 7

# Question 8

# Question 9

# Question 10

# Question 11

# Question 12

# Question 13

# Question 14

# Question 15

📗 [4 points] Suppose the squared loss is used to do stochastic gradient descent for logistic regression, i.e. \(C = \dfrac{1}{2} \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)^{2}\) where \(a_{i} = \dfrac{1}{1 + e^{- w x_{i} - b}}\). Given the current weight \(w\) = and bias \(b\) = , with \(x_{i}\) = , \(y_{i}\) = , \(a_{i}\) = (no need to recompute this value), with learning rate \(\alpha\) = . What is the updated after the iteration? Enter a single number.

📗 Answer: .
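
📗 A minimal sketch of one such update, assuming hypothetical values for \(w\), \(b\), \(x_{i}\), \(y_{i}\), \(a_{i}\) and \(\alpha\) (the function name `sgd_step` is also hypothetical). It uses the chain rule \(\dfrac{\partial C}{\partial w} = \left(a_{i} - y_{i}\right) a_{i} \left(1 - a_{i}\right) x_{i}\) and \(\dfrac{\partial C}{\partial b} = \left(a_{i} - y_{i}\right) a_{i} \left(1 - a_{i}\right)\), and takes \(a_{i}\) as given rather than recomputing it.

```python
def sgd_step(w, b, x, y, a, alpha):
    """One SGD step for the squared loss on a sigmoid activation a."""
    # Chain rule: dC/dz = (a - y) * a * (1 - a), where z = w * x + b.
    dz = (a - y) * a * (1 - a)
    w_new = w - alpha * dz * x  # dC/dw = dC/dz * x
    b_new = b - alpha * dz      # dC/db = dC/dz
    return w_new, b_new


# Hypothetical values, not taken from any generated question.
# Here w * x + b = 0, so the given activation a = 0.5 is consistent.
w_new, b_new = sgd_step(w=1.0, b=-2.0, x=2.0, y=1.0, a=0.5, alpha=0.1)
print(w_new, b_new)  # approximately 1.025 and -1.9875
```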

📗 [3 points] Suppose the likelihood probabilities of observing "a", "o", "c" in a real movie script are , and the likelihood probabilities of observing "a", "o", "c" in a fake movie script are . Given the prior probabilities, of the scripts are real. How would a Naive Bayes classifier classify a script ""? Enter \(1\) if it is classified as real, enter \(-1\) if it is classified as fake, and enter \(0\) if it's a tie (equally likely to be real and fake).

📗 Answer: .
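
📗 A minimal sketch of the comparison a Naive Bayes classifier makes, assuming hypothetical likelihoods, prior and scripts (the function name `naive_bayes_label` is also hypothetical). Each class score is the prior times the product of the per-character likelihoods, and the larger score wins.

```python
import math


def naive_bayes_label(script, lik_real, lik_fake, prior_real):
    """Return 1 (real), -1 (fake) or 0 (tie) for a script of characters."""
    score_real = prior_real
    score_fake = 1 - prior_real
    for ch in script:
        # Naive Bayes assumes characters are independent given the class.
        score_real *= lik_real[ch]
        score_fake *= lik_fake[ch]
    if math.isclose(score_real, score_fake):
        return 0  # treat floating-point near-equality as a tie
    return 1 if score_real > score_fake else -1


# Hypothetical probabilities, not taken from any generated question.
lik_real = {"a": 0.5, "o": 0.3, "c": 0.2}
lik_fake = {"a": 0.2, "o": 0.3, "c": 0.5}
print(naive_bayes_label("aac", lik_real, lik_fake, prior_real=0.5))
```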

📗 [3 points] In one iteration of the Perceptron Algorithm, \(x\) = , \(y\) = , and predicted label \(\hat{y} = a\) = . The learning rate \(\alpha = 1\). After the iteration, how many of the weights (including the bias \(b\)) are increased (the change is strictly larger than 0)? If it is impossible to figure out given the information, enter -1.

📗 Answer: .
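
📗 A minimal sketch of the counting, assuming the update rule \(w \leftarrow w - \alpha \left(a - y\right) x\) and \(b \leftarrow b - \alpha \left(a - y\right)\), with hypothetical inputs (the function name `count_increased` is also hypothetical). The bias is treated as a weight on a constant input of 1.

```python
def count_increased(x, y, a, alpha=1.0):
    """Count weights (bias included) whose change is strictly positive."""
    # Change in weight j is -alpha * (a - y) * x_j; the bias uses x_j = 1.
    changes = [-alpha * (a - y) * xj for xj in list(x) + [1]]
    return sum(1 for c in changes if c > 0)


# Hypothetical instance: the prediction 0 is wrong for the true label 1,
# so every weight moves in the direction of its input.
print(count_increased(x=[1, -2, 0], y=1, a=0))
```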

📗 [3 points] Suppose you are given a neural network with hidden layers, input units, output units, and hidden units. In one backpropagation step when computing the gradient of the cost (for example, squared loss) with respect to \(w^{\left(1\right)}_{11}\), the weight in layer \(1\) connecting input \(1\) and hidden unit \(1\), how many weights (including \(w^{\left(1\right)}_{11}\) itself, and including biases) are used in the backpropagation step of \(\dfrac{\partial C}{\partial w^{\left(1\right)}_{11}}\)?

📗 The above is a diagram of the network; the nodes labelled "1" are the bias units. You can highlight the edges representing the weights in the diagram, but they are not graded. Note: the backpropagation step assumes the activations in all layers are already known, so do not count the weights and biases in the forward step computing the activations.

📗 Answer: .

📗 [3 points] A hard margin SVM (Support Vector Machine) is trained on the following dataset. Suppose we restrict \(b\) = , what is the value of \(w\)? Enter a single number, i.e. do not include \(b\). Assume the SVM classifier is \(1_{\left\{w x + b \geq 0\right\}}\) (this means it predicts 1 if \(w x + b \geq 0\) and 0 otherwise).

\(x_{i}\)
\(y_{i}\)

📗 Answer: .

📗 [3 points] Consider the Grid World with terminal states "RED" and "GREEN" and 7 other states shown in the table below.

RED	1	2
3	4	5
6	7	GREEN

There are four actions UP, DOWN, LEFT, RIGHT describing the movement between the states on the grid. The grid does not wrap around, i.e. using the action UP in state 1 results in state 1, not state 7.
Suppose the reward on all transitions (from actions UP, DOWN, LEFT, RIGHT) is \(R_{t}\) = , and the discount factor is \(\gamma\) = . The current policy \(\pi\) (probabilities of actions UP, DOWN, LEFT, RIGHT when in each state) is given in the following table.

State	UP	DOWN	LEFT	RIGHT
1
2
3
4
5
6
7

The current value function \(V_{k}\) is given in the table below.

\(0\)

		\(0\)

Find the value of state in the next step of value iteration (i.e. \(V_{k+1}\) for state ). Enter one number.

📗 Answer: .
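
📗 A minimal sketch of one backup on this grid, assuming the update is the policy-weighted Bellman backup \(V_{k+1}\left(s\right) = \displaystyle\sum_{a} \pi\left(a | s\right) \left(R_{t} + \gamma V_{k}\left(s'\right)\right)\) and that terminal states keep value 0. The reward, discount factor, policy and \(V_{k}\) below are hypothetical, as are the helper names `next_state` and `backup`.

```python
GRID = [["RED", 1, 2],
        [3, 4, 5],
        [6, 7, "GREEN"]]
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}


def next_state(state, action):
    """Deterministic move; bumping into the border leaves the state unchanged."""
    for r, row in enumerate(GRID):
        for c, s in enumerate(row):
            if s == state:
                dr, dc = MOVES[action]
                nr, nc = r + dr, c + dc
                if 0 <= nr < 3 and 0 <= nc < 3:
                    return GRID[nr][nc]
                return state
    raise ValueError(f"unknown state: {state!r}")


def backup(state, policy, values, reward, gamma):
    """Expected one-step return from `state` under the action probabilities."""
    return sum(p * (reward + gamma * values[next_state(state, a)])
               for a, p in policy.items())


# Hypothetical inputs, not taken from any generated question.
values = {"RED": 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, "GREEN": 0}
uniform = {"UP": 0.25, "DOWN": 0.25, "LEFT": 0.25, "RIGHT": 0.25}
v_next = backup(1, uniform, values, reward=-1, gamma=0.5)
print(v_next)  # -0.125
```

From state 1, UP stays in state 1, DOWN reaches state 4, LEFT reaches RED and RIGHT reaches state 2, so the backup averages \(-1 + 0.5 V_{k}\) over those four successors.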

📗 [4 points] Given the following training data, what is the fold cross validation accuracy (i.e. LOOCV, Leave One Out Cross Validation) if NN (Nearest Neighbor) classifier with Manhattan distance is used? Break the tie (in distance) by using the instance with the smaller index. Enter a number between 0 and 1.

Index	1	2	3	4	5
\(x_{i}\)
\(y_{i}\)

📗 Answer: .
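
📗 A minimal sketch of leave-one-out cross validation with a 1-nearest-neighbor classifier, assuming hypothetical one-dimensional data (the function names are also hypothetical). Each instance is held out in turn and labeled by its nearest remaining instance under Manhattan distance.

```python
def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def loocv_accuracy(xs, ys):
    """Fraction of held-out instances that 1NN labels correctly."""
    correct = 0
    for i in range(len(xs)):
        others = [j for j in range(len(xs)) if j != i]
        # min returns the first minimizer, so ties go to the smaller index.
        nearest = min(others, key=lambda j: manhattan(xs[i], xs[j]))
        correct += ys[nearest] == ys[i]
    return correct / len(xs)


# Hypothetical data, not taken from any generated question.
xs = [(1,), (3,), (5,), (6,), (10,)]
ys = [0, 1, 1, 0, 0]
acc = loocv_accuracy(xs, ys)
print(acc)  # 0.2
```

In this example the second instance (x = 3) is equally far from x = 1 and x = 5, and the tie goes to the smaller index, so it is labeled 0 and counted as an error.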

📗 [4 points] In a convolutional neural network, suppose the activation map of a convolution layer is . What is the activation map after a non-overlapping (stride 2) 2 by 2 max-pooling layer?

📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
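
📗 A minimal sketch of non-overlapping 2 by 2 max-pooling, assuming a hypothetical 4 by 4 activation map with even dimensions (the function name `max_pool_2x2` is also hypothetical). Each output entry is the maximum of one 2 by 2 block.

```python
def max_pool_2x2(m):
    """2x2 max-pooling with stride 2; assumes even numbers of rows and columns."""
    rows, cols = len(m), len(m[0])
    return [[max(m[r][c], m[r][c + 1], m[r + 1][c], m[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


# Hypothetical activation map, not taken from any generated question.
activation = [[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 2, 1, 8]]
pooled = max_pool_2x2(activation)
print(pooled)  # [[4, 5], [3, 8]]
```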

📗 [3 points] Given two Boolean random variables, \(A\) and \(B\), where \(\mathbb{P}\left\{A\right\}\) = , \(\mathbb{P}\left\{B\right\}\) = , and \(\mathbb{P}\left\{A| \neg B\right\}\) = , what is \(\mathbb{P}\left\{A|B\right\}\)?

📗 Answer: .
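
📗 A minimal sketch using the law of total probability, \(\mathbb{P}\left\{A\right\} = \mathbb{P}\left\{A|B\right\} \mathbb{P}\left\{B\right\} + \mathbb{P}\left\{A| \neg B\right\} \left(1 - \mathbb{P}\left\{B\right\}\right)\), solved for \(\mathbb{P}\left\{A|B\right\}\). The probabilities below are hypothetical, as is the function name `p_a_given_b`.

```python
def p_a_given_b(p_a, p_b, p_a_given_not_b):
    """Solve P(A) = P(A|B) P(B) + P(A|not B) (1 - P(B)) for P(A|B)."""
    return (p_a - p_a_given_not_b * (1 - p_b)) / p_b


# Hypothetical probabilities, not taken from any generated question.
result = p_a_given_b(p_a=0.5, p_b=0.4, p_a_given_not_b=0.3)
print(result)  # approximately 0.8
```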

📗 [3 points] We use gradient descent to find the minimum of the function \(f\left(x\right)\) = with step size \(\eta > 0\). If we start from the point \(x_{0}\) = , how small should \(\eta\) be so that we make progress in the first iteration? Enter the largest value of \(\eta\) below which we make progress. For example, if we make progress when \(\eta < 0.01\), enter \(0.01\).

📗 Answer: .
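
📗 A minimal sketch of the progress check, assuming "progress" means \(f\left(x_{1}\right) < f\left(x_{0}\right)\) and using a hypothetical function \(f\left(x\right) = 2 x^{2}\) with \(x_{0} = 3\). For this choice \(x_{1} = \left(1 - 4 \eta\right) x_{0}\), so progress requires \(\left| 1 - 4 \eta \right| < 1\), i.e. \(\eta < 0.5\). The function name `makes_progress` is hypothetical.

```python
def makes_progress(f, grad, x0, eta):
    """True if one gradient descent step from x0 strictly decreases f."""
    x1 = x0 - eta * grad(x0)
    return f(x1) < f(x0)


# Hypothetical objective and starting point, not from any generated question.
def f(x):
    return 2 * x ** 2


def grad(x):
    return 4 * x


print(makes_progress(f, grad, x0=3.0, eta=0.49))  # just below the threshold 0.5
print(makes_progress(f, grad, x0=3.0, eta=0.51))  # just above the threshold 0.5
```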

📗 [4 points] Given the following training set, add one item \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}\) with \(y\) = so that all 7 items are support vectors for the Hard Margin SVM (Support Vector Machine) trained on the new training set.

\(x_{1}\)	\(x_{2}\)	\(y\)
		0
		0
		0
		1
		1
		1

📗 Answer (comma separated vector): .

📗 [3 points] A UFO is hiding in a cloud near Haywood Ranch. On a given day, the UFO hides in the cloud of the time (C = 0), and comes out of the cloud of the time (C = 1). What is the entropy of the binary random variable C? Recall that the base-2 logarithm of x can be computed as log(x) / log(2).

📗 Answer: .
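
📗 A minimal sketch of the binary entropy \(H\left(C\right) = -p \log_{2} p - \left(1 - p\right) \log_{2} \left(1 - p\right)\), assuming a hypothetical probability \(p = 0.25\) for one of the outcomes (the function name `binary_entropy` is also hypothetical). It computes the base-2 logarithm as log(x) / log(2).

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary variable with P(C = 1) = p."""
    total = 0.0
    for q in (p, 1 - p):
        if q > 0:  # the term 0 * log(0) is taken to be 0
            total -= q * math.log(q) / math.log(2)
    return total


# Hypothetical probability, not taken from any generated question.
h = binary_entropy(0.25)
print(h)  # approximately 0.8113
```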

📗 [4 points] Given the following transition matrix for a bigram model with words "Eat", "My" and "Hammer": . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given the first word is ""?

📗 Answer: .
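
📗 A minimal sketch of the two-step probability, assuming a hypothetical transition matrix with rows and columns ordered "Eat", "My", "Hammer" (the function name `two_step` is also hypothetical). It sums over the unknown second word: \(\mathbb{P}\left\{w_{3} = j | w_{1} = i\right\} = \displaystyle\sum_{k} \mathbb{P}\left\{w_{2} = k | w_{1} = i\right\} \mathbb{P}\left\{w_{3} = j | w_{2} = k\right\}\).

```python
WORDS = ["Eat", "My", "Hammer"]

# Hypothetical transition matrix (each row sums to 1), not from any question.
M = [[0.1, 0.6, 0.3],
     [0.5, 0.2, 0.3],
     [0.4, 0.4, 0.2]]


def two_step(matrix, first, third):
    """P(third word | first word), marginalizing over the second word."""
    i, j = WORDS.index(first), WORDS.index(third)
    return sum(matrix[i][k] * matrix[k][j] for k in range(len(WORDS)))


prob = two_step(M, "Eat", "Hammer")
print(prob)  # approximately 0.27
```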

📗 [2 points] In a corpus with word tokens, the phrase "Home Lander" appeared times (not Homelander). In particular, "Home" appeared times and "Lander" appeared . If we estimate probability by frequency (the maximum likelihood estimate) without smoothing, what is the estimated probability of \(\mathbb{P}\left\{\text{Lander} | \text{Home}\right\}\)?

📗 Answer: .
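
📗 A minimal sketch of the unsmoothed maximum likelihood estimate for a bigram, assuming hypothetical counts (the function name `bigram_mle` is also hypothetical). The estimate divides the count of the phrase by the count of its first word; the corpus size and the count of the second word are not needed.

```python
def bigram_mle(count_phrase, count_first_word):
    """Unsmoothed MLE of P(second word | first word) from raw counts."""
    return count_phrase / count_first_word


# Hypothetical counts, not taken from any generated question.
estimate = bigram_mle(count_phrase=5, count_first_word=20)
print(estimate)  # 0.25
```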

📗 [1 point] Please enter any comments, including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.

📗 Answer: .

# Grade

* * * * *

* * * * *

# Submission

📗 Please do not modify the content in the above text field: use the "Grade" button to update.

📗 Please wait for the message "Successful submission." to appear after the "Submit" button. Please also save the text in the above text box to a file using the button or copy and paste it into a file yourself and submit it through email. You could submit multiple times (but please do not submit too often): only the latest submission will be counted.

📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##x: 1" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.

Last Updated: July 01, 2025 at 1:47 AM