Young Wu's Homepage

# M2B Midterm Part 2 Version B

📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit the Enter key)

📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You can print the page, solve the problems, and then enter all your answers at the end.

📗 In case the questions are not generated correctly, try the following: (1) refresh the page, (2) clear the browser cache, (3) switch to incognito/private browsing mode, (4) switch to another browser, (5) use a different ID. If none of these work, please post a private message on Piazza with your ID.

📗 Join Zoom if you have questions:

Zoom Link

📗 Please do not refresh the page (after you start): your answers will not be saved.

# Warning: please enter your ID before you start!

# Questions 1 to 15

📗 [4 points] The following neural network classifies all the training instances correctly. What are the labels (0 or 1) of the training data? The activation functions are LTU for all units: \(1_{\left\{z \geq 0\right\}}\). The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias .

\(x_{i1}\)	\(x_{i2}\)	\(y_{i}\) or \(a^{\left(2\right)}_{1}\)
0	0	?
0	1	?
1	0	?
1	1	?

📗 Answer (comma separated vector): .
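
A question of this type can be checked by running the forward pass on each of the four inputs. The sketch below is only an illustration: the weights are hypothetical (they happen to implement XOR) and are not the ones generated for any ID, and the layout of `W1` (one row of weights per hidden unit) is an assumption.

```python
def ltu(z):
    """Linear threshold unit: 1 if z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0

def forward(x, W1, b1, w2, b2):
    """Output of a two-layer LTU network for one input x.

    Assumed layout: W1[j] holds the weights into hidden unit j.
    """
    hidden = [
        ltu(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
        for j in range(len(b1))
    ]
    return ltu(sum(w * h for w, h in zip(w2, hidden)) + b2)

# Hypothetical weights, chosen only for illustration.
W1 = [[1.0, 1.0], [-1.0, -1.0]]  # hidden unit 0 acts as OR, unit 1 as NAND
b1 = [-0.5, 1.5]
w2 = [1.0, 1.0]                  # output unit acts as AND of the hidden units
b2 = -1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [forward(x, W1, b1, w2, b2) for x in inputs]
print(labels)
```

Because the network is stated to classify every training instance correctly, its outputs on the four inputs are the training labels.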

📗 [2 points] A test set \(\left(x_{1}, y_{1}\right), ..., \left(x_{100}, y_{100}\right)\) contains labels \(y_{i}\) = for \(i = 1, ..., 100\). A classifier simply predicts all the time (the labels are +1 and -1). What is this classifier's test accuracy?

📗 Enter the accuracy as a number between 0 and 1: for example, enter 0.5 if the accuracy is 50 percent and 1 if the accuracy is 100 percent.

📗 Answer: .

📗 [4 points] Consider a kernel \(K\left(x_{i_{1}}, x_{i_{2}}\right)\) = + + , where both \(x_{i_{1}}\) and \(x_{i_{2}}\) are 1D positive real numbers. What is the feature vector \(\varphi\left(x_{i}\right)\) induced by this kernel evaluated at \(x_{i}\) = ?

📗 Answer (comma separated vector): .

📗 [2 points] Let \(w\) = and \(b\) = . For the point \(x\) = , \(y\) = , what is the smallest slack value \(\xi\) for it to satisfy the margin constraint?

📗 Answer: .
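
For the usual soft-margin constraint \(y \left(w^\top x + b\right) \geq 1 - \xi\) with \(\xi \geq 0\), the smallest feasible slack is \(\max\left(0, 1 - y \left(w^\top x + b\right)\right)\). The sketch below assumes that form of the constraint and labels in \(\left\{-1, +1\right\}\); the numbers are hypothetical.

```python
def smallest_slack(w, b, x, y):
    """Smallest xi >= 0 with y * (w . x + b) >= 1 - xi (assumed constraint)."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)

# Hypothetical values, not taken from the generated question.
w = [1.0, 2.0]
b = -1.0
print(smallest_slack(w, b, [2.0, 1.0], 1))   # margin 3: no slack needed
print(smallest_slack(w, b, [1.0, 1.0], -1))  # wrong side: margin -2, slack 3
```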

📗 [3 points] Consider the following directed graphical model over binary variables: \(A \leftarrow B \to C\). Given the CPTs (Conditional Probability Tables):

Variable	Probability	Variable	Probability
\(\mathbb{P}\left\{B = 1\right\}\)
\(\mathbb{P}\left\{C = 1 \mid B = 1\right\}\)		\(\mathbb{P}\left\{C = 1 \mid B = 0\right\}\)
\(\mathbb{P}\left\{A = 1 \mid B = 1\right\}\)		\(\mathbb{P}\left\{A = 1 \mid B = 0\right\}\)

What is the probability \(\mathbb{P}\){ \(A\) = , \(B\) = , \(C\) = }?

📗 Answer: .
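
In the model \(A \leftarrow B \to C\), the joint probability factorizes as \(\mathbb{P}\left\{A, B, C\right\} = \mathbb{P}\left\{B\right\} \mathbb{P}\left\{A \mid B\right\} \mathbb{P}\left\{C \mid B\right\}\). A minimal sketch of that product follows; the CPT entries are hypothetical, and the helper names are made up for illustration.

```python
def bernoulli(p_one, value):
    """Probability of a binary value given the probability that it equals 1."""
    return p_one if value == 1 else 1.0 - p_one

def joint(a, b, c, p_b, p_a_given_b, p_c_given_b):
    """P{A=a, B=b, C=c} for A <- B -> C, using P{B} P{A|B} P{C|B}."""
    return (
        bernoulli(p_b, b)
        * bernoulli(p_a_given_b[b], a)
        * bernoulli(p_c_given_b[b], c)
    )

# Hypothetical CPT entries: P{B=1}, P{A=1 | B=b}, P{C=1 | B=b}.
p_b = 0.5
p_a_given_b = {1: 0.8, 0: 0.25}
p_c_given_b = {1: 0.5, 0: 0.75}

print(joint(1, 0, 0, p_b, p_a_given_b, p_c_given_b))  # 0.5 * 0.25 * 0.25
```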

📗 [2 points] Consider the following directed graphical model over binary variables: \(A \to B \leftarrow C\) with the following training set.

A	B	C
0		0
0		0
0		1
0		1
1		0
1		0
1		1
1		1

What is the MLE (Maximum Likelihood Estimate) with Laplace smoothing of the conditional probability \(\mathbb{P}\){ \(B\) = | \(A\) = , \(C\) = }?

📗 Answer: .
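
With add-one (Laplace) smoothing, the estimate for a binary \(B\) is the count of rows matching \(A\), \(B\), and \(C\) plus 1, divided by the count of rows matching \(A\) and \(C\) plus 2. The sketch below assumes that convention; the \(B\) column in the data is hypothetical, since the generated values are not shown here.

```python
def laplace_cpt(data, b, a, c, num_values=2):
    """Add-one smoothed estimate of P{B=b | A=a, C=c}.

    data: list of (A, B, C) rows; num_values: number of values B can take.
    """
    parents = [row for row in data if row[0] == a and row[2] == c]
    matches = [row for row in parents if row[1] == b]
    return (len(matches) + 1) / (len(parents) + num_values)

# Hypothetical training set; only the B column is invented.
data = [
    (0, 0, 0), (0, 1, 0), (0, 1, 1), (0, 1, 1),
    (1, 0, 0), (1, 0, 0), (1, 0, 1), (1, 1, 1),
]
print(laplace_cpt(data, b=1, a=0, c=1))  # (2 + 1) / (2 + 2)
print(laplace_cpt(data, b=1, a=1, c=0))  # (0 + 1) / (2 + 2)
```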

📗 [4 points] Consider a linear model \(a_{i} = w^\top x_{i} + b\), with the hinge cost function . The initial weight is \(\begin{bmatrix} w \\ b \end{bmatrix}\) = . What are the updated weight and bias after one stochastic (sub)gradient descent step if the chosen training data is \(x\) = , \(y\) = ? The learning rate is .

📗 Answer (comma separated vector): .
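
The cost function is not shown above, so the sketch below assumes the common form \(\max\left(0, 1 - y_{i} a_{i}\right)\) with labels in \(\left\{-1, +1\right\}\); a different convention (for example labels in \(\left\{0, 1\right\}\)) would change the update. At the kink \(y_{i} a_{i} = 1\) it picks the zero subgradient, which is one valid choice. All numbers are hypothetical.

```python
def hinge_sgd_step(w, b, x, y, lr):
    """One stochastic subgradient step for the assumed cost max(0, 1 - y * a)."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    if y * a < 1:
        # Subgradient of the cost is (-y * x, -y); step against it.
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    # Otherwise the cost is flat here (zero subgradient chosen at y * a == 1).
    return w, b

# Hypothetical values, not taken from the generated question.
w_new, b_new = hinge_sgd_step(w=[1.0, -1.0], b=0.0, x=[1.0, 2.0], y=1, lr=0.5)
print(w_new, b_new)
```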

📗 [3 points] Welcome to the Terrible-Three-Day-Tour! We will visit New York on Day 1. The rules for Day 2 and Day 3 are:

(a) If we were in New York the day before, with probability we will stay in New York, and with probability we will go to Baltimore.
(b) If we were in Baltimore the day before, with probability we will stay in Baltimore, and with probability we will go to Washington D.C.

Before you start the tour, what is your chance of visiting (on at least one of the two days)?

📗 Answer: .
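
Since the tour only has two random days, one way to check an answer is to enumerate every Day 2 and Day 3 path and add up the probabilities of the paths that include the target city. The transition probabilities below are hypothetical, and `visit_probability` is a made-up helper name.

```python
def visit_probability(target, transitions, start="New York", days=2):
    """Probability of visiting `target` on at least one of the days after Day 1."""
    paths = [((start,), 1.0)]
    for _ in range(days):
        paths = [
            (path + (city,), prob * step)
            for path, prob in paths
            for city, step in transitions[path[-1]].items()
        ]
    # path[0] is Day 1, so only Day 2 and Day 3 are checked.
    return sum(prob for path, prob in paths if target in path[1:])

# Hypothetical probabilities, not taken from the generated question.
transitions = {
    "New York": {"New York": 0.5, "Baltimore": 0.5},
    "Baltimore": {"Baltimore": 0.25, "Washington D.C.": 0.75},
}
print(visit_probability("Baltimore", transitions))
print(visit_probability("Washington D.C.", transitions))
```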

📗 [4 points] A convolutional neural network has an input image of size x that is connected to a convolutional layer that uses a x filter, zero padding of the image, and a stride of 1. There are activation maps. (Here, zero-padding implies that these activation maps have the same size as the input image.) The convolutional layer is then connected to a pooling layer that uses x max pooling with a stride of (non-overlapping, no padding). The pooling layer is then fully connected to an output layer that contains output units. There are no hidden layers between the pooling layer and the output layer. How many different weights must be learned in this whole network, not including any bias?

📗 Answer: .
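
The count splits into two parts: the shared filter weights of the convolutional layer and the weights of the fully connected layer (max pooling has nothing to learn). The sketch below assumes a single-channel input, one shared filter per activation map, and an image side divisible by the pooling size; the sizes are hypothetical.

```python
def cnn_weight_count(n, k, maps, pool, outputs, in_channels=1):
    """Learned weights (no biases) for conv -> max pool -> fully connected.

    n: input side length, k: filter side length, maps: activation maps,
    pool: pooling size and stride, outputs: number of output units.
    """
    conv_weights = maps * k * k * in_channels   # weights shared across positions
    pooled_side = n // pool                     # "same" padding keeps n x n maps
    fc_weights = maps * pooled_side * pooled_side * outputs
    return conv_weights + fc_weights

# Hypothetical sizes: 8 x 8 image, 3 x 3 filter, 2 maps, 2 x 2 pooling, 10 outputs.
print(cnn_weight_count(n=8, k=3, maps=2, pool=2, outputs=10))
```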

📗 [4 points] Consider the linear SVM (Support Vector Machine) problem without slack variables or kernels: this is known as the hard margin SVM. If you give it a linearly separable training data set where \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} \in \mathbb{R}^{2}\) and \(y \in \left\{0, 1\right\}\), it will learn a line in \(\mathbb{R}^{2}\). Tom did something to your data set, and hard margin SVM no longer works on the modified data set (which is no longer linearly separable): \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} \leftarrow \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} + \begin{bmatrix} b_{1} \\ b_{2} \end{bmatrix} = M \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} + b\). Suppose \(b\) = ; give an example of \(M\).

📗 Note: you can test your transformation using : the original points are on the left and the new points after the transformation are on the right.

📗 Answer (matrix with multiple lines, each line is a comma separated vector): .

📗 [3 points] A hospital trains a decision tree to predict if any given patient has technophobia or not. The training set consists of patients. There are features. The labels are binary. The decision tree is not pruned. What are the smallest and largest possible training set accuracies of the decision tree? Enter two numbers between 0 and 1. Hint: patients with the same features may have different labels.

📗 Answer (comma separated vector): .

📗 [3 points] You have a joint probability table over \(k\) = random variables \(X_{1}, X_{2}, ..., X_{k}\), where each variable takes \(m\) = possible values: \(1, 2, ..., m\). To compute the probability that \(X_{1}\) = , how many cells in the table do you need to access (at most)?

📗 Answer: .
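
Computing a marginal such as the probability of one value of \(X_{1}\) means summing the joint table over every combination of the other \(k - 1\) variables, which touches \(m^{k - 1}\) cells. The sketch below counts the accesses on a small hypothetical uniform table (\(k = 3\), \(m = 2\)).

```python
from itertools import product

def marginal_x1(table, value, k, m):
    """Return (P{X1 = value}, number of table cells read)."""
    total, cells = 0.0, 0
    for rest in product(range(1, m + 1), repeat=k - 1):
        total += table[(value,) + rest]
        cells += 1
    return total, cells

# Hypothetical joint table: uniform over all m ** k assignments.
k, m = 3, 2
table = {cell: 1.0 / m ** k for cell in product(range(1, m + 1), repeat=k)}
prob, cells = marginal_x1(table, value=1, k=k, m=m)
print(prob, cells)
```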

📗 [5 points] Andy is a three-month-old baby. He can be happy (state 0), hungry (state 1), or have a wet diaper (state 2). Initially when he wakes up from his nap at 1pm, he is happy. If he is happy, there is a chance that he will remain happy one hour later, a chance to be hungry by then, and a chance to have a wet diaper. Similarly, if he is hungry, one hour later he will be happy with chance, hungry with chance, and wet diaper with chance. If he has a wet diaper, one hour later he will be happy with chance, hungry with chance, and wet diaper with chance. He can smile (observation 0) or cry (observation 1). When he is happy, he smiles of the time and cries of the time; when he is hungry, he smiles of the time and cries of the time; when he has a wet diaper, he smiles of the time and cries of the time.

What is the probability that the particular observed sequence (or \(Y_{1}, Y_{2}\) = ) happens (in the first two periods)?

📗 Answer: .
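
The probability of an observation sequence in a hidden Markov model can be computed with the forward algorithm, which sums over all hidden state paths. The sketch below assumes that period 1 is the 1pm wake-up (so the initial distribution puts all mass on "happy"); the transition and emission probabilities are hypothetical.

```python
def sequence_probability(obs, init, trans, emit):
    """P{Y_1, ..., Y_T = obs} via the forward algorithm.

    init[s]: P{X_1 = s}; trans[r][s]: P{X_t+1 = s | X_t = r};
    emit[s][y]: P{Y_t = y | X_t = s}.
    """
    states = range(len(init))
    alpha = [init[s] * emit[s][obs[0]] for s in states]
    for y in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in states) * emit[s][y]
            for s in states
        ]
    return sum(alpha)

# Hypothetical parameters; states: 0 happy, 1 hungry, 2 wet diaper.
init = [1.0, 0.0, 0.0]            # assumed: happy in the first period
trans = [[0.5, 0.25, 0.25], [0.5, 0.5, 0.0], [0.5, 0.0, 0.5]]
emit = [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]]  # columns: smile, cry

print(sequence_probability((0, 1), init, trans, emit))  # smile, then cry
```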

📗 [4 points] There are 3 states \(s_{0}, s_{1}, s_{2}\) and 3 actions \(a_{0}, a_{1}, a_{2}\). We start from , choose , we get the reward and then move to , choose . Update the Q value for (, ) based on the current Q table and the movement above, using SARSA and Q-learning (enter two numbers, comma separated). The reward decay (discount rate) is \(\gamma\) = , and the step size (learning rate) is \(\alpha\) = .

State \ Action	\(a_{0}\)	\(a_{1}\)	\(a_{2}\)
\(s_{0}\)
\(s_{1}\)
\(s_{2}\)

📗 Answer (comma separated vector): .
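
The two updates differ only in the target: SARSA uses the Q value of the action actually chosen in the next state, while Q-learning uses the largest Q value in the next state. The sketch below uses the standard forms of both updates; the Q table, reward, and rates are hypothetical.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, gamma, alpha):
    """SARSA: target uses the action actually taken in the next state."""
    return Q[s][a] + alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

def q_learning_update(Q, s, a, r, s_next, gamma, alpha):
    """Q-learning: target uses the best action in the next state."""
    return Q[s][a] + alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

# Hypothetical Q table (rows: states, columns: actions) and transition.
Q = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
s, a, r, s_next, a_next = 0, 1, 2.0, 1, 0
gamma, alpha = 0.5, 0.5

print(sarsa_update(Q, s, a, r, s_next, a_next, gamma, alpha))
print(q_learning_update(Q, s, a, r, s_next, gamma, alpha))
```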

📗 [1 point] Please enter any comments including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.

📗 Answer: .

# Grade

* * * * *

# Submission

📗 Please do not modify the content in the above text field: use the "Grade" button to update.

📗 Please wait for the message "Successful submission." to appear after the "Submit" button. Please also save the text in the above text box to a file using the button or copy and paste it into a file yourself and submit it to Canvas Assignment M2B. You can submit multiple times (but please do not submit too often): only the latest submission will be counted.

📗 You can load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##x: 2B" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.

Last Updated: July 01, 2025 at 1:47 AM