Young Wu's Homepage

# M2A Midterm Part 2 Version A

📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit the Enter key).

📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, and then enter all your answers at the end.

📗 In case the questions are not generated correctly, try (1) refreshing the page, (2) clearing the browser cache, (3) switching to incognito/private browsing mode, (4) switching to another browser, or (5) using a different ID. If none of these work, please post a private message on Piazza with your ID.

📗 Join Zoom if you have questions:

Zoom Link

📗 Please do not refresh the page (after you start): your answers will not be saved.

# Warning: please enter your ID before you start!

# Questions

📗 [4 points] The following neural network classifies all the training instances correctly. What are the labels (0 or 1) of the training data? All units use LTU (linear threshold unit) activation functions: \(1_{\left\{z \geq 0\right\}}\). The first-layer weight matrix is , with bias vector , and the second-layer weight vector is , with bias .

| \(x_{i1}\) | \(x_{i2}\) | \(y_{i}\) or \(a^{\left(2\right)}_{1}\) |
| --- | --- | --- |
| 0 | 0 | ? |
| 0 | 1 | ? |
| 1 | 0 | ? |
| 1 | 1 | ? |

📗 Answer (comma separated vector): .
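A minimal sketch of the forward pass, assuming hypothetical weights (the page generates its own values, which are not shown here): hidden unit 1 computes OR, hidden unit 2 computes NAND, and the output unit ANDs them, so the labels come out as XOR. Substituting the actual weight matrix, bias vector, output weights, and output bias into the hypothetical `forward` helper gives the labels for the real question.

```python
def ltu(z):
    """Linear threshold unit: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def forward(x, W1, b1, w2, b2):
    # Hidden layer: one LTU per row of W1, each with its own bias
    hidden = [ltu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    # Single output LTU on top of the hidden activations
    return ltu(sum(w * h for w, h in zip(w2, hidden)) + b2)

# Hypothetical weights: OR and NAND hidden units, AND output unit
W1 = [[1, 1], [-1, -1]]
b1 = [-0.5, 1.5]
w2 = [1, 1]
b2 = -1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [forward(x, W1, b1, w2, b2) for x in inputs]
print(labels)  # [0, 1, 1, 0]
```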

📗 [1 point] A binary classifier is trained on a training set, resulting in the classifier \(\hat{y} = 1\) if \(a x_{1} + b x_{2} + c \geq 0\) and \(\hat{y} = 0\) otherwise, and its performance is then tested on a separate test set. The accuracy of the classifier is . What is the accuracy if the flipped classifier (\(\hat{y} = 1\) if \(a x_{1} + b x_{2} + c < 0\) and \(\hat{y} = 0\) otherwise) is used?

📗 Enter the accuracy as a fraction: for example, enter 0.5 if the accuracy is 50 percent and 1 if the accuracy is 100 percent.

📗 Answer: .
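For binary labels, the flipped classifier predicts the opposite label on every test point, so each correct prediction becomes wrong and vice versa; its accuracy is therefore 1 minus the original accuracy. The sketch below checks this on a small hypothetical test set with hypothetical coefficients \(a = 1\), \(b = -1\), \(c = 0\).

```python
def classify(x, a, b, c, flipped=False):
    """Linear classifier from the question; flipped=True uses the < 0 rule instead."""
    score = a * x[0] + b * x[1] + c
    return int(score < 0) if flipped else int(score >= 0)

def accuracy(points, labels, **params):
    correct = sum(classify(x, **params) == y for x, y in zip(points, labels))
    return correct / len(labels)

# Hypothetical test set
points = [(1, 0), (0, 1), (2, 1), (0, 0), (1, 3)]
labels = [1, 0, 0, 1, 1]

acc = accuracy(points, labels, a=1, b=-1, c=0)
flipped_acc = accuracy(points, labels, a=1, b=-1, c=0, flipped=True)
print(acc, flipped_acc)  # 0.6 0.4
```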

📗 [4 points] If \(K\left(x, x'\right)\) is a kernel with induced feature representation \(\varphi\left(x_{0}\right)\) = , and \(G\left(x, x'\right)\) is another kernel with induced feature representation \(\theta\left(x_{0}\right)\) = , then \(H\left(x, x'\right) = a K\left(x, x'\right) + b G\left(x, x'\right)\), with \(a\) = and \(b\) = , is also a kernel. What is the induced feature representation of \(H\) for this \(x_{0}\)?

📗 Answer (comma separated vector): .
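Assuming \(a, b \geq 0\) (so that \(H\) is a valid kernel), one feature map for \(H\) is the concatenation \(\left[\sqrt{a}\,\varphi\left(x\right), \sqrt{b}\,\theta\left(x\right)\right]\): its inner product is \(a\,\varphi\left(x\right) \cdot \varphi\left(x'\right) + b\,\theta\left(x\right) \cdot \theta\left(x'\right) = a K\left(x, x'\right) + b G\left(x, x'\right)\). The sketch checks this identity; the feature vectors and coefficients are hypothetical.

```python
import math

def combined_features(phi, theta, a, b):
    """Feature map of H = a*K + b*G: concatenate sqrt(a)*phi and sqrt(b)*theta."""
    return [math.sqrt(a) * v for v in phi] + [math.sqrt(b) * v for v in theta]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical feature vectors for two points x and x'
phi_x, phi_xp = [1.0, 2.0], [0.5, -1.0]
theta_x, theta_xp = [3.0], [2.0]
a, b = 4.0, 9.0

psi_x = combined_features(phi_x, theta_x, a, b)
psi_xp = combined_features(phi_xp, theta_xp, a, b)
print(psi_x)  # [2.0, 4.0, 9.0]

# The inner product of the combined features equals a*K + b*G
H = dot(psi_x, psi_xp)
expected = a * dot(phi_x, phi_xp) + b * dot(theta_x, theta_xp)
print(H, expected)
```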

📗 [2 points] Let \(w\) = and \(b\) = . For the point \(x\) = , \(y\) = , what is the smallest slack value \(\xi\) for it to satisfy the margin constraint?

📗 Answer: .
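For a soft-margin SVM the margin constraint is \(y\left(w \cdot x + b\right) \geq 1 - \xi\) with \(\xi \geq 0\), so the smallest feasible slack is \(\xi = \max\left(0, 1 - y\left(w \cdot x + b\right)\right)\). This sketch assumes labels in \(\left\{-1, +1\right\}\); if the question uses labels in \(\left\{0, 1\right\}\), map them with \(2y - 1\) first. The parameters and points are hypothetical.

```python
def min_slack(w, b, x, y):
    """Smallest xi >= 0 with y * (w . x + b) >= 1 - xi, assuming y in {-1, +1}."""
    margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)

w, b = [1.0, 2.0], -1.0  # hypothetical parameters
print(min_slack(w, b, [2.0, 1.0], 1))   # 0.0: correct side, outside the margin
print(min_slack(w, b, [1.0, 0.0], 1))   # 1.0: on the decision boundary
print(min_slack(w, b, [1.0, 1.0], -1))  # 3.0: misclassified
```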

📗 [3 points] Consider the following directed graphical model over binary variables: \(A \to B \to C\), with the following CPTs (Conditional Probability Tables):

| Variable | Probability | Variable | Probability |
| --- | --- | --- | --- |
| \(\mathbb{P}\left\{A = 1\right\}\) | | | |
| \(\mathbb{P}\left\{B = 1 \mid A = 1\right\}\) | | \(\mathbb{P}\left\{B = 1 \mid A = 0\right\}\) | |
| \(\mathbb{P}\left\{C = 1 \mid B = 1\right\}\) | | \(\mathbb{P}\left\{C = 1 \mid B = 0\right\}\) | |

What is \(\mathbb{P}\){ \(A\) = , \(B\) = , \(C\) = }?

📗 Answer: .
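For the chain \(A \to B \to C\), the joint distribution factorizes as \(\mathbb{P}\left\{A, B, C\right\} = \mathbb{P}\left\{A\right\} \mathbb{P}\left\{B \mid A\right\} \mathbb{P}\left\{C \mid B\right\}\), and the probability of value 0 for each variable is one minus the tabulated probability of value 1. A sketch with hypothetical CPT values:

```python
from itertools import product

def bernoulli(p_one, value):
    """P(X = value) for a binary X with P(X = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def chain_joint(a, b, c, p_a, p_b_given_a, p_c_given_b):
    # p_b_given_a[a] = P(B = 1 | A = a); p_c_given_b[b] = P(C = 1 | B = b)
    return (bernoulli(p_a, a)
            * bernoulli(p_b_given_a[a], b)
            * bernoulli(p_c_given_b[b], c))

p_a = 0.3                        # hypothetical CPT values
p_b_given_a = {1: 0.8, 0: 0.1}
p_c_given_b = {1: 0.6, 0: 0.2}

prob = chain_joint(1, 0, 1, p_a, p_b_given_a, p_c_given_b)
print(prob)  # about 0.012 = 0.3 * 0.2 * 0.2

# Sanity check: the joint sums to 1 over all 8 assignments
total = sum(chain_joint(a, b, c, p_a, p_b_given_a, p_c_given_b)
            for a, b, c in product((0, 1), repeat=3))
```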

📗 [2 points] Consider the following directed graphical model over binary variables: \(A \to B \leftarrow C\) with the following training set.

| A | B | C |
| --- | --- | --- |
| 0 | | 0 |
| 0 | | 0 |
| 0 | | 1 |
| 0 | | 1 |
| 1 | | 0 |
| 1 | | 0 |
| 1 | | 1 |
| 1 | | 1 |

What is the MLE (Maximum Likelihood Estimate) with Laplace smoothing of the conditional probability \(\mathbb{P}\){ \(B\) = | \(A\) = , \(C\) = }?

📗 Answer: .
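With Laplace (add-one) smoothing for a binary \(B\), the estimate is \(\mathbb{P}\left\{B = b \mid A = a, C = c\right\} = \dfrac{\#\left\{A = a, B = b, C = c\right\} + 1}{\#\left\{A = a, C = c\right\} + 2}\). In this sketch the A and C columns match the table above, but the B column is hypothetical because the generated values are not shown.

```python
def laplace_conditional(data, b, a, c, num_b_values=2):
    """Add-one smoothed estimate of P(B = b | A = a, C = c) from (A, B, C) rows."""
    joint = sum(1 for ai, bi, ci in data if (ai, bi, ci) == (a, b, c))
    parent = sum(1 for ai, _, ci in data if (ai, ci) == (a, c))
    return (joint + 1) / (parent + num_b_values)

# A and C columns from the table; the B column is hypothetical
data = [(0, 1, 0), (0, 0, 0), (0, 1, 1), (0, 1, 1),
        (1, 0, 0), (1, 0, 0), (1, 1, 1), (1, 0, 1)]
print(laplace_conditional(data, b=1, a=0, c=1))  # 0.75 = (2 + 1) / (2 + 2)
print(laplace_conditional(data, b=1, a=1, c=0))  # 0.25 = (0 + 1) / (2 + 2)
```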

📗 [2 points] What are the smallest and largest values of the subderivatives of at \(x = 0\)?

📗 Answer (comma separated vector): .
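For a convex function, the set of subderivatives at a point is the interval between the left and right derivatives there, so the smallest and largest subderivatives are the one-sided derivatives at \(x = 0\). The sketch uses a hypothetical piecewise-linear function \(f\left(x\right) = \max\left(-2x, 3x\right)\) and exact `Fraction` arithmetic, so the difference quotients are exact for this function; for a curved piece they would only approximate the one-sided derivatives.

```python
from fractions import Fraction

def one_sided_derivatives(f, x0, h=Fraction(1, 10**6)):
    """Left and right difference quotients of f at x0 with step h.

    These equal the one-sided derivatives when f is linear on [x0 - h, x0]
    and on [x0, x0 + h], as for the piecewise-linear example below.
    """
    left = (f(x0) - f(x0 - h)) / h
    right = (f(x0 + h) - f(x0)) / h
    return left, right

def f(x):
    return max(-2 * x, 3 * x)  # hypothetical convex function with a kink at 0

left, right = one_sided_derivatives(f, Fraction(0))
print(left, right)  # -2 3
```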

📗 [4 points] Consider the following transition matrix for a bigram model with words "", "" and "": . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given the first word is ""?

📗 Answer: .
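The probability of the third word given the first sums over every possible second word: \(\mathbb{P}\left\{w_{3} = j | w_{1} = i\right\} = \sum_{k} \mathbb{P}\left\{w_{2} = k | w_{1} = i\right\} \mathbb{P}\left\{w_{3} = j | w_{2} = k\right\}\), which is entry \(\left(i, j\right)\) of the squared transition matrix. The \(3 \times 3\) matrix below is hypothetical.

```python
def two_step(T, i, j):
    """P(w3 = j | w1 = i) for a bigram model with transition matrix T."""
    return sum(T[i][k] * T[k][j] for k in range(len(T)))

# Hypothetical transition matrix; each row sums to 1
T = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
print(two_step(T, 0, 2))  # about 0.23 = 0.5*0.2 + 0.3*0.3 + 0.2*0.2
```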

📗 [4 points] A convolutional neural network has an input image of size x that is connected to a convolutional layer that uses a x filter, zero padding of the image, and a stride of 1. There are activation maps. (Here, zero padding implies that these activation maps have the same size as the input image.) The convolutional layer is then connected to a pooling layer that uses x max pooling with a stride of (non-overlapping, no padding). The pooling layer is then fully connected to an output layer that contains output units. There are no hidden layers between the pooling layer and the output layer. How many different weights must be learned in this whole network, not including any biases?

📗 Answer: .
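A sketch of the count, assuming a single-channel (grayscale) input, one shared filter per activation map, and an input size divisible by the pooling size: the convolutional layer has (number of maps) \(\times\) (filter side)\(^{2}\) weights, max pooling has none, and each output unit connects to every pooled value. The sizes below are hypothetical.

```python
def count_weights(input_size, filter_size, num_maps, pool_size, num_outputs):
    # One filter_size x filter_size filter per activation map (single input channel)
    conv = num_maps * filter_size * filter_size
    # Zero padding keeps each map input_size x input_size; non-overlapping pooling shrinks it
    pooled = input_size // pool_size
    # Every pooled value in every map connects to every output unit; pooling has no weights
    fc = pooled * pooled * num_maps * num_outputs
    return conv + fc

print(count_weights(28, 5, 4, 2, 10))  # 7940 = 4*5*5 + 14*14*4*10
```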

📗 [4 points] Consider the linear SVM (Support Vector Machine) problem without slack variables or kernels: this is known as the hard-margin SVM. If you give it a linearly separable training data set where \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} \in \mathbb{R}^{2}\) and \(y \in \left\{0, 1\right\}\), it will learn a line in \(\mathbb{R}^{2}\). Tom modified your data set with the following transformation, and the hard-margin SVM no longer works on it (the modified data set is no longer linearly separable): \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} \leftarrow \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} + \begin{bmatrix} b_{1} \\ b_{2} \end{bmatrix} = M \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} + b\). Suppose \(b\) = ; give an example of such an \(M\).

📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
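If \(M\) is invertible, any line separating the original points maps to a line separating the transformed points, so \(M\) must be singular. The zero matrix is a simple choice that works whenever both classes are present: it sends every point to \(b\), so points with different labels coincide and no line can separate them. The data set and \(b\) in this sketch are hypothetical, as are the helper names.

```python
def transform(points, M, b):
    """Apply x <- M x + b to each 2-D point."""
    return [(M[0][0] * x1 + M[0][1] * x2 + b[0],
             M[1][0] * x1 + M[1][1] * x2 + b[1]) for x1, x2 in points]

def has_conflicting_duplicates(points, labels):
    """True if some point appears with both labels (then no line separates the classes)."""
    seen = {}
    for p, y in zip(points, labels):
        if seen.setdefault(p, y) != y:
            return True
    return False

points = [(0, 0), (1, 1), (3, 3), (4, 5)]  # hypothetical, linearly separable
labels = [0, 0, 1, 1]
b = (1, 2)                                 # hypothetical offset
M = [[0, 0], [0, 0]]                       # singular: collapses every point onto b

new_points = transform(points, M, b)
print(new_points)  # [(1, 2), (1, 2), (1, 2), (1, 2)]
print(has_conflicting_duplicates(new_points, labels))  # True
```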

📗 [2 points] You have a dataset with unique data points (half of which are labeled 0 and the other half labeled 1) which you want to use to train a kNN (k Nearest Neighbor) classifier. You set up the experiment as follows: you train kNN classifiers with \(k\) = using all the data points. Then you randomly select data points from the training set and classify them using each of the classifiers. Which classifier (enter the \(k\) value) will have the highest accuracy? Your answer should not depend on which random subset is selected.

📗 Answer: .
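Because the selected points come from the training set and all points are unique, each point's nearest neighbor is itself at distance 0, so the 1-NN classifier labels every selected point correctly, while a larger \(k\) can be outvoted by neighbors of the other class. A small check with a hypothetical data set, assuming Euclidean distance and odd \(k\) to avoid voting ties:

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training points closest to query (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def training_accuracy(train_x, train_y, k):
    correct = sum(knn_predict(train_x, train_y, x, k) == y
                  for x, y in zip(train_x, train_y))
    return correct / len(train_y)

# Hypothetical data: three points per class
train_x = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (1, 1)]
train_y = [0, 0, 0, 1, 1, 1]
for k in (1, 3, 5):
    print(k, training_accuracy(train_x, train_y, k))
```

Here only \(k = 1\) reaches accuracy 1.0; the point \(\left(1, 1\right)\) is misclassified by \(k = 3\) and \(k = 5\).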

📗 [3 points] Consider a -dimensional feature space where each feature takes an integer value from 0 to (including 0 and ). What are the smallest and largest distances between two distinct (non-overlapping) points in the feature space?

📗 Answer (comma separated vector): .
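Assuming Euclidean distance: two distinct grid points differ by at least 1 in some coordinate, so the smallest distance is 1 (change a single feature by 1), and the largest is between opposite corners, \(\sqrt{d \cdot m^{2}} = m\sqrt{d}\) for \(d\) features taking values 0 to \(m\). The brute-force check below uses hypothetical \(d = 2\) and \(m = 3\).

```python
import itertools
import math

def distance_range(d, m):
    """Brute-force min and max Euclidean distance between distinct points of {0..m}^d."""
    points = list(itertools.product(range(m + 1), repeat=d))
    dists = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
    return min(dists), max(dists)

d, m = 2, 3  # hypothetical dimension and maximum feature value
smallest, largest = distance_range(d, m)
print(smallest, largest, m * math.sqrt(d))  # largest matches m * sqrt(d)
```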

📗 [5 points] Andy is a three-month-old baby. He can be happy (state 0), hungry (state 1), or have a wet diaper (state 2). Initially, when he wakes up from his nap at 1 pm, he is happy. If he is happy, there is a chance that he will remain happy one hour later, a chance to be hungry by then, and a chance to have a wet diaper. Similarly, if he is hungry, one hour later he will be happy with chance, hungry with chance, and have a wet diaper with chance. If he has a wet diaper, one hour later he will be happy with chance, hungry with chance, and have a wet diaper with chance. He can smile (observation 0) or cry (observation 1). When he is happy, he smiles of the time and cries of the time; when he is hungry, he smiles of the time and cries of the time; when he has a wet diaper, he smiles of the time and cries of the time.

What is the probability that the particular observed sequence (or \(Y_{1}, Y_{2}\) = ) happens (in the first two periods)?

📗 Answer: .
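The probability of an observation sequence sums over all hidden state paths; the forward algorithm does this one step at a time: \(\alpha_{1}\left(s\right) = \pi\left(s\right) \mathbb{P}\left\{Y_{1} | s\right\}\) and \(\alpha_{t}\left(s\right) = \left(\sum_{r} \alpha_{t-1}\left(r\right) \mathbb{P}\left\{s | r\right\}\right) \mathbb{P}\left\{Y_{t} | s\right\}\), with the answer \(\sum_{s} \alpha_{2}\left(s\right)\). Since Andy starts happy, \(\pi = \left(1, 0, 0\right)\); the transition and emission probabilities in this sketch are hypothetical.

```python
def sequence_probability(pi, T, E, obs):
    """Forward algorithm: P(Y_1, ..., Y_n = obs) for an HMM.

    pi[s]: initial state probabilities; T[r][s]: P(next = s | current = r);
    E[s][y]: P(observation y | state s).
    """
    n = len(pi)
    alpha = [pi[s] * E[s][obs[0]] for s in range(n)]
    for y in obs[1:]:
        alpha = [sum(alpha[r] * T[r][s] for r in range(n)) * E[s][y] for s in range(n)]
    return sum(alpha)

pi = [1.0, 0.0, 0.0]         # Andy starts happy (state 0)
T = [[0.6, 0.3, 0.1],        # hypothetical transition probabilities
     [0.2, 0.7, 0.1],
     [0.5, 0.2, 0.3]]
E = [[0.9, 0.1],             # hypothetical emissions: columns are smile (0), cry (1)
     [0.2, 0.8],
     [0.1, 0.9]]
print(sequence_probability(pi, T, E, [0, 1]))  # about 0.351 = 0.9 * (0.06 + 0.24 + 0.09)
```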

📗 [4 points] There are 3 states \(s_{0}, s_{1}, s_{2}\) and 3 actions \(a_{0}, a_{1}, a_{2}\). We start from , choose , get the reward , then move to , and choose . What are the updated Q values for (, ) based on the current Q table and the movement above, using SARSA and using Q-learning (enter two numbers, comma separated)? The reward decay (discount rate) is \(\gamma\) = , and the step size (learning rate) is \(\alpha\) = .

| State \ Action | \(a_{0}\) | \(a_{1}\) | \(a_{2}\) |
| --- | --- | --- | --- |
| \(s_{0}\) | | | |
| \(s_{1}\) | | | |
| \(s_{2}\) | | | |

📗 Answer (comma separated vector): .
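Both methods use the update \(Q\left(s, a\right) \leftarrow \left(1 - \alpha\right) Q\left(s, a\right) + \alpha \left(r + \gamma\, Q_{\text{next}}\right)\); SARSA uses \(Q_{\text{next}} = Q\left(s', a'\right)\) for the action actually chosen next, while Q-learning uses \(Q_{\text{next}} = \max_{a''} Q\left(s', a''\right)\). The Q table, move, and parameters in this sketch are hypothetical.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, gamma, alpha):
    # On-policy target: value of the action actually taken in the next state
    target = r + gamma * Q[s_next][a_next]
    return (1 - alpha) * Q[s][a] + alpha * target

def q_learning_update(Q, s, a, r, s_next, gamma, alpha):
    # Off-policy target: best action value in the next state
    target = r + gamma * max(Q[s_next])
    return (1 - alpha) * Q[s][a] + alpha * target

Q = [[0.0, 1.0, 2.0],   # hypothetical Q table: rows are states, columns are actions
     [3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0]]
# Hypothetical move: from s0 take a1, receive reward 2, land in s2, then choose a0
print(sarsa_update(Q, 0, 1, 2.0, 2, 0, gamma=0.5, alpha=0.5))   # 3.0
print(q_learning_update(Q, 0, 1, 2.0, 2, gamma=0.5, alpha=0.5))  # 3.5
```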

📗 [1 point] Please enter any comments, including possible mistakes and bugs in the questions or your answers. If you have no comments, please enter "None": do not leave it blank.

📗 Answer: .

# Grade

* * * * *

* * * * *

# Submission

📗 Please do not modify the content in the above text field: use the "Grade" button to update.

📗 Please wait for the message "Successful submission." to appear after the "Submit" button. Please also save the text in the above text box to a file using the button or copy and paste it into a file yourself and submit it to Canvas Assignment M2A. You could submit multiple times (but please do not submit too often): only the latest submission will be counted.

📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##x: 2A" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.
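A sketch of the text format described above, assuming the answers are listed in question order; the helper name `format_submission` and the ID `bucky` are hypothetical.

```python
def format_submission(student_id, answers):
    """Build the header lines, then one '##n: answer' line per question."""
    lines = ["##x: 2A", f"##id: {student_id}"]
    lines += [f"##{n}: {answer}" for n, answer in enumerate(answers, start=1)]
    return "\n".join(lines)

print(format_submission("bucky", ["0, 1, 1, 0", "0.4", "None"]))
```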

Last Updated: July 01, 2025 at 1:47 AM