Young Wu's Homepage

# XM2 Exam Part 2 Version A

📗 Enter your ID (your wisc email ID without @wisc.edu) here: and click (or press the "Enter" key).

📗 You can also load from your saved file and click .

📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.

📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page: , solve the problems, then enter all your answers at the end.

📗 Please do not refresh the page: your answers will not be saved.

📗 Please join Zoom for announcements: Link.

# Warning: please enter your ID before you start!

# Question 1

# Question 2

# Question 3

# Question 4

# Question 5

# Question 6

# Question 7

# Question 8

# Question 9

# Question 10

# Question 11

# Question 12

# Question 13

# Question 14

# Question 15

📗 [4 points] Given the training set below, find the leaf labels of the decision tree below so that it achieves 100 percent accuracy. Enter \(\hat{y}_{1}, \hat{y}_{2}, \hat{y}_{3}, \hat{y}_{4}\) as a vector.

📗 The training set:

| \(x_{1}\) | \(x_{2}\) | \(y\) |
| --- | --- | --- |
| \(0\) | \(0\) | |
| \(0\) | \(1\) | |
| \(1\) | \(0\) | |
| \(1\) | \(1\) | |

📗 The decision tree:

- if \(x_{1} \leq 0.5\):
  - if \(x_{2} \leq 0.5\): label \(\hat{y}_{1}\)
  - else (\(x_{2} > 0.5\)): label \(\hat{y}_{2}\)
- else (\(x_{1} > 0.5\)):
  - if \(x_{2} \leq 0.5\): label \(\hat{y}_{3}\)
  - else (\(x_{2} > 0.5\)): label \(\hat{y}_{4}\)

📗 Answer (comma separated vector): .
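A minimal sketch of the reasoning, assuming the training labels are stored in a hypothetical dictionary `train` keyed by \((x_{1}, x_{2})\): each of the four training points reaches a different leaf, so 100 percent accuracy requires each leaf label to equal the \(y\) of the single point that reaches it.

```python
def leaf_index(x1, x2):
    """Return which leaf (1-4, matching y_hat_1..y_hat_4) a point reaches."""
    if x1 <= 0.5:
        return 1 if x2 <= 0.5 else 2
    return 3 if x2 <= 0.5 else 4


def perfect_leaf_labels(train):
    """train: hypothetical dict mapping (x1, x2) -> y.

    Every training point lands in its own leaf, so copying each point's
    label into its leaf gives 100 percent training accuracy.
    """
    labels = [None] * 4
    for (x1, x2), y in train.items():
        labels[leaf_index(x1, x2) - 1] = y
    return labels
```

For example, with made-up labels `{(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}` the leaf labels come out as `[1, 0, 0, 1]`.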

📗 [4 points] Given a neural network with 1 hidden layer with hidden units, suppose the current hidden layer weights are \(w^{\left(1\right)}\) = = , and the output layer weights are \(w^{\left(2\right)}\) = = . Given an instance (item) \(x\) = and \(y\) = , the activation values are \(a^{\left(1\right)}\) = = and \(a^{\left(2\right)}\) = . What is the updated weight \(w^{\left(1\right)}_{21}\) after one step of stochastic gradient descent based on \(x\) with learning rate \(\alpha\) = ? The activation functions are all and the cost is the square loss.

📗 Reminder: logistic activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = a_{i} \left(1 - a_{i}\right)\), tanh activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = 1 - a_{i}^{2}\), ReLU activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = 1_{\left\{a_{i} > 0\right\}}\), and square cost has gradient \(\dfrac{\partial C_{i}}{\partial a_{i}} = a_{i} - y_{i}\).

📗 Answer: .
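A minimal backpropagation sketch for this setup, assuming logistic activations in both layers, a single output unit, square cost \(C = \dfrac{1}{2}\left(a^{\left(2\right)} - y\right)^{2}\), and the indexing convention that `w1[j_in][j]` is the weight from input `j_in` to hidden unit `j` (check this against your course notes). Under that convention, \(w^{\left(1\right)}_{21}\) corresponds to `j_in=1, j=0` in 0-based Python indices. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, b1, w2, b2, x):
    """Forward pass: one hidden layer, one output unit, logistic everywhere."""
    a1 = [sigmoid(sum(x[i] * w1[i][j] for i in range(len(x))) + b1[j])
          for j in range(len(b1))]
    a2 = sigmoid(sum(a1[j] * w2[j] for j in range(len(a1))) + b2)
    return a1, a2


def hidden_weight_gradient(w2, x, y, a1, a2, j_in, j):
    """dC/dw1[j_in][j] for C = (a2 - y)^2 / 2, via the chain rule."""
    delta_out = (a2 - y) * a2 * (1 - a2)                     # dC/dz2
    delta_hidden = delta_out * w2[j] * a1[j] * (1 - a1[j])   # dC/dz1_j
    return delta_hidden * x[j_in]


def sgd_update(w1, w2, x, y, a1, a2, alpha, j_in, j):
    """One stochastic gradient descent step on a single hidden-layer weight."""
    return w1[j_in][j] - alpha * hidden_weight_gradient(w2, x, y, a1, a2, j_in, j)
```

For tanh or ReLU hidden units, replace the factor `a * (1 - a)` with the corresponding gradient from the reminder above.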

📗 [4 points] Suppose the only three support vectors in a data set are with label , with label , and \(x\) with label . Let the margin (the distance between the plus and minus planes) be . What is \(x\)? If there are multiple possible values, enter one of them; if there are none, enter \(-1, -1\).

📗 Answer (comma separated vector): .
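A sketch for one case only, assuming the two given support vectors share a label and \(x\) has the other label, in 2-D: both given points lie on the same margin plane, so \(x\) must lie on the parallel plane at distance equal to the margin. Placing \(x\) along the unit normal from the midpoint of the two points keeps its projection between them, so all three remain support vectors. The function name and the `side` parameter are hypothetical; the opposite-label case needs a different construction.

```python
import math


def third_support_vector(p, q, margin, side=1):
    """Place x on the line parallel to pq at distance `margin`.

    p, q: two distinct 2-D support vectors assumed to share a label.
    side: +1 or -1, which side of the line through p and q to use.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length              # unit normal to line pq
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2   # midpoint of p and q
    return (mx + side * margin * nx, my + side * margin * ny)
```

Either sign of `side` gives a valid answer when only one is requested.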

📗 [4 points] Given the two training points and with labels \(0\) and \(1\), what is the kernel (Gram) matrix if the RBF (radial basis function) Gaussian kernel with \(\sigma\) = is used? Use the formula \(K_{i i'} = e^{- \dfrac{1}{2 \sigma^{2}} \left(x_{i} - x_{i'}\right)^\top \left(x_{i} - x_{i'}\right)}\).

📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
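A short sketch of the formula above applied to every pair of points; the function name `rbf_gram` is hypothetical. The labels do not enter the Gram matrix, and its diagonal is always \(1\) because \(x_{i} - x_{i} = 0\).

```python
import math


def rbf_gram(points, sigma):
    """K[i][i'] = exp(-||x_i - x_i'||^2 / (2 sigma^2)) for all pairs."""
    def kernel(u, v):
        sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
        return math.exp(-sq_dist / (2 * sigma ** 2))

    return [[kernel(u, v) for v in points] for u in points]
```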

📗 [4 points] Suppose the squared loss is used to do stochastic gradient descent for logistic regression, i.e., \(C = \dfrac{1}{2} \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)^{2}\) where \(a_{i} = \dfrac{1}{1 + e^{- w x_{i} - b}}\). The current weight is \(w\) = and the bias is \(b\) = , with \(x_{i}\) = , \(y_{i}\) = , \(a_{i}\) = (no need to recompute this value), and learning rate \(\alpha\) = . What is the updated after the iteration? Enter a single number.

📗 Answer: .
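A minimal sketch of one update on a single item: by the chain rule, \(\dfrac{\partial C_{i}}{\partial w} = \left(a_{i} - y_{i}\right) a_{i} \left(1 - a_{i}\right) x_{i}\) and \(\dfrac{\partial C_{i}}{\partial b} = \left(a_{i} - y_{i}\right) a_{i} \left(1 - a_{i}\right)\). The function names are hypothetical; `sigmoid` is included only so the example can recompute \(a_{i}\).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_loss_logistic_step(w, b, x, y, a, alpha):
    """One SGD step for C = (a - y)^2 / 2 with a = sigmoid(w x + b).

    Returns (new_w, new_b); `a` is the given activation, not recomputed.
    """
    dc_dz = (a - y) * a * (1 - a)        # dC/da * da/dz
    return w - alpha * dc_dz * x, b - alpha * dc_dz
```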

📗 [4 points] A convolutional neural network has an input image of size x that is connected to a convolutional layer that uses a x filter, zero padding of the image, and a stride of 1. There are activation maps. (Here, zero padding implies that these activation maps have the same size as the input images.) The convolutional layer is then connected to a pooling layer that applies x max pooling to the convolutional layer with a stride of (non-overlapping, no padding). The pooling layer is then fully connected to an output layer that contains output units. There are no hidden layers between the pooling layer and the output layer. How many different weights must be learned in this whole network, not including any biases?

📗 Answer: .
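A sketch of the count, assuming a single-channel input image, one \(k \times k\) filter per activation map, an image side \(n\) divisible by the pooling size, and no weights in the pooling layer. The function and parameter names are hypothetical.

```python
def conv_net_weight_count(n, k, maps, pool, outputs):
    """Weights (no biases) for conv -> max pool -> fully connected output."""
    conv = k * k * maps                  # one k x k filter per activation map
    pooled_side = n // pool              # zero padding keeps n x n; pooling shrinks by pool
    fully_connected = pooled_side * pooled_side * maps * outputs
    return conv + fully_connected
```

For instance, `conv_net_weight_count(28, 5, 4, 2, 10)` counts \(5 \cdot 5 \cdot 4 = 100\) filter weights plus \(14 \cdot 14 \cdot 4 \cdot 10 = 7840\) output weights, for 7940 in total.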

📗 [2 points] Let \(w\) = and \(b\) = . For the point \(x\) = , \(y\) = , what is the smallest slack value \(\xi\) for it to satisfy the margin constraint?

📗 Answer: .
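The margin constraint is \(y \left(w^\top x + b\right) \geq 1 - \xi\) with \(\xi \geq 0\), so the smallest slack is \(\xi = \max\left\{0, 1 - y \left(w^\top x + b\right)\right\}\). A sketch, assuming labels \(y \in \{-1, +1\}\); the function name is hypothetical.

```python
def min_slack(w, b, x, y):
    """Smallest xi >= 0 with y * (w . x + b) >= 1 - xi."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)
```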

📗 [4 points] A linear SVM (Support Vector Machine) perfectly classifies a set of training data containing positive examples and negative examples. What is the maximum possible number of training examples that could be removed while still producing the exact same SVM as the one derived for the original training set?

📗 Answer: .
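A sketch of the counting argument: removing any training example that is not a support vector leaves the SVM unchanged, and at least one support vector from each class must remain. The maximum is therefore reached in the best case of exactly one support vector per class. The function name is hypothetical.

```python
def max_removable(num_pos, num_neg):
    """Best case: only one support vector per class must be kept."""
    return num_pos + num_neg - 2
```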

📗 [3 points] Consider a -dimensional feature space where each feature takes integer values from 0 to (including 0 and ). What are the smallest and largest distances between two distinct (non-overlapping) points in the feature space?

📗 Answer (comma separated vector): .
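A sketch assuming Euclidean distance: two distinct integer points are closest when they differ by 1 in a single feature, and farthest at opposite corners of the grid, giving \(m \sqrt{d}\) for \(d\) features each ranging over \(0, \dots, m\). The brute-force helper checks the closed form on small cases; both function names are hypothetical.

```python
import itertools
import math


def min_max_distance(d, m):
    """Closed form: (1, m * sqrt(d)) for d features with values 0..m."""
    return 1.0, m * math.sqrt(d)


def brute_force_min_max(d, m):
    """Enumerate every pair of distinct grid points (small d, m only)."""
    points = list(itertools.product(range(m + 1), repeat=d))
    dists = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
    return min(dists), max(dists)
```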

📗 [2 points] Consider the following directed graphical model over binary variables: \(A \to B \leftarrow C\) with the following training set.

| A | B | C |
| --- | --- | --- |
| 0 | | 0 |
| 0 | | 0 |
| 0 | | 1 |
| 0 | | 1 |
| 1 | | 0 |
| 1 | | 0 |
| 1 | | 1 |
| 1 | | 1 |

What is the MLE (Maximum Likelihood Estimate) with Laplace smoothing of the conditional probability that \(\mathbb{P}\){ \(B\) = | \(A\) = , \(C\) = }?

📗 Answer: .
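A sketch of the smoothed estimate for a binary \(B\): count the rows matching the condition, add 1 to the numerator and 2 (the number of values \(B\) can take) to the denominator. The B column is missing from the table in this copy, so the rows in any example are hypothetical, as is the function name.

```python
def laplace_cpt(rows, b, a, c):
    """Laplace-smoothed P(B = b | A = a, C = c) from (A, B, C) tuples."""
    matching = [r for r in rows if r[0] == a and r[2] == c]
    hits = sum(1 for r in matching if r[1] == b)
    return (hits + 1) / (len(matching) + 2)   # +2 because B is binary
```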

📗 [3 points] Statistically, cats are often hungry around 6:00 am (I am making this up). At that time, a cat is hungry of the time (C = 1), and not hungry of the time (C = 0). What is the entropy of the binary random variable C? Reminder: the base-2 logarithm of x can be computed as log(x) / log(2) or log2(x).

📗 Answer: .
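A short sketch of the binary entropy \(H(C) = -p \log_{2} p - (1 - p) \log_{2} (1 - p)\), where \(p = \mathbb{P}\{C = 1\}\) and the term for a zero probability is taken as 0. The function name is hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary variable with P(C = 1) = p."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
```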

📗 [4 points] What is the convolution between the image and the filter using zero padding? Remember to flip the filter first.

📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
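A sketch of 2-D convolution with zero padding, assuming a square filter with odd side length and just enough padding that the output has the same size as the image. The filter is flipped both vertically and horizontally first; positions that fall outside the image count as zeros. The function name is hypothetical.

```python
def convolve2d_same(image, kernel):
    """Zero-padded 2-D convolution (filter flipped), same-size output."""
    k = len(kernel)
    r = k // 2
    flipped = [row[::-1] for row in kernel[::-1]]   # flip rows and columns
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for a in range(k):
                for b in range(k):
                    ii, jj = i + a - r, j + b - r
                    if 0 <= ii < rows and 0 <= jj < cols:   # outside = zero padding
                        total += image[ii][jj] * flipped[a][b]
            out[i][j] = total
    return out
```

With a filter whose only nonzero entry is at the middle-left, the flip moves it to the middle-right, so each output pixel picks up its right-hand neighbor.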

📗 [3 points] Suppose the vocabulary is the alphabet plus space (26 letters + 1 space character). What is the (maximum likelihood) estimated trigram probability \(\hat{\mathbb{P}}\left\{a | x, y\right\}\) with Laplace smoothing (add-1 smoothing) if the sequence \(x, y\) never appeared in the training set? The training set has tokens in total. Enter -1 if more information is required to estimate this probability.

📗 Answer: .
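A sketch assuming the usual add-1 trigram estimate \(\hat{\mathbb{P}}\left\{a | x, y\right\} = \dfrac{c(x, y, a) + 1}{c(x, y) + |V|}\) with \(|V| = 27\). Under this formula, an unseen context \(x, y\) makes both counts 0, and the total number of training tokens does not enter. The function name is hypothetical.

```python
def trigram_laplace(count_xya, count_xy, vocab_size=27):
    """Add-1 smoothed P(a | x, y) from trigram and context counts."""
    return (count_xya + 1) / (count_xy + vocab_size)
```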

📗 [3 points] Given a Bayesian network \(A \to B \to C \to D\) of 4 binary event variables with the following conditional probability table (CPT), what is the probability that none of the events happen, \(\mathbb{P}\left\{\neg A, \neg B, \neg C, \neg D\right\}\)?

| \(A\) | \(B\) | \(C\) | \(D\) |
| --- | --- | --- | --- |
| \(\mathbb{P}\left\{A\right\}\) = | \(\mathbb{P}\left\{B \mid A\right\}\) = | \(\mathbb{P}\left\{C \mid B\right\}\) = | \(\mathbb{P}\left\{D \mid C\right\}\) = |
| \(\mathbb{P}\left\{\neg A\right\}\) = | \(\mathbb{P}\left\{B \mid \neg A\right\}\) = | \(\mathbb{P}\left\{C \mid \neg B\right\}\) = | \(\mathbb{P}\left\{D \mid \neg C\right\}\) = |

📗 Answer: .
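For the chain \(A \to B \to C \to D\), the joint factorizes as \(\mathbb{P}\left\{\neg A\right\} \mathbb{P}\left\{\neg B \mid \neg A\right\} \mathbb{P}\left\{\neg C \mid \neg B\right\} \mathbb{P}\left\{\neg D \mid \neg C\right\}\), and each conditional complement is 1 minus the table entry (for example, \(\mathbb{P}\left\{\neg B \mid \neg A\right\} = 1 - \mathbb{P}\left\{B \mid \neg A\right\}\)). A sketch with hypothetical parameter names:

```python
def prob_none(p_a, p_b_given_not_a, p_c_given_not_b, p_d_given_not_c):
    """P(not A, not B, not C, not D) for the chain A -> B -> C -> D."""
    return ((1 - p_a)
            * (1 - p_b_given_not_a)
            * (1 - p_c_given_not_b)
            * (1 - p_d_given_not_c))
```

Only the entries conditioned on a negated parent are needed; the other column of the table does not enter.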

📗 [1 point] Please enter any comments, including possible mistakes and bugs in the questions or your answers. If you have no comments, please enter "None": do not leave it blank.

📗 Answer: .

# Grade

* * * * *

* * * * *

# Submission

📗 Please do not modify the content in the above text field: use the "Grade" button to update.

📗 Please wait for the message "Successful submission." to appear after the "Submit" button. If there is an error message or no message appears after 10 seconds, please save the text in the above text box to a file using the button or copy and paste it into a file yourself and submit it to Canvas Assignment MX2. You could submit multiple times (but please do not submit too often): only the latest submission will be counted.

📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##x: 2" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.

Last Updated: July 01, 2025 at 1:48 AM