# XM2 Exam Part 2 Version B

📗 Enter your ID (the wisc email ID without @wisc.edu) in the box and click the button (or hit the "Enter" key).
📗 You can also load your answers from a previously saved file.
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems on paper, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 Please join Zoom for announcements: Link.

# Warning: please enter your ID before you start!


# Question 1

📗 [4 points] Given the training set below, find the leaf labels of the decision tree that achieve 100 percent accuracy. Enter \(\hat{y}_{1}, \hat{y}_{2}, \hat{y}_{3}, \hat{y}_{4}\) as a vector.
📗 The training set:

| \(x_{1}\) | \(x_{2}\) | \(y\) |
| --- | --- | --- |
| \(0\) | \(0\) | |
| \(0\) | \(1\) | |
| \(1\) | \(0\) | |
| \(1\) | \(1\) | |

📗 The decision tree:
- if \(x_{1} \leq 0.5\):
  - if \(x_{2} \leq 0.5\): label \(\hat{y}_{1}\)
  - else (\(x_{2} > 0.5\)): label \(\hat{y}_{2}\)
- else (\(x_{1} > 0.5\)):
  - if \(x_{2} \leq 0.5\): label \(\hat{y}_{3}\)
  - else (\(x_{2} > 0.5\)): label \(\hat{y}_{4}\)

📗 Answer (comma separated vector): .
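📗 Since each of the four leaves covers exactly one training point, the 100-percent-accuracy labels are the training labels read off in leaf order. A minimal sketch, using hypothetical labels \(y\) in place of the ID-generated values:

```python
# Hypothetical labels: the actual y column is generated from your ID.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

def leaf_index(x1, x2):
    """Map a point to the leaf it falls in: y_hat_1 .. y_hat_4."""
    if x1 <= 0.5:
        return 0 if x2 <= 0.5 else 1
    else:
        return 2 if x2 <= 0.5 else 3

# Each leaf covers exactly one point, so its label is that point's label.
y_hat = [None] * 4
for (x1, x2), label in zip(X, y):
    y_hat[leaf_index(x1, x2)] = label
print(y_hat)  # [0, 1, 1, 0] -> enter as a comma separated vector
```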
# Question 2

📗 [4 points] Given a neural network with 1 hidden layer with hidden units, suppose the current hidden layer weights are \(w^{\left(1\right)}\) = and the output layer weights are \(w^{\left(2\right)}\) = . Given an instance (item) \(x\) = with label \(y\) = , the activation values are \(a^{\left(1\right)}\) = and \(a^{\left(2\right)}\) = . What is the updated weight \(w^{\left(1\right)}_{21}\) after one step of stochastic gradient descent based on \(x\) with learning rate \(\alpha\) = ? The activation functions are all , and the cost is the square loss.

📗 Reminder: logistic activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = a_{i} \left(1 - a_{i}\right)\), tanh activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = 1 - a_{i}^{2}\), ReLU activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = 1_{\left\{a_{i} \geq 0\right\}}\), and square cost has gradient \(\dfrac{\partial C_{i}}{\partial a_{i}} = a_{i} - y_{i}\).
📗 Answer: .
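📗 A minimal sketch of the chain-rule update for \(w^{\left(1\right)}_{21}\) (the weight from input unit 1 to hidden unit 2), assuming logistic activations and hypothetical values for the ID-generated numbers:

```python
# All numbers below are hypothetical stand-ins for the ID-generated values.
a2 = 0.7      # output activation a^(2)
y = 1.0       # label
w2_2 = 0.5    # output-layer weight on hidden unit 2, w^(2)_2
a1_2 = 0.6    # hidden activation a^(1)_2
x1 = 2.0      # input feature x_1
alpha = 0.1   # learning rate
w1_21 = 0.3   # current weight w^(1)_{21}

# Chain rule with logistic activations and square loss:
# dC/da2 = a2 - y, da2/dz2 = a2 (1 - a2), dz2/da1_2 = w2_2,
# da1_2/dz1_2 = a1_2 (1 - a1_2), dz1_2/dw1_21 = x1.
grad = (a2 - y) * a2 * (1 - a2) * w2_2 * a1_2 * (1 - a1_2) * x1
w1_21_new = w1_21 - alpha * grad
print(w1_21_new)
```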
# Question 3

📗 [3 points] Given the following training set, what is the maximum accuracy of a decision tree with depth 1 trained on this set? Enter a number between 0 and 1.

| index | \(x_{1}\) | \(y\) |
| --- | --- | --- |
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
| 5 | | |
| 6 | | |

📗 Answer: .
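📗 A minimal sketch that enumerates every depth-1 split on \(x_{1}\), using hypothetical data in place of the ID-generated table:

```python
# Hypothetical training set; the actual values are generated from your ID.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 1, 0, 1]

best = 0.0
# Candidate thresholds: below all points, and between consecutive points.
thresholds = [min(x) - 1] + [v + 0.5 for v in x]
for t in thresholds:
    for left, right in [(0, 1), (1, 0)]:  # label assigned to each side
        preds = [left if xi <= t else right for xi in x]
        acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
        best = max(best, acc)
print(best)
```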
# Question 4

📗 [4 points] Given two training points and with labels \(0\) and \(1\), what is the kernel (Gram) matrix if the RBF (radial basis function) Gaussian kernel with \(\sigma\) = is used? Use the formula \(K_{i i'} = e^{- \dfrac{1}{2 \sigma^{2}} \left(x_{i} - x_{i'}\right)^\top \left(x_{i} - x_{i'}\right)}\).
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
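📗 A minimal sketch of the Gram matrix computation, with hypothetical points and \(\sigma\):

```python
import math

# Hypothetical training points and sigma; the actual values are ID-generated.
x = [[0.0, 1.0], [2.0, 3.0]]
sigma = 1.0

def rbf(a, b):
    # squared Euclidean distance (x_i - x_i')^T (x_i - x_i')
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2 * sigma ** 2))

K = [[rbf(xi, xj) for xj in x] for xi in x]
for row in K:
    print(", ".join(f"{v:.4f}" for v in row))  # diagonal entries are always 1
```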
# Question 5

📗 [3 points] Suppose a soft margin support vector machine is trained on two points, \(x_{1}\) = with \(y_{1}\) = , and \(x_{2}\) = with \(y_{2}\) = . Given the regularization parameter \(\lambda\) = , what is the soft margin loss at \(w\) = and \(b\) = ? Use \(C = \dfrac{\lambda}{2} w^\top w + \dfrac{1}{n} \displaystyle\sum_{i=1}^{n} \displaystyle\max\left\{0, 1 - \left(2 y_{i} - 1\right)\left(w^\top x_{i} + b\right)\right\}\).
📗 Answer: .
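📗 A minimal sketch of the objective \(C\) above, with hypothetical values (the factor \(2 y_{i} - 1\) maps \(\left\{0, 1\right\}\) labels to \(\left\{-1, +1\right\}\)):

```python
# Hypothetical values; the actual x, y, lambda, w, b are ID-generated.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [0, 1]
lam = 0.5
w = [1.0, -1.0]
b = 0.0

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Average hinge loss over the training points.
hinge = sum(max(0.0, 1 - (2 * yi - 1) * (dot(w, xi) + b))
            for xi, yi in zip(X, y)) / len(y)
C = lam / 2 * dot(w, w) + hinge
print(C)
```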
# Question 6

📗 [4 points] A convolutional neural network has an input image of size x that is connected to a convolutional layer using a x filter, zero padding of the image, and a stride of 1. There are activation maps. (Here, zero padding implies that the activation maps have the same size as the input image.) The convolutional layer is then connected to a pooling layer that uses x max pooling with a stride of (non-overlapping, no padding) on the convolutional layer. The pooling layer is then fully connected to an output layer that contains output units. There are no hidden layers between the pooling layer and the output layer. How many different weights must be learned in this whole network, not including any biases?
📗 Answer: .
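📗 A minimal sketch of the weight count, assuming a single-channel input image and hypothetical sizes for the ID-generated numbers:

```python
# Hypothetical sizes; the actual ones are ID-generated.
img = 16   # input image is img x img
f = 3      # conv filter is f x f
maps = 4   # number of activation maps
pool = 2   # pool x pool max pooling, stride = pool (non-overlapping)
out = 10   # output units

# One f x f filter per activation map (shared across positions),
# assuming a single-channel input.
conv_weights = f * f * maps
# Zero padding keeps each map at img x img; pooling shrinks it by `pool`.
pooled = img // pool
# The pooling layer is fully connected to the output layer; pooling
# itself has no weights.
fc_weights = maps * pooled * pooled * out
print(conv_weights + fc_weights)
```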
# Question 7

📗 [3 points] A hospital trains a decision tree to predict whether any given patient has technophobia. The training set consists of patients, and there are features. The labels are binary, and the decision tree is not pruned. What are the smallest and largest possible training set accuracies of the decision tree? Enter two numbers between 0 and 1. Hint: patients with the same features may have different labels.
📗 Answer (comma separated vector): .
# Question 8

📗 [4 points] Given a linear SVM (Support Vector Machine) that perfectly classifies a set of training data containing positive examples and negative examples, what is the minimum possible number of training examples that need to be removed to cause the margin of the linear SVM to increase? If this is impossible, enter "-1".
📗 Answer: .
# Question 9

📗 [4 points] Suppose the squared loss is used to do stochastic gradient descent for logistic regression, i.e. \(C = \dfrac{1}{2} \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)^{2}\) where \(a_{i} = \dfrac{1}{1 + e^{- w x_{i} - b}}\). Given the current weight \(w\) = and bias \(b\) = , with \(x_{i}\) = , \(y_{i}\) = , \(a_{i}\) = (no need to recompute this value), and learning rate \(\alpha\) = , what is the updated after the iteration? Enter a single number.
📗 Answer: .
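📗 A minimal sketch of one update step, using the gradients \(\dfrac{\partial C_{i}}{\partial w} = \left(a_{i} - y_{i}\right) a_{i} \left(1 - a_{i}\right) x_{i}\) and \(\dfrac{\partial C_{i}}{\partial b} = \left(a_{i} - y_{i}\right) a_{i} \left(1 - a_{i}\right)\), with hypothetical values:

```python
# Hypothetical values; the actual w, b, x_i, y_i, a_i, alpha are ID-generated.
w, b = 0.5, -0.2
x, y = 1.5, 1.0
a = 0.55      # given activation a_i = 1 / (1 + exp(-(w x + b)))
alpha = 0.1

# Shared chain-rule factor: (a - y) from the square loss, a (1 - a)
# from the logistic activation.
common = (a - y) * a * (1 - a)
w_new = w - alpha * common * x
b_new = b - alpha * common
print(w_new, b_new)  # report whichever parameter the question asks for
```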
# Question 10

📗 [4 points] It has a house with many doors. A random door is about to be opened, each with equal probability. Doors to have monsters that eat people, and doors to are safe. With a sufficient bribe, Pennywise will answer your question "Will door 1 be opened?". What is the information gain (also called mutual information) between Pennywise's answer and your encounter with a monster?
📗 Answer: .
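📗 A minimal sketch of the mutual information \(I\left(M; A\right) = H\left(M\right) - H\left(M | A\right)\), where \(M\) is the monster encounter and \(A\) is Pennywise's answer, with hypothetical door counts (and assuming door 1 is one of the monster doors):

```python
import math

# Hypothetical: n doors total, doors 1..k have monsters (ID-generated
# in the actual question).
n, k = 8, 3

def H(ps):
    """Entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Prior entropy of the monster encounter: P(M=1) = k/n.
h_m = H([k / n, 1 - k / n])

# Answer "yes" (prob 1/n): door 1 opens, a monster door here, so M = 1
# with certainty. Answer "no" (prob (n-1)/n): M = 1 with prob (k-1)/(n-1).
h_m_given_yes = H([1.0])
h_m_given_no = H([(k - 1) / (n - 1), (n - k) / (n - 1)])
h_m_given_a = (1 / n) * h_m_given_yes + ((n - 1) / n) * h_m_given_no
print(h_m - h_m_given_a)  # information gain
```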
# Question 11

📗 [4 points] List the English letters from A to Z: ABCDEFGHIJKLMNOPQRSTUVWXYZ. Define the distance between two letters in the natural way: \(d\left(A, A\right) = 0\), \(d\left(A, B\right) = 1\), \(d\left(A, C\right) = 2\), and so on. Each letter has a label: the letters are labeled 0 and the others are labeled 1. This is your training data. Now classify each letter using kNN (k Nearest Neighbor) for odd \(k = 1, 3, 5, 7, ...\). What is the smallest \(k\) for which all letters are classified the same (i.e. either all predicted labels are 0 or all are 1)? Break ties by preferring the earlier letters in the alphabet. Hint: the nearest neighbor of a letter is the letter itself.
📗 Answer: .
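📗 A minimal sketch of the kNN vote, with a hypothetical labeling of the letters (the actual 0-labeled set is ID-generated):

```python
# Hypothetical labeling: A-E are 0, the rest are 1.
labels = {i: (0 if i < 5 else 1) for i in range(26)}

def knn_label(q, k):
    # Sort by (distance, position) so earlier letters win distance ties.
    neighbors = sorted(range(26), key=lambda j: (abs(j - q), j))[:k]
    votes = sum(labels[j] for j in neighbors)
    return 1 if votes * 2 > k else 0  # k is odd, so no vote ties

for k in range(1, 27, 2):
    preds = {knn_label(q, k) for q in range(26)}
    if len(preds) == 1:
        print(k)  # smallest odd k where all letters get the same label
        break
```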
# Question 12

📗 [4 points] In a convolutional neural network, suppose the activation map of a convolution layer is . What is the activation map after a non-overlapping (stride 2) 2 by 2 max-pooling layer?
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
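📗 A minimal sketch of 2 by 2, stride-2 max pooling on a hypothetical 4 by 4 activation map (the actual matrix is ID-generated):

```python
# Hypothetical activation map.
A = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 3, 2],
     [2, 2, 1, 4]]

# Take the max over each non-overlapping 2x2 block.
pooled = [[max(A[i][j], A[i][j + 1], A[i + 1][j], A[i + 1][j + 1])
           for j in range(0, len(A[0]), 2)]
          for i in range(0, len(A), 2)]
for row in pooled:
    print(", ".join(str(v) for v in row))  # prints "4, 5" then "2, 4"
```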
# Question 13

📗 [4 points] Consider the problem of detecting whether an email message is spam. We use four binary random variables to model this problem: a class variable \(S\) indicating whether the message is spam, and three feature variables \(C, F, N\) indicating whether the message contains "Cash", "Free", and "Now". We use a Naive Bayes classifier with the associated CPTs (Conditional Probability Tables):

📗 Prior: \(\mathbb{P}\left\{S = 1\right\}\) = .
📗 Hams: \(\mathbb{P}\left\{C = 1 | S = 0\right\}\) = , \(\mathbb{P}\left\{F = 1 | S = 0\right\}\) = , \(\mathbb{P}\left\{N = 1 | S = 0\right\}\) = .
📗 Spams: \(\mathbb{P}\left\{C = 1 | S = 1\right\}\) = , \(\mathbb{P}\left\{F = 1 | S = 1\right\}\) = , \(\mathbb{P}\left\{N = 1 | S = 1\right\}\) = .

📗 Compute \(\mathbb{P}\left\{C = , F = , N = \right\}\).
📗 Answer: .
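📗 A minimal sketch of the marginal \(\mathbb{P}\left\{C, F, N\right\} = \displaystyle\sum_{s} \mathbb{P}\left\{S = s\right\} \mathbb{P}\left\{C | S = s\right\} \mathbb{P}\left\{F | S = s\right\} \mathbb{P}\left\{N | S = s\right\}\), with hypothetical CPT and query values:

```python
# Hypothetical CPT values; the actual ones are ID-generated.
p_s1 = 0.4
cpt = {  # P(feature = 1 | S = s)
    0: {"C": 0.1, "F": 0.2, "N": 0.3},  # hams
    1: {"C": 0.7, "F": 0.6, "N": 0.5},  # spams
}
query = {"C": 1, "F": 0, "N": 1}        # hypothetical query assignment

# Sum over the class; features are conditionally independent given S.
total = 0.0
for s, p_s in [(1, p_s1), (0, 1 - p_s1)]:
    prod = p_s
    for var, val in query.items():
        p1 = cpt[s][var]
        prod *= p1 if val == 1 else 1 - p1
    total += prod
print(total)
```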
# Question 14

📗 [3 points] Consider the following directed graphical model over binary variables: \(A \to B \leftarrow C\). Given the CPTs (Conditional Probability Tables):

📗 \(\mathbb{P}\left\{A = 1\right\}\) = , \(\mathbb{P}\left\{C = 1\right\}\) = .
📗 \(\mathbb{P}\left\{B = 1 | A = 1, C = 1\right\}\) = , \(\mathbb{P}\left\{B = 1 | A = 0, C = 1\right\}\) = .
📗 \(\mathbb{P}\left\{B = 1 | A = 1, C = 0\right\}\) = , \(\mathbb{P}\left\{B = 1 | A = 0, C = 0\right\}\) = .

📗 What is the probability \(\mathbb{P}\left\{A = , B = , C = \right\}\)?
📗 Answer: .
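📗 A minimal sketch of the factorization \(\mathbb{P}\left\{A, B, C\right\} = \mathbb{P}\left\{A\right\} \mathbb{P}\left\{C\right\} \mathbb{P}\left\{B | A, C\right\}\) for \(A \to B \leftarrow C\), with hypothetical CPT values:

```python
# Hypothetical CPT values and query; the actual ones are ID-generated.
p_a1, p_c1 = 0.3, 0.6
p_b1 = {(1, 1): 0.9, (0, 1): 0.5, (1, 0): 0.4, (0, 0): 0.1}  # P(B=1 | A, C)
a, b, c = 1, 0, 1

# A and C are root nodes; B depends on both parents.
p = ((p_a1 if a == 1 else 1 - p_a1)
     * (p_c1 if c == 1 else 1 - p_c1)
     * (p_b1[(a, c)] if b == 1 else 1 - p_b1[(a, c)]))
print(p)
```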
# Question 15

📗 [1 point] Please enter any comments, including possible mistakes and bugs in the questions or your answers. If you have no comments, please enter "None"; do not leave it blank.
📗 Answer: .

# Grade



# Submission


📗 Please do not modify the content in the above text field: use the "Grade" button to update.


📗 Please wait for the message "Successful submission." to appear after clicking the "Submit" button. If an error message appears, or no message appears within 10 seconds, save the text in the text box above to a file using the provided button (or copy and paste it into a file yourself) and submit it to Canvas Assignment MX2. You can submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 You can load your answers from the text (or txt file) in the text box below using the provided button. The first two lines should be "##x: 4" and "##id: your id", and each remaining line should have the format "##1: your answer to question 1", then "##2: your answer to question 2" on the next line, and so on. Please make sure that your answers are loaded correctly before submitting them.








Last Updated: April 29, 2024 at 1:11 AM