# XM2 Exam Part 2 Version A

📗 Enter your ID (your wisc email ID, without @wisc.edu) here, then click the button (or hit the "Enter" key).

📗 You can also load your answers from a saved file, then click the button.
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 Please join Zoom for announcements: Link.

# ID: test


# Question 1

📗 [4 points] Given the training set below, find the leaf labels of the decision tree that achieve 100 percent accuracy. Enter y^1, y^2, y^3, y^4 as a vector.
📗 The training set:
x1 x2 y
0 0 1
0 1 0
1 0 1
1 1 0

📗 The decision tree:
if x1 ≤ 0.5:
  if x2 ≤ 0.5: label y^1
  else (x2 > 0.5): label y^2
else (x1 > 0.5):
  if x2 ≤ 0.5: label y^3
  else (x2 > 0.5): label y^4

📗 Answer (comma separated vector): .
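A tree this small can be checked by routing each training point to its leaf. A minimal sketch (assuming the thresholds shown above):

```python
# Route each training point through the decision tree and record which
# leaf (y^1..y^4) it reaches; for 100% accuracy each leaf's label must
# equal the true label of the point(s) routed to it.
training_set = [((0, 0), 1), ((0, 1), 0), ((1, 0), 1), ((1, 1), 0)]

def leaf_index(x1, x2):
    """Return 1..4 for the leaf y^1..y^4 that (x1, x2) falls into."""
    if x1 <= 0.5:
        return 1 if x2 <= 0.5 else 2
    return 3 if x2 <= 0.5 else 4

leaf_labels = {}
for (x1, x2), y in training_set:
    leaf_labels[leaf_index(x1, x2)] = y

answer = [leaf_labels[i] for i in (1, 2, 3, 4)]
print(answer)  # the vector y^1, y^2, y^3, y^4
```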


# Question 2

📗 [4 points] Given a neural network with 1 hidden layer of 2 hidden units, suppose the current hidden layer weights are w(1) = [w11, w12; w21, w22] = [−0.17, −0.23; −0.36, 0.15], and the output layer weights are w(2) = [w1, w2] = [0.44, 0.32]. Given an instance (item) x = [0.8, 0.97] and y = 0, the activation values are a(1) = [a1, a2] = [0.97, 0.41] and a(2) = 0.84. What is the updated weight w21(1) after one step of stochastic gradient descent based on x with learning rate α = 0.86? The activation functions are all logistic and the cost is square loss.

📗 Reminder: logistic activation has gradient ∂ai/∂zi = ai (1 − ai), tanh activation has gradient ∂ai/∂zi = 1 − ai², ReLU activation has gradient ∂ai/∂zi = 1{ai ≥ 0}, and square cost has gradient ∂Ci/∂ai = ai − yi.
📗 Answer: .
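An update like this can be checked by chaining the gradients in the reminder. A sketch, assuming the convention that w21(1) is the weight from input x1 into hidden unit 2 (row = hidden unit, column = input):

```python
# Chain rule for hidden-layer weight w21 (hidden unit 2, input 1),
# logistic activations and square loss.
a_out, y = 0.84, 0     # output activation a(2) and label
a2 = 0.41              # hidden unit 2 activation
w2_out = 0.32          # output weight attached to hidden unit 2
x1 = 0.8               # input feeding w21
alpha = 0.86           # learning rate
w21 = -0.36            # current weight

# dC/da_out = a_out - y;  da_out/dz_out = a_out (1 - a_out)
delta_out = (a_out - y) * a_out * (1 - a_out)
# back through the output weight and the hidden logistic unit
delta_h2 = delta_out * w2_out * a2 * (1 - a2)
grad_w21 = delta_h2 * x1

w21_new = w21 - alpha * grad_w21
print(round(w21_new, 5))
```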


# Question 3

📗 [4 points] Suppose the only three support vectors in a data set are [−2, 2] with label 0, [−4, 5] with label 1, and x with label 1, and let the margin (the distance between the plus and minus planes) be 2. What is x? If there are multiple possible values, enter one of them; if there are none, enter −1, −1.
📗 Answer (comma separated vector): .


# Question 4

📗 [4 points] Given the two training points [−2, −2] and [−4, 2] and their labels 0 and 1, what is the kernel (Gram) matrix if the RBF (radial basis function) Gaussian kernel with σ = 8 is used? Use the formula K_{ii′} = exp(−(1/(2σ²)) (x_i − x_{i′})⊤(x_i − x_{i′})).
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
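The 2 × 2 Gram matrix follows directly from the formula. A sketch of the computation:

```python
import math

# RBF (Gaussian) kernel matrix for the two training points, sigma = 8.
# K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)); the diagonal is 1.
points = [(-2, -2), (-4, 2)]
sigma = 8.0

def rbf(u, v):
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2 * sigma ** 2))

K = [[rbf(u, v) for v in points] for u in points]
for row in K:
    print(", ".join(f"{k:.4f}" for k in row))
```

The labels do not enter the kernel matrix; only the feature vectors do.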


# Question 5

📗 [4 points] Suppose the squared loss is used to do stochastic gradient descent for logistic regression, i.e. C = (1/2) Σ_{i=1}^{n} (a_i − y_i)² where a_i = 1 / (1 + exp(−w x_i − b)). Given the current weight w = 0.41 and bias b = 0.38, with x_i = 0.32, y_i = 1, a_i = 0.63 (no need to recompute this value), and learning rate α = 0.68, what is the updated bias after the iteration? Enter a single number.
📗 Answer: .
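The bias gradient chains the square-loss and logistic gradients, with ∂z/∂b = 1. A sketch using the given values:

```python
# One SGD step on the bias for logistic regression with squared loss.
# dC/db = (a - y) * a * (1 - a)   (square loss, then logistic, then z -> b)
b = 0.38
y, a = 1, 0.63
alpha = 0.68

grad_b = (a - y) * a * (1 - a)
b_new = b - alpha * grad_b
print(round(b_new, 5))
```

Note that x_i does not appear in the bias gradient (it would for the weight gradient).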


# Question 6

📗 [4 points] A convolutional neural network has an input image of size 30 x 30 that is connected to a convolutional layer that uses a 5 x 5 filter, zero padding of the image, and a stride of 1. There are 3 activation maps. (Here, zero-padding implies that these activation maps have the same size as the input images.) The convolutional layer is then connected to a pooling layer that uses 3 x 3 max pooling with a stride of 3 (non-overlapping, no padding) over the convolutional layer. The pooling layer is then fully connected to an output layer that contains 5 output units. There are no hidden layers between the pooling layer and the output layer. How many different weights must be learned in this whole network, not including any bias?
📗 Answer: .
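The count can be broken into layers. A sketch, assuming a single-channel input, one 5 x 5 filter per activation map, and no weights in the pooling layer:

```python
# Weight count for the described network (no biases).
image = 30
filter_size = 5
n_maps = 3
pool = 3        # 3x3 max pooling, stride 3, no padding
n_outputs = 5

conv_weights = filter_size * filter_size * n_maps    # one 5x5 filter per map
pooled = image // pool                               # zero padding keeps 30x30 before pooling
fc_weights = (pooled * pooled * n_maps) * n_outputs  # pooling layer fully connected to output
total = conv_weights + fc_weights
print(conv_weights, fc_weights, total)
```

Max pooling itself contributes no learnable weights; it only shrinks each map from 30 x 30 to 10 x 10.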


# Question 7

📗 [2 points] Let w = [−1, −1] and b = −5. For the point x = [−2, 1], y = 1, what is the smallest slack value ξ for it to satisfy the margin constraint?
📗 Answer: .
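A sketch, assuming the standard soft-margin constraint y (w·x + b) ≥ 1 − ξ with ξ ≥ 0 and y ∈ {−1, +1} (here y = 1):

```python
# Smallest slack xi so that y * (w.x + b) >= 1 - xi with xi >= 0.
w = (-1, -1)
b = -5
x = (-2, 1)
y = 1

margin_value = y * (sum(wi * xj for wi, xj in zip(w, x)) + b)
xi = max(0, 1 - margin_value)
print(xi)
```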


# Question 8

📗 [4 points] Given a linear SVM (Support Vector Machine) that perfectly classifies a set of training data containing 7 positive examples and 7 negative examples, what is the maximum possible number of training examples that could be removed while still producing the exact same SVM as derived for the original training set?
📗 Answer: .


# Question 9

📗 [3 points] Consider a 4-dimensional feature space where each feature takes an integer value from 0 to 3 (including 0 and 3). What are the smallest and largest Euclidean distances between two distinct (non-overlapping) points in the feature space?
📗 Answer (comma separated vector): .
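The grid is small enough (4^4 = 256 points) to brute-force as a sanity check. A sketch:

```python
import itertools
import math

# Smallest and largest Euclidean distance between two distinct points
# of the 4-D grid {0, 1, 2, 3}^4.
points = list(itertools.product(range(4), repeat=4))
dists = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
print(min(dists), max(dists))
```

The extremes match intuition: neighboring points differ by 1 in one coordinate, and opposite corners differ by 3 in every coordinate.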


# Question 10

📗 [2 points] Consider the following directed graphical model over binary variables: A → B ← C, with the following training set.
A B C
0 1 0
0 1 0
0 1 1
0 1 1
1 0 0
1 1 0
1 1 1
1 1 1

What is the MLE (Maximum Likelihood Estimate) with Laplace smoothing of the conditional probability P{B = 0 | A = 0, C = 0}?
📗 Answer: .
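Since B has both A and C as parents, the estimate conditions on the rows matching A = 0, C = 0. A counting sketch with add-1 smoothing over the 2 values of B:

```python
# Laplace-smoothed estimate of P(B=0 | A=0, C=0) from the training set.
data = [(0, 1, 0), (0, 1, 0), (0, 1, 1), (0, 1, 1),
        (1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1)]  # rows (A, B, C)

matches = [b for a, b, c in data if a == 0 and c == 0]
count_b0 = sum(1 for b in matches if b == 0)
# add-1 smoothing: one pseudo-count per value of B (B is binary)
p = (count_b0 + 1) / (len(matches) + 2)
print(p)
```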


# Question 11

📗 [3 points] Statistically, cats are often hungry around 6:00 am (I am making this up). At that time, a cat is hungry 3/5 of the time (C = 1), and not hungry 2/5 of the time (C = 0). What is the entropy of the binary random variable C? Reminder: log base 2 of x can be found by log(x) / log(2) or log2(x).
📗 Answer: .
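The binary entropy formula H(C) = −Σ p log2 p, sketched with the probabilities read as 3/5 and 2/5:

```python
import math

# Binary entropy of C with P(C=1) = 3/5 and P(C=0) = 2/5.
p = 3 / 5
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(round(H, 4))
```

The value is close to, but below, 1 bit, since the distribution is close to, but not exactly, uniform.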


# Question 12

📗 [4 points] What is the convolution between the image [4, 4; 3, 6] and the filter [0, 0, 0; 0, 0, 0; 0, 1, 0] using zero padding? Remember to flip the filter first.
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
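A direct sketch of the procedure, assuming the image is the 2 x 2 matrix and the filter the 3 x 3 matrix written row by row above: flip the filter 180 degrees, then slide it over the zero-padded image so the output has the same size as the input.

```python
# 2-D convolution with zero padding ("same" output size):
# flip the filter 180 degrees, then cross-correlate.
image = [[4, 4],
         [3, 6]]
filt = [[0, 0, 0],
        [0, 0, 0],
        [0, 1, 0]]

flipped = [row[::-1] for row in filt[::-1]]  # 180-degree flip

h, w = len(image), len(image[0])
out = [[0] * w for _ in range(h)]
for i in range(h):
    for j in range(w):
        s = 0
        for u in range(3):
            for v in range(3):
                r, c = i + u - 1, j + v - 1   # center the 3x3 filter
                if 0 <= r < h and 0 <= c < w:  # zero padding outside
                    s += flipped[u][v] * image[r][c]
        out[i][j] = s
print(out)
```

Since this filter has a single 1 below center, convolving shifts the image content down by one row (with zeros filling in from the padding).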


# Question 13

📗 [3 points] Suppose the vocabulary is the alphabet plus space (26 letters + 1 space character). What is the (maximum likelihood) estimated trigram probability P̂{a | x, y} with Laplace smoothing (add-1 smoothing) if the sequence x, y never appeared in the training set? The training set has 500 tokens in total. Enter -1 if more information is required to estimate this probability.
📗 Answer: .
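With an unseen context, only the smoothing terms survive, so the total token count of 500 does not enter the estimate. A sketch:

```python
# Add-1 smoothed trigram estimate P(a | x, y) when the context "x, y"
# never occurs in the training data: (0 + 1) / (0 + V).
V = 27          # vocabulary size: 26 letters + 1 space
count_xya = 0   # trigram "x y a" never seen
count_xy = 0    # context "x y" never seen
p = (count_xya + 1) / (count_xy + V)
print(p)
```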


# Question 14

📗 [3 points] Given a Bayesian network A → B → C → D of 4 binary event variables with the following conditional probability table (CPT), what is the probability that none of the events happen, P{¬A, ¬B, ¬C, ¬D}?
P{A} = 0.42 P{B|A} = 0.39 P{C|B} = 0.33 P{D|C} = 0.57
P{¬A} = 0.58 P{B|¬A} = 0.66 P{C|¬B} = 0.96 P{D|¬C} = 0.41

📗 Answer: .
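The chain structure factors the joint probability into one term per edge, with each "event does not happen" term taken as a complement of the CPT entries above. A sketch:

```python
# Chain factorization for A -> B -> C -> D:
# P(~A, ~B, ~C, ~D) = P(~A) * P(~B|~A) * P(~C|~B) * P(~D|~C)
p_notA = 0.58
p_notB_given_notA = 1 - 0.66   # complement of P(B|~A)
p_notC_given_notB = 1 - 0.96   # complement of P(C|~B)
p_notD_given_notC = 1 - 0.41   # complement of P(D|~C)

p = p_notA * p_notB_given_notA * p_notC_given_notB * p_notD_given_notC
print(round(p, 8))
```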


# Question 15

📗 [1 point] Please enter any comments, including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.
📗 Answer: .



# Grade



# Submission


📗 Please do not modify the content in the above text field: use the "Grade" button to update.


📗 Please wait for the message "Successful submission." to appear after the "Submit" button. If there is an error message or no message appears after 10 seconds, please save the text in the above text box to a file using the button, or copy and paste it into a file yourself, and submit it to Canvas Assignment MX2. You could submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 You could load your answers from the text (or txt file) in the text box below using the button. The first two lines should be "##x: 2" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.


 





Last Updated: April 09, 2025 at 11:28 PM