# XM1 Exam Part 1 Version A

📗 Enter your ID (the wisc email ID without @wisc.edu) in the box and hit the "Enter" key.
📗 You can also load your answers from a previously saved file.
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 Please join Zoom for announcements: Link.

# Warning: please enter your ID before you start!


# Question 1

📗 [2 points] The Perceptron algorithm does not terminate (cannot converge) for any learning rate on the following training set. Give an example of \(x_{1}\). If there are multiple possible answers, enter only one, and if no such \(x_{1}\) exists, enter \(-1\).

| \(i\) | \(x_{i}\) | \(y_{i}\) |
| --- | --- | --- |
| 1 | \(x_{1}\) | |
| 2 | | |
| 3 | | |

📗 Answer: .
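📗 Note: a minimal sketch for checking a candidate answer (hypothetical 1D data below, since the actual \(x_{i}, y_{i}\) are generated per student). The Perceptron algorithm keeps updating forever exactly when no threshold separates the labels.

```python
# Hypothetical 1D training data; the real x_i, y_i are generated per student.
def perceptron_converges(xs, ys, lr=1.0, max_epochs=1000):
    """Return True if the LTU perceptron stops making mistakes within max_epochs."""
    w, b = 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            a = 1 if w * x + b >= 0 else 0   # LTU activation
            if a != y:                       # mistake: apply the update rule
                w -= lr * (a - y) * x
                b -= lr * (a - y)
                mistakes += 1
        if mistakes == 0:
            return True                      # a full pass with no updates
    return False

# Identical inputs with conflicting labels can never be separated:
print(perceptron_converges([1.0, 1.0, 2.0], [0, 1, 1]))  # False
print(perceptron_converges([0.0, 1.0, 2.0], [0, 1, 1]))  # True
```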
# Question 2

📗 [4 points] Suppose the partial derivative of a cost function \(C\) with respect to some weight \(w\) is given by \(\dfrac{\partial C}{\partial w} = \displaystyle\sum_{i=1}^{n} \dfrac{\partial C_{i}}{\partial w} = \displaystyle\sum_{i=1}^{n} w x_{i}\). Given a data set \(x\) = {} and initial weight \(w\) = , compute and compare the updated weight after 1 step of batch gradient descent and steps of stochastic gradient descent (start with the same initial weight, then use data point 1 in step 1, data point 2 in step 2, ...). Enter two numbers, comma separated, batch gradient descent first. Use the learning rate \(\alpha\) = .
📗 Answer (comma separated vector): .
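📗 Note: a minimal sketch of the two update schemes for this particular gradient \(\dfrac{\partial C_{i}}{\partial w} = w x_{i}\), with hypothetical values for \(x\), \(w\), and \(\alpha\) (the real ones are generated per student):

```python
# Hypothetical values; the real x, w, alpha are generated per student.
x = [1.0, 2.0, 3.0]
w0 = 0.5
alpha = 0.1

# Batch gradient descent: one step using the full-sum gradient sum_i w * x_i.
w_batch = w0 - alpha * sum(w0 * xi for xi in x)

# Stochastic gradient descent: one step per data point, in order,
# using the single-point gradient w * x_i with the current weight.
w_sgd = w0
for xi in x:
    w_sgd = w_sgd - alpha * w_sgd * xi

print(w_batch, w_sgd)  # batch first, then SGD, as the answer format asks
```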
# Question 3

📗 [3 points] Suppose there are three classifiers \(f_{1}, f_{2}, f_{3}\) to choose from (i.e. the hypothesis space has three elements), and the activation values from these classifiers based on a training set of three items are listed below. Which classifier is the best one if loss is used for comparison? (Enter a number 1 or 2 or 3).
📗 Reminder: zero-one loss means \(\displaystyle\sum_{i=1}^{n} 1_{\left\{a_{i} \neq y_{i}\right\}}\), square loss means \(\displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)^{2}\), cross entropy loss means \(\displaystyle\sum_{i=1}^{n} -\left(y_{i} \log\left(a_{i}\right) + \left(1 - y_{i}\right) \log\left(1 - a_{i}\right)\right)\).

| Items | 1 | 2 | 3 |
| --- | --- | --- | --- |
| \(y\) | | | |
| \(f_{1}\) | | | |
| \(f_{2}\) | | | |
| \(f_{3}\) | | | |

📗 Answer: .
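📗 Note: a sketch of how the comparison can be checked, with hypothetical activations and labels (the real table is generated per student); swap in whichever loss your version of the question names.

```python
import math

# Hypothetical activations and labels; the real table is generated per student.
y = [1, 0, 1]
f = {1: [0.9, 0.2, 0.8], 2: [0.6, 0.4, 0.7], 3: [0.99, 0.7, 0.6]}

def zero_one(a, y):
    return sum(ai != yi for ai, yi in zip(a, y))

def square(a, y):
    return sum((ai - yi) ** 2 for ai, yi in zip(a, y))

def cross_entropy(a, y):
    return sum(-(yi * math.log(ai) + (1 - yi) * math.log(1 - ai))
               for ai, yi in zip(a, y))

loss = square  # swap in whichever loss your version of the question names
best = min(f, key=lambda k: loss(f[k], y))
print(best)
```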
# Question 4

📗 [4 points] Given the vocabulary \(a, b\) and the following probabilities, compute the bigram (Markov) transition matrix, row (column) 1 corresponding to \(a\) and row (column) 2 corresponding to \(b\). Note: in the table \(\mathbb{P}\left\{a b\right\}\) means the probability of \(b\) after \(a\).

| \(\mathbb{P}\left\{a b\right\}\) | \(\mathbb{P}\left\{a a\right\}\) | \(\mathbb{P}\left\{b a\right\}\) | \(\mathbb{P}\left\{b b\right\}\) |
| --- | --- | --- | --- |
| | | | |

📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
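📗 Note: a small sketch of assembling the matrix, with hypothetical probabilities; each row of a transition matrix must sum to 1, which is a useful check.

```python
# Hypothetical conditional probabilities P{next | previous}; the real
# values are generated per student. P{a b} here means P{b after a}.
p_ab, p_aa = 0.3, 0.7   # row 1: from a
p_ba, p_bb = 0.6, 0.4   # row 2: from b

# Row i, column j of the transition matrix is P{w_t = j | w_t-1 = i},
# so each row must sum to 1.
T = [[p_aa, p_ab],
     [p_ba, p_bb]]
for row in T:
    assert abs(sum(row) - 1.0) < 1e-9
print(T)
```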
# Question 5

📗 [4 points] Consider a Linear Threshold Unit (LTU) perceptron with initial weights \(w\) = and bias \(b\) = trained using the Perceptron Algorithm. Given a new input \(x\) = with label \(y\) = and learning rate \(\alpha\) = , compute the updated weights \(w', b'\) = .
📗 Answer (comma separated vector): .
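📗 Note: a sketch of one update with hypothetical numbers; the update rule shown is \(w' = w - \alpha \left(a - y\right) x\), \(b' = b - \alpha \left(a - y\right)\), so match the sign convention to the lecture notes.

```python
# Hypothetical w, b, x, y, alpha; the real ones are generated per student.
w = [0.5, -0.2]
b = 0.1
x = [1.0, 2.0]
y = 0
alpha = 0.5

# LTU activation on the new input.
a = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

# Update rule: w' = w - alpha * (a - y) * x, b' = b - alpha * (a - y).
w_new = [wi - alpha * (a - y) * xi for wi, xi in zip(w, x)]
b_new = b - alpha * (a - y)
print(w_new + [b_new])
```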
# Question 6

📗 [2 points] Consider a single sigmoid perceptron with bias weight \(w_{0}\) = , a single input \(x_{1}\) with weight \(w_{1}\) = , and the sigmoid activation function \(g\left(z\right) = \dfrac{1}{1 + \exp\left(-z\right)}\). For what input \(x_{1}\) does the perceptron output the value \(a\) = ?
📗 Note: Math.js does not accept "ln(...)", please use "log(...)" instead.
📗 Answer: .
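📗 Note: solving \(a = g\left(w_{0} + w_{1} x_{1}\right)\) for \(x_{1}\) gives \(x_{1} = \dfrac{\log\left(\dfrac{a}{1 - a}\right) - w_{0}}{w_{1}}\); a quick check with hypothetical numbers (the real \(w_{0}, w_{1}, a\) are generated per student):

```python
import math

# Hypothetical w0, w1, a; the real values are generated per student.
w0, w1, a = -1.0, 2.0, 0.75

# Invert a = 1 / (1 + exp(-(w0 + w1 * x1))):
#   w0 + w1 * x1 = log(a / (1 - a))   (the logit), so
x1 = (math.log(a / (1 - a)) - w0) / w1
print(x1)

# Check by running the forward pass.
assert abs(1 / (1 + math.exp(-(w0 + w1 * x1))) - a) < 1e-9
```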
# Question 7

📗 [3 points] Suppose you are given a neural network with hidden layers, input units, output units, and hidden units. In one backpropagation step when computing the gradient of the cost (for example, squared loss) with respect to \(w^{\left(1\right)}_{11}\), the weight in layer \(1\) connecting input \(1\) and hidden unit \(1\), how many weights (including \(w^{\left(1\right)}_{11}\) itself, and including biases) are used in the backpropagation step of \(\dfrac{\partial C}{\partial w^{\left(1\right)}_{11}}\)?
📗 Note: the backpropagation step assumes the activations in all layers are already known, so do not count the weights and biases used in the forward step computing the activations.
📗 Answer: .
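📗 Note: one way to organize the count, under the reading that only weights appearing as chain-rule factors are used: \(w^{\left(1\right)}_{11}\) itself, the weights leaving hidden unit \(1\) of layer \(1\), and every weight on the later layers' paths to the outputs; with the activations already known, downstream biases never appear as factors. Hypothetical layer sizes below; double-check this convention against the lecture notes.

```python
# A counting sketch: weights touched by the chain rule for dC/dw^(1)_11.
def weights_used(layer_sizes):
    """layer_sizes = [inputs, hidden_1, ..., hidden_L, outputs]."""
    after_first = layer_sizes[2:]     # sizes after the first hidden layer
    count = 1                         # w^(1)_11 itself
    if after_first:
        count += after_first[0]       # hidden unit 1 -> all of the next layer
    for a, b in zip(after_first, after_first[1:]):
        count += a * b                # all later connections lie on some path
    return count

print(weights_used([4, 5, 3, 2]))  # e.g. 4 inputs, hidden sizes 5 and 3, 2 outputs
```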
# Question 8

📗 [4 points] If \(K\left(x, x'\right)\) is a kernel with induced feature representation \(\varphi\left(x_{0}\right)\) = , and \(G\left(x, x'\right)\) is another kernel with induced feature representation \(\theta\left(x_{0}\right)\) = , then it is known that \(H\left(x, x'\right) = a K\left(x, x'\right) + b G\left(x, x'\right)\), \(a\) = , \(b\) = is also a kernel. What is the induced feature representation of \(H\) for this \(x_{0}\)?
📗 Answer (comma separated vector): .
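📗 Note: assuming \(a, b \geq 0\), concatenating \(\sqrt{a} \varphi\left(x\right)\) and \(\sqrt{b} \theta\left(x\right)\) induces \(H\); a numerical check with hypothetical feature maps and coefficients (the real ones are generated per student):

```python
import math

# Hypothetical feature maps and coefficients; assumption: a, b >= 0.
def phi(x):
    return [x, x * x]          # induced features of K

def theta(x):
    return [1.0, 2 * x]        # induced features of G

a, b = 2.0, 3.0

def psi(x):
    # Concatenating sqrt(a)*phi and sqrt(b)*theta induces a*K + b*G.
    return ([math.sqrt(a) * v for v in phi(x)] +
            [math.sqrt(b) * v for v in theta(x)])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x, xp = 1.5, -0.5
lhs = a * dot(phi(x), phi(xp)) + b * dot(theta(x), theta(xp))
rhs = dot(psi(x), psi(xp))
assert abs(lhs - rhs) < 1e-9
print(psi(0.5))   # the induced representation of H at x0 = 0.5
```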
# Question 9

📗 [4 points] Consider a linear model \(a_{i} = w^\top x_{i} + b\), with the hinge cost function . The initial weight is \(\begin{bmatrix} w \\ b \end{bmatrix}\) = . What is the updated weight and bias after one stochastic (sub)gradient descent step if the chosen training data is \(x\) = , \(y\) = ? The learning rate is .
📗 Answer (comma separated vector): .
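📗 Note: a sketch of one subgradient step on the standard hinge cost \(\max\left(0, 1 - y \left(w^\top x + b\right)\right)\) with \(y \in \left\{-1, +1\right\}\); the numbers are hypothetical, and you should adjust the label convention to match your version of the cost function.

```python
# Hypothetical w, b, x, y, alpha; the real ones are generated per student.
w, b = [1.0, -0.5], 0.0
x, y = [2.0, 1.0], -1
alpha = 0.1

margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
if margin < 1:   # hinge is active: subgradient is (-y * x, -y)
    w = [wi - alpha * (-y * xi) for wi, xi in zip(w, x)]
    b = b - alpha * (-y)
# otherwise the subgradient is zero and (w, b) is unchanged
print(w + [b])
```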
# Question 10

📗 [4 points] Consider a classification problem with \(n\) = classes \(y \in \left\{1, 2, ..., n\right\}\), and two binary features \(x_{1}, x_{2} \in \left\{0, 1\right\}\). Suppose \(\mathbb{P}\left\{Y = y\right\}\) = , \(\mathbb{P}\left\{X_{1} = 1 | Y = y\right\}\) = , \(\mathbb{P}\left\{X_{2} = 1 | Y = y\right\}\) = . Which class will the naive Bayes classifier produce on a test item with \(X_{1}\) = and \(X_{2}\) = ?
📗 Answer: .
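📗 Note: since the evidence \(\mathbb{P}\left\{X_{1}, X_{2}\right\}\) is shared by all classes, naive Bayes only needs the argmax of the joint \(\mathbb{P}\left\{Y = y\right\} \mathbb{P}\left\{X_{1} | Y = y\right\} \mathbb{P}\left\{X_{2} | Y = y\right\}\); a sketch with hypothetical distributions (the real formulas are generated per student):

```python
# Hypothetical distributions over n = 3 classes and a hypothetical test item.
n = 3
p_y  = {y: 1.0 / n for y in range(1, n + 1)}
p_x1 = {y: y / (n + 1) for y in range(1, n + 1)}    # P{X1 = 1 | Y = y}
p_x2 = {y: 1.0 / (y + 1) for y in range(1, n + 1)}  # P{X2 = 1 | Y = y}

x1, x2 = 1, 0

def joint(y):
    # P{Y=y} * P{X1=x1|Y=y} * P{X2=x2|Y=y}; the shared denominator
    # P{X1=x1, X2=x2} can be dropped when taking the argmax.
    f1 = p_x1[y] if x1 == 1 else 1 - p_x1[y]
    f2 = p_x2[y] if x2 == 1 else 1 - p_x2[y]
    return p_y[y] * f1 * f2

print(max(range(1, n + 1), key=joint))
```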
# Question 11

📗 [4 points] You are given a training set of six points and their 2-class classifications (+ or -): (, +), (, +), (, +), (, -), (, -), (, -). What is the decision boundary associated with this training set using 3NN (3 Nearest Neighbor)? Note: there is one more point compared to the question from the homework.
📗 Answer: .
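📗 Note: in one dimension the kNN decision boundary can be located by scanning a fine grid for where the prediction flips; a sketch with hypothetical training points (the real six points are generated per student):

```python
# Hypothetical six training points; the real ones are generated per student.
train = [(-1.0, '+'), (0.0, '+'), (1.0, '+'),
         (2.0, '-'), (3.0, '-'), (4.0, '-')]

def knn_predict(x, k=3):
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)  # majority vote

# Scan a fine grid and report where the predicted label flips.
grid = [i / 100 for i in range(-300, 601)]
for left, right in zip(grid, grid[1:]):
    if knn_predict(left) != knn_predict(right):
        print("boundary near", (left + right) / 2)
```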
# Question 12

📗 [2 points] There is a total of red or green balls in a bag. How many red balls and how many green balls are there so that the entropy of the color of a randomly selected ball is maximized?
📗 Answer (comma separated vector): .
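📗 Note: a quick check (with a hypothetical total of \(N = 10\) balls; the real total is generated per student) that an even split maximizes the entropy of the drawn color:

```python
import math

N = 10  # hypothetical total number of balls

def entropy(red):
    # Binary entropy of the color of one randomly drawn ball.
    p = red / N
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

best = max(range(N + 1), key=entropy)
print(best, N - best, entropy(best))   # 5 5 1.0
```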
# Question 13

📗 [4 points] Given the following transition matrix for a bigram model with words "" and "": . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given the first word is ""?
📗 Answer: .
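📗 Note: conditioned on the first word, the distribution of the third word is given by the two-step transition matrix \(T T\); a sketch with a hypothetical \(T\) (the real matrix is generated per student):

```python
# Hypothetical transition matrix; row i, column j is P{w_t = j | w_t-1 = i}.
T = [[0.7, 0.3],
     [0.4, 0.6]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

T2 = matmul(T, T)
# Row i, column j of T2 is P{w_3 = j | w_1 = i}.
print(T2[0][1])   # e.g. P{third word is word 2 | first word is word 1}
```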
# Question 14

📗 [4 points] John tells his professor that he forgot to submit his homework assignment. From experience, the professor knows that students who finish their homework on time forget to turn it in with probability . She also knows that of the students who have not finished their homework will tell her they forgot to turn it in. She thinks that of the students in this class completed their homework on time. What is the probability that John is telling the truth (i.e. he finished the homework given that he said he forgot to submit it)?
📗 Answer: .
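📗 Note: this is a direct Bayes rule computation; a sketch with hypothetical probabilities (the real ones are generated per student):

```python
# Hypothetical probabilities; the real values are generated per student.
p_finish = 0.6           # P{finished on time}
p_forgot_given_f = 0.05  # P{says "forgot to submit" | finished}
p_forgot_given_nf = 0.8  # P{says "forgot to submit" | not finished}

# P{finished | says forgot} by Bayes rule.
num = p_finish * p_forgot_given_f
den = num + (1 - p_finish) * p_forgot_given_nf
print(num / den)
```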
# Question 15

📗 [1 point] Please enter any comments including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.
📗 Answer: .

# Grade



# Submission


📗 Please do not modify the content in the above text field: use the "Grade" button to update.


📗 Please wait for the message "Successful submission." to appear after the "Submit" button. If there is an error message or no message appears after 10 seconds, please save the text in the above text box to a file using the button, or copy and paste it into a file yourself, and submit it to Canvas Assignment MX1. You could submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 You could load your answers from the text (or txt file) in the text box below using the button. The first two lines should be "##x: 1" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.








Last Updated: April 29, 2024 at 1:11 AM