# Midterm - Part 1

📗 Enter your ID (the wisc email ID without @wisc.edu) here and click the button (or hit the enter key).
📗 You can also load your answers from a previously saved file and click the button.
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You can print the page, solve the problems on paper, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 Please join Zoom for announcements: Link.

# Warning: please enter your ID before you start!





# Question 1

📗 [3 points] (new) There are three training documents in a training set with  words in each document. The word "" appeared  times in total in the three documents. The TF-IDF features for the word "" for the first two documents are both \(0\). What is the TF-IDF feature for the third document? Note: there are two possible answers to this question; one of them is \(0\). What is the other one?
📗 Answer: .
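📗 Note: the numeric parameters above are generated per student and are blank in this rendering. As a reference for the computation, here is a minimal sketch of one common TF-IDF convention (tf as count over document length, idf as the log of the document count over the document frequency; the log base and the name `tf_idf` are assumptions, not the grader's code):

```python
import math

def tf_idf(count_in_doc, doc_length, n_docs, n_docs_with_word):
    # Term frequency: how often the word appears, relative to document length.
    tf = count_in_doc / doc_length
    # Inverse document frequency: 0 when the word appears in every document.
    idf = math.log(n_docs / n_docs_with_word)
    return tf * idf
```

The idf factor is what makes \(0\) one of the two possible answers: if the word appears in all three documents, \(\log\left(3/3\right) = 0\).
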
# Question 2

📗 [3 points] (new) When training policy networks to control Flappy Birds, a genetic algorithm is used, and the fitness values of the birds in the population are . During the cross-over phase, reproduction probabilities are computed based on fitness, and sampling is done with replacement. What is the probability that the first bird (i.e. the one with fitness ) is sampled twice in a single iteration (a single cross-over between two birds) of cross-over (thus surviving to the next generation)?
📗 Answer: .
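📗 Note: a minimal sketch of the sampling step described above (fitness-proportional selection with replacement; the name `prob_first_bird_twice` is illustrative):

```python
def prob_first_bird_twice(fitness):
    # Reproduction probability of bird 0 is its share of total fitness;
    # two independent draws with replacement multiply.
    p = fitness[0] / sum(fitness)
    return p ** 2
```
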
# Question 3

📗 [3 points] (new) Given the following training set that is not linearly separable for a hard-margin support vector machine (SVM): \(\left\{\left(x_{i}, y_{i}\right)\right\}_{i=1}^{3} = \left(-1, A\right), \left(0, B\right), \left(1, A\right)\), which of the following feature maps will create new features so that the resulting training set is linearly separable for a kernel SVM?
(The six candidate feature maps, choices 1 to 6, are generated on the live page.)

Enter a comma separated list of indices; for example, if you think choices 1, 3, 5 are the answers, enter "1, 3, 5" (order does not matter), and if you think none of the choices are correct, enter "-1" (do not enter "0" or other text).
📗 Answer (comma separated vector): .
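📗 Note: for feature maps that produce a single new feature, separability reduces to the two classes occupying disjoint intervals after mapping. A minimal sketch of that check, assuming 1-D maps (multi-dimensional maps would need a different test); `phi = lambda x: x ** 2` is one classic map that works here, since it sends \(-1\) and \(1\) to the same value, away from \(0\):

```python
def separable_after_map(points, phi):
    # Map each x to its new feature, keeping the label.
    mapped = [(phi(x), y) for x, y in points]
    first = mapped[0][1]
    a = [z for z, y in mapped if y == first]
    b = [z for z, y in mapped if y != first]
    # In 1-D, a linear classifier is a threshold, so the classes must
    # occupy disjoint intervals after mapping.
    return max(a) < min(b) or max(b) < min(a)

print(separable_after_map([(-1, "A"), (0, "B"), (1, "A")], lambda x: x ** 2))  # True
```
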
# Question 4

📗 [4 points] Consider a Linear Threshold Unit (LTU) perceptron with initial weights \(w\) =  and bias \(b\) = , trained using the Perceptron Algorithm. Given a new input \(x\) =  with label \(y\) = , and learning rate \(\alpha\) = , compute the updated weights and bias, \(w', b'\) = :
📗 Answer (comma separated vector): .
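📗 Note: a minimal sketch of one Perceptron Algorithm step for an LTU unit \(1_{\left\{w^\top x + b \geq 0\right\}}\), using the update \(w \leftarrow w + \alpha \left(y - a\right) x\), \(b \leftarrow b + \alpha \left(y - a\right)\) (this sign convention is an assumption; conventions vary):

```python
def perceptron_step(w, b, x, y, alpha):
    # Predict with the LTU, then move the weights toward the correct label.
    a = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
    w_new = [wi + alpha * (y - a) * xi for wi, xi in zip(w, x)]
    b_new = b + alpha * (y - a)
    return w_new, b_new
```
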
# Question 5

📗 [4 points] Given the following neural network that classifies all the training instances correctly, what are the labels (0 or 1) of the training data? The activation functions are LTU for all units: \(1_{\left\{z \geq 0\right\}}\). The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias .

| \(x_{i1}\) | \(x_{i2}\) | \(x_{i3}\) | \(y_{i}\) or \(a^{\left(2\right)}_{1}\) |
| --- | --- | --- | --- |
| 0 | 0 | 1 | ? |
| 0 | 1 | 0 | ? |
| 1 | 0 | 1 | ? |
| 1 | 1 | 0 | ? |

Note: on the live page, if the weights are not shown clearly, you can move the nodes around with mouse or touch.
📗 Answer (comma separated vector): .
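📗 Note: a minimal sketch of the forward pass with LTU activations throughout (shapes assumed: `W1` is a list of hidden-unit weight rows, `w2` the output-unit weights):

```python
def ltu(z):
    return 1 if z >= 0 else 0

def predict(x, W1, b1, w2, b2):
    # Each hidden unit thresholds its weighted input, then the output
    # unit thresholds the hidden activations the same way.
    h = [ltu(sum(w * xi for w, xi in zip(row, x)) + bj)
         for row, bj in zip(W1, b1)]
    return ltu(sum(w * hj for w, hj in zip(w2, h)) + b2)
```
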
# Question 6

📗 [2 points] What are the smallest and largest values of the subderivatives of  at \(x = 0\)?
📗 Answer (comma separated vector): .
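📗 Note: the function itself is elided above. As a reference example: for \(f\left(x\right) = \left| x \right|\), \(g\) is a subderivative at \(0\) exactly when \(\left| x \right| \geq g x\) for all \(x\), which holds for \(g \in \left[-1, 1\right]\), so the smallest and largest subderivatives would be \(-1\) and \(1\).
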
📗 [4 points] "It" has a house with many doors. A random door is about to be opened with equal probability. Doors to have monsters that eat people. Doors to are safe. With sufficient bribe, Pennywise will answer your question "Will door 1 be opened?" What's the information gain (also called mutual information) between Pennywise's answer and your encounter with a monster?
📗 Answer: .
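📗 Note: a minimal sketch of the computation \(I = H\left(M\right) - \mathbb{P}\left\{yes\right\} H\left(M | yes\right) - \mathbb{P}\left\{no\right\} H\left(M | no\right)\), assuming doors are numbered from 1 and each opened with equal probability (`monster_doors` is the set of monster doors; names are illustrative):

```python
import math

def entropy(p):
    # Binary entropy in bits; 0 by convention at p = 0 or 1.
    if p in (0, 1):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def info_gain(n_doors, monster_doors):
    h_monster = entropy(len(monster_doors) / n_doors)
    p_yes = 1 / n_doors
    # Answer "yes": door 1 opens for sure, so the monster outcome is known.
    h_given_yes = 0.0
    # Answer "no": uniform over the remaining doors.
    h_given_no = entropy(len(monster_doors - {1}) / (n_doors - 1))
    return h_monster - (p_yes * h_given_yes + (1 - p_yes) * h_given_no)
```
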
# Question 8

📗 [3 points] In one iteration of the Perceptron Algorithm, \(x\) = , \(y\) = , and the predicted label \(\hat{y} = a\) = . The learning rate is \(\alpha = 1\). After the iteration, how many of the weights (including the bias \(b\)) are increased (the change is strictly larger than 0)? If it is impossible to figure out given the information, enter -1.
📗 Answer: .
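📗 Note: under the standard LTU perceptron update (as sketched in Question 4), each change is \(\Delta w_{j} = \alpha \left(y - \hat{y}\right) x_{j}\) and \(\Delta b = \alpha \left(y - \hat{y}\right)\); when \(y = \hat{y}\) nothing changes, so the count of strictly increased weights depends only on the sign of \(y - \hat{y}\) and the signs of the \(x_{j}\).
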
# Question 9

📗 [3 points] In one step of sub-gradient descent for \(L_{1}\) regularized logistic regression, suppose \(w\) = , \(b\) = , \(\dfrac{\partial C}{\partial w}\) = , and \(\dfrac{\partial C}{\partial b}\) = . If the learning rate is \(\alpha\) =  and the regularization parameter is \(\lambda\) = , what is \(w\) after one iteration? Use the loss \(C\left(w, b\right)\) plus the regularization \(\lambda \left\|\begin{bmatrix} w \\ b \end{bmatrix}\right\|_{1} = \lambda \left(\left| w \right| + \left| b \right|\right)\); that is, there is a term with coefficient \(\alpha \lambda\) in the update. If there are multiple sub-derivatives, use the one with the smallest absolute value.
📗 Answer: .
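📗 Note: a minimal sketch of the step described above (`l1_subgradient_step` is an illustrative name):

```python
def l1_subgradient_step(w, dC_dw, alpha, lam):
    # Subderivative of |w|: sign(w) away from 0; at w = 0 the
    # subdifferential is [-1, 1], and the smallest-absolute-value
    # choice (per the question) is 0.
    s = 0.0 if w == 0 else (1.0 if w > 0 else -1.0)
    return w - alpha * (dC_dw + lam * s)
```
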
# Question 10

📗 [3 points] A hard-margin SVM (Support Vector Machine) is trained on the following dataset. Suppose we restrict \(b\) = ; what is the value of \(w\)? Enter a single number, i.e. do not include \(b\). Assume the SVM classifier is \(1_{\left\{w x + b \geq 0\right\}}\) (this means it predicts 1 if \(w x + b \geq 0\) and 0 otherwise).
(The \(x_{i}\) and \(y_{i}\) values are generated on the live page.)

📗 Answer: .
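📗 Note: a minimal sketch for the 1-D case with \(b\) fixed, assuming labels in \(\left\{0, 1\right\}\) and margin constraints \(w x_{i} + b \geq 1\) for \(y_{i} = 1\) and \(w x_{i} + b \leq -1\) for \(y_{i} = 0\); the hard-margin solution is then the feasible \(w\) closest to \(0\):

```python
def svm_w_fixed_b(points, b):
    lo, hi = float("-inf"), float("inf")
    for x, y in points:
        c = (1 - b) if y == 1 else (-1 - b)
        # y = 1 needs w*x >= c; y = 0 needs w*x <= c. Dividing by x
        # flips the inequality when x < 0 (x = 0 constrains only b).
        if (y == 1 and x > 0) or (y == 0 and x < 0):
            lo = max(lo, c / x)
        elif x != 0:
            hi = min(hi, c / x)
    if lo > hi:
        return None  # infeasible with this b
    return 0.0 if lo <= 0 <= hi else (lo if lo > 0 else hi)
```
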
# Question 11

📗 [4 points] Given the following training set, add one item \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}\) with \(y\) =  so that all 7 items are support vectors for the hard-margin SVM (Support Vector Machine) trained on the new training set.

| \(x_{1}\) | \(x_{2}\) | \(y\) |
| --- | --- | --- |
|  |  | 0 |
|  |  | 0 |
|  |  | 0 |
|  |  | 1 |
|  |  | 1 |
|  |  | 1 |

📗 Answer (comma separated vector): .
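📗 Note: for a hard-margin SVM, the support vectors lie on the margin boundaries \(w^\top x + b = \pm 1\), so the added item must sit on the boundary of its own class without shrinking the existing margin.
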
# Question 12

📗 [4 points] Consider a linear model \(a_{i} = w^\top x_{i} + b\), with the hinge cost function . The initial weight and bias are \(\begin{bmatrix} w \\ b \end{bmatrix}\) = . What are the updated weight and bias after one stochastic (sub)gradient descent step if the chosen training item is \(x\) = , \(y\) = ? The learning rate is .
📗 Answer (comma separated vector): .
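📗 Note: the hinge cost itself is elided above; as a reference, here is a minimal sketch of one common variant, \(C = \max\left(0, 1 - y \left(w^\top x + b\right)\right)\) with labels \(y \in \left\{-1, +1\right\}\):

```python
def hinge_step(w, b, x, y, lr):
    # Sub-gradient of max(0, 1 - y*(w.x + b)): zero when the margin is
    # met, otherwise (-y*x) for w and (-y) for b.
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    if margin < 1:
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    return w, b
```
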
# Question 13

📗 [3 points] Given the following training set, what is the maximum accuracy of a decision tree with depth 1 trained on this set? Enter a number between 0 and 1.

| index | \(x_{1}\) | \(y\) |
| --- | --- | --- |
| 1 |  |  |
| 2 |  |  |
| 3 |  |  |
| 4 |  |  |
| 5 |  |  |
| 6 |  |  |

📗 Answer: .
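📗 Note: a minimal sketch of the brute-force check (a depth-1 tree is a single threshold on \(x_{1}\)):

```python
def best_stump_accuracy(xs, ys):
    vals = sorted(set(xs))
    # One cut below everything (covers the constant classifiers) plus a
    # cut between every pair of adjacent distinct values.
    cuts = [vals[0] - 1] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    best = 0
    for t in cuts:
        for below, above in ((0, 1), (1, 0)):
            hits = sum((below if x <= t else above) == y
                       for x, y in zip(xs, ys))
            best = max(best, hits)
    return best / len(xs)
```
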
# Question 14

📗 [4 points] Given the following conditional entropy values, what distribution of \(X\) would maximize the information gain \(I\left(Y | X\right)\)? Assume \(H\left(Y\right)\) = . Enter a vector of probabilities between 0 and 1 that sum up to 1: \(\mathbb{P}\left\{X = 1\right\}, \mathbb{P}\left\{X = 2\right\}, \mathbb{P}\left\{X = 3\right\}, \mathbb{P}\left\{X = 4\right\}\).

| \(H\left(Y \mid X = 1\right)\) | \(H\left(Y \mid X = 2\right)\) | \(H\left(Y \mid X = 3\right)\) | \(H\left(Y \mid X = 4\right)\) |
| --- | --- | --- | --- |
|  |  |  |  |

📗 Answer (comma separated vector): .
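📗 Note: the information gain \(H\left(Y\right) - \sum_{k} \mathbb{P}\left\{X = k\right\} H\left(Y \mid X = k\right)\) is linear in the probabilities, so it is maximized by putting all probability on the value of \(X\) with the smallest conditional entropy, as in this sketch:

```python
def maximizing_distribution(cond_entropies):
    # All mass on the X value with the smallest H(Y | X = k).
    k = min(range(len(cond_entropies)), key=cond_entropies.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(cond_entropies))]
```
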
# Question 15

📗 [1 point] Please enter any comments, including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.
📗 Answer: .

# Grade



# Submission


📗 Please do not modify the content in the above text field: use the "Grade" button to update.


📗 Please wait for the message "Successful submission." to appear after the "Submit" button. Please also save the text in the above text box to a file using the button, or copy and paste it into a file yourself, and submit it to Canvas Assignment X1. You can submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 You can load your answers from the text (or a .txt file) in the text box below using the button. The first two lines should be "##x: 1" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.








Last Updated: September 11, 2025 at 10:55 PM