Young Wu's Homepage

# M1A Midterm Part 1 Version A

📗 Enter your ID (the wisc email ID without @wisc.edu) and click the button (or press the Enter key).

📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You can print the page, solve the problems, then enter all your answers at the end.

📗 If the questions are not generated correctly, try the following: (1) refresh the page, (2) clear the browser cache, (3) switch to incognito/private browsing mode, (4) switch to another browser, (5) use a different ID. If none of these work, please post a private message on Piazza with your ID.

📗 Join Zoom if you have questions:

Zoom Link

📗 Please do not refresh the page (after you start): your answers will not be saved.

# Warning: please enter your ID before you start!

# Question 1

# Question 2

# Question 3

# Question 4

# Question 5

# Question 6

# Question 7

# Question 8

# Question 9

# Question 10

# Question 11

# Question 12

# Question 13

# Question 14

# Question 15

📗 [4 points] Consider a Linear Threshold Unit (LTU) perceptron with initial weights \(w\) = and bias \(b\) = trained using the Perceptron Algorithm. Given a new input \(x\) = and \(y\) = . Let the learning rate be \(\alpha\) = , compute the updated weights, \(w', b'\) = :

📗 Answer (comma separated vector): .
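📗 Note: a minimal sketch of one Perceptron Algorithm step, assuming labels in {0, 1} and the convention that the LTU outputs 1 when \(w^\top x + b \geq 0\). The function name and the numbers in the example are hypothetical, not the values generated for any ID.

```python
def perceptron_update(w, b, x, y, alpha):
    """One Perceptron Algorithm step for an LTU with labels in {0, 1}."""
    # Assumed activation convention: output 1 if w.x + b >= 0, else 0.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    a = 1 if z >= 0 else 0
    # Move the weights against the error (a - y); nothing changes when a == y.
    new_w = [wi - alpha * (a - y) * xi for wi, xi in zip(w, x)]
    new_b = b - alpha * (a - y)
    return new_w, new_b


# Hypothetical values: w = [1, 1], b = -1, x = [1, 0], y = 0, alpha = 0.5.
print(perceptron_update([1, 1], -1, [1, 0], 0, 0.5))  # ([0.5, 1.0], -1.5)
```

Here the activation is 1 while the label is 0, so the weights move by \(-\alpha x\) and the bias by \(-\alpha\).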

📗 [2 points] Consider a single sigmoid perceptron with bias weight \(w_{0}\) = , a single input \(x_{1}\) with weight \(w_{1}\) = , and the sigmoid activation function \(g\left(z\right) = \dfrac{1}{1 + \exp\left(-z\right)}\). For what input \(x_{1}\) does the perceptron output the value \(a\) = ?

📗 The red curve is a plot of the activation function, given the y-value of the green point, the question is asking for its x-value.

📗 Note: Math.js does not accept "ln(...)", please use "log(...)" instead.

📗 Answer: .
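📗 Note: a minimal sketch of inverting the sigmoid, assuming \(0 < a < 1\) and \(w_{1} \neq 0\). Solving \(a = g\left(w_{0} + w_{1} x_{1}\right)\) gives \(x_{1} = \left(\log\left(\dfrac{a}{1 - a}\right) - w_{0}\right) / w_{1}\). The function names and example numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_for_output(w0, w1, a):
    # Invert a = g(w0 + w1 * x1): apply the logit, then solve the linear equation.
    logit = math.log(a / (1.0 - a))
    return (logit - w0) / w1


# Hypothetical values: w0 = 1, w1 = 2, a = 0.5 (the sigmoid is 0.5 at z = 0).
print(input_for_output(1.0, 2.0, 0.5))  # -0.5
```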

📗 [3 points] A bag contains \(n\) = different colored balls. Randomly draw a ball from the bag with equal probability. What is the entropy of the outcome? Reminder: the base-2 logarithm of x can be computed as log(x) / log(2) or log2(x).

📗 Answer: .
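📗 Note: a minimal sketch of the entropy of a uniform draw, measured in bits. With \(n\) equally likely outcomes, \(H = -\displaystyle\sum_{i=1}^{n} \dfrac{1}{n} \log_{2} \dfrac{1}{n} = \log_{2} n\). The function name and the value of \(n\) in the example are hypothetical.

```python
import math


def uniform_entropy(n):
    # Sum -p * log2(p) over n equally likely outcomes.
    p = 1.0 / n
    return -sum(p * math.log2(p) for _ in range(n))


# Hypothetical value: n = 8 gives log2(8) = 3 bits.
print(uniform_entropy(8))
```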

📗 [4 points] You are given a training set of six points and their 2-class classifications (+ or -): (, +), (, +), (, +), (, -), (, -), (, -). What is the decision boundary associated with this training set using 3NN (3 Nearest Neighbor)? Note: there is one more point compared to the question from the homework.

📗 Answer: .
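📗 Note: a minimal sketch, assuming the training points are distinct numbers on a line (one coordinate each). In one dimension, the \(k\) nearest neighbors of a query form a contiguous window of the sorted points, and the window shifts by one when the query crosses the midpoint of \(x_{i}\) and \(x_{i+k}\). A decision boundary sits at each such midpoint where the majority label changes. The function name and the points in the example are hypothetical.

```python
def knn_boundaries_1d(points, k=3):
    """points: list of (x, label) with distinct x and labels '+' or '-'."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    labels = [lab for _, lab in pts]

    def majority(window):
        # k is odd, so there is never a tie between '+' and '-'.
        return '+' if window.count('+') > window.count('-') else '-'

    boundaries = []
    for i in range(len(pts) - k):
        before = majority(labels[i:i + k])
        after = majority(labels[i + 1:i + k + 1])
        if before != after:
            # The neighbor window shifts when the query passes this midpoint.
            boundaries.append((xs[i] + xs[i + k]) / 2)
    return boundaries


# Hypothetical training set, not the generated one.
data = [(1, '+'), (2, '+'), (3, '+'), (7, '-'), (8, '-'), (9, '-')]
print(knn_boundaries_1d(data))  # [5.0]
```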

📗 [2 points] Given the training data "", with the gram model, what is the probability of observing the new sentence "" given the first word is ? Use MLE (Maximum Likelihood Estimate) without smoothing and do not include the probability of observing the first word.

📗 Answer: .
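📗 Note: a minimal sketch, assuming a bigram model over whitespace-separated tokens. Each conditional probability is estimated as the count of the pair divided by the number of bigrams that start with the preceding word; other counting conventions for the denominator exist. The function name and the strings in the example are hypothetical.

```python
from collections import Counter


def bigram_sentence_prob(training, sentence):
    train = training.split()
    pair_counts = Counter(zip(train, train[1:]))
    # Denominator: number of bigrams starting with each word (assumed convention).
    start_counts = Counter(train[:-1])
    words = sentence.split()
    prob = 1.0
    # The first word is given, so only the transitions contribute.
    # Assumes every preceding word starts at least one training bigram.
    for prev, cur in zip(words, words[1:]):
        prob *= pair_counts[(prev, cur)] / start_counts[prev]
    return prob


# Hypothetical data: P(b | a) = 2/2 and P(b | b) = 1/2.
print(bigram_sentence_prob("a b a b b", "a b b"))  # 0.5
```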

📗 [3 points] There are two biased coins in my pocket: coin A has \(\mathbb{P}\left\{H | A\right\}\) = , coin B has \(\mathbb{P}\left\{H | B\right\}\) = . I took out a coin from the pocket at random, where the probability of picking A is . I flipped it three times (independently) and the outcome is . What is the probability that the coin was ?

📗 Answer: .
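📗 Note: a minimal sketch of the Bayes rule computation, assuming the flips are independent given the coin. The posterior is \(\mathbb{P}\left\{A | \text{flips}\right\} = \dfrac{\mathbb{P}\left\{\text{flips} | A\right\} \mathbb{P}\left\{A\right\}}{\mathbb{P}\left\{\text{flips} | A\right\} \mathbb{P}\left\{A\right\} + \mathbb{P}\left\{\text{flips} | B\right\} \mathbb{P}\left\{B\right\}}\). The function name and the numbers in the example are hypothetical.

```python
def posterior_coin_a(p_h_a, p_h_b, prior_a, outcome):
    def likelihood(p_h):
        # Independent flips: multiply the per-flip probabilities.
        result = 1.0
        for flip in outcome:
            result *= p_h if flip == 'H' else 1.0 - p_h
        return result

    joint_a = prior_a * likelihood(p_h_a)
    joint_b = (1.0 - prior_a) * likelihood(p_h_b)
    return joint_a / (joint_a + joint_b)


# Hypothetical values: P(H|A) = 0.8, P(H|B) = 0.4, P(A) = 0.5, outcome HHT.
p_a = posterior_coin_a(0.8, 0.4, 0.5, "HHT")
print(p_a, 1.0 - p_a)  # posterior for coin A, then for coin B
```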

📗 [2 points] You have a vocabulary with \(n\) = word types. You want to estimate the unigram probability \(p_{w}\) for each word type \(w\) in the vocabulary. In your corpus the total word token count \(\displaystyle\sum_{w} c_{w}\) is , and \(c_{\text{dune}}\) = . Using Laplace smoothing (add ), compute \(p_{\text{dune}}\).

📗 Answer: .
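📗 Note: a minimal sketch of add-\(\delta\) (Laplace) smoothing for unigram probabilities, \(p_{w} = \dfrac{c_{w} + \delta}{\displaystyle\sum_{w'} c_{w'} + \delta n}\). The function name and the numbers in the example are hypothetical.

```python
def add_delta_unigram(count, total_tokens, vocab_size, delta=1.0):
    # Add delta to the word's count and delta for every word type in the vocabulary.
    return (count + delta) / (total_tokens + delta * vocab_size)


# Hypothetical values: n = 10 word types, 90 tokens, c_w = 9, add 1.
print(add_delta_unigram(9, 90, 10))  # (9 + 1) / (90 + 10) = 0.1
```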

📗 [4 points] Given the following training set, add one instance \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}\) with \(y\) = so that all instances are support vectors for the Hard Margin SVM (Support Vector Machine) trained on the new training set.

\(x_{1}\)	\(x_{2}\)	\(y\)
		0
		0
		0
		1
		1
		1

📗 Note: in the diagram, currently, the two support vectors are connected by the grey line and the black line represents the SVM classification boundary. After adding one point, you should be able to make all seven points support vectors with the classification boundary given by the green line.

📗 Answer (comma separated vector): .

📗 [4 points] In a convolutional neural network, suppose the activation map of a convolution layer is . What is the activation map after a non-overlapping (stride 2) 2 by 2 max-pooling layer?

📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
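📗 Note: a minimal sketch of non-overlapping 2 by 2 max-pooling, assuming the activation map has an even number of rows and columns. The function name and the matrix in the example are hypothetical.

```python
def max_pool_2x2(activation):
    rows, cols = len(activation), len(activation[0])
    # Step through the map in 2 by 2 blocks (stride 2) and keep each block's maximum.
    return [
        [
            max(activation[r][c], activation[r][c + 1],
                activation[r + 1][c], activation[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Hypothetical 4 by 4 activation map.
example = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2x2(example))  # [[6, 8], [14, 16]]
```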

📗 [4 points] Given the following transition matrix for a bigram model with words "I" (label 0), "am" (label 1) and "Groot" (label 2): . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). Two uniform random numbers between 0 and 1 are generated to simulate the words after "I", say \(u_{1}\) = and \(u_{2}\) = . Using the CDF (Cumulative Distribution Function) inversion method (inverse transform method), which two words are generated? Enter two integer labels (0, 1, or 2), not strings.

📗 Answer (comma separated vector): .
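📗 Note: a minimal sketch of CDF inversion on a transition matrix. Each step picks the smallest label whose cumulative probability in the current row is at least \(u\), then uses that label as the next row. Treating the boundary case with \(\leq\) is an assumption, and the function names, matrix, and random numbers in the example are hypothetical.

```python
def sample_by_cdf_inversion(probs, u):
    cumulative = 0.0
    for label, p in enumerate(probs):
        cumulative += p
        # Smallest label whose CDF value reaches u.
        if u <= cumulative:
            return label
    # Guard against floating-point rounding in the final cumulative sum.
    return len(probs) - 1


def generate_words(transition, start, uniforms):
    words, state = [], start
    for u in uniforms:
        state = sample_by_cdf_inversion(transition[state], u)
        words.append(state)
    return words


# Hypothetical transition matrix and random numbers.
matrix = [
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.5, 0.25, 0.25],
]
print(generate_words(matrix, 0, [0.5, 0.9]))  # [1, 2]
```

In the example, \(u_{1} = 0.5\) falls in row 0 between the cumulative values 0.1 and 0.7, giving label 1; \(u_{2} = 0.9\) then falls in row 1 above 0.4, giving label 2.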

📗 [4 points] List English letters from A to Z: ABCDEFGHIJKLMNOPQRSTUVWXYZ. Define the distance between two letters in the natural way, that is \(d\left(A, A\right) = 0\), \(d\left(A, B\right) = 1\), \(d\left(A, C\right) = 2\) and so on. Each letter has a label, are labeled 0, and the others are labeled 1. This is your training data. Now classify each letter using kNN (k Nearest Neighbor) for odd \(k = 1, 3, 5, 7, ...\). What is the smallest \(k\) where all letters are classified the same (same label, i.e. either all labels are 0s or all labels are 1s)? Break ties by preferring the earlier letters in the alphabet. Hint: the nearest neighbor of a letter is the letter itself.

📗 Answer: .
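📗 Note: a minimal brute-force sketch of the search over odd \(k\), assuming ties in distance are broken by taking the earlier letter first. The function name and the sets of letters labeled 0 in the example are hypothetical.

```python
import string


def smallest_uniform_k(zero_letters):
    letters = string.ascii_uppercase
    labels = [0 if ch in zero_letters else 1 for ch in letters]
    n = len(letters)
    for k in range(1, n + 1, 2):
        predictions = set()
        for i in range(n):
            # Sort by distance, then by position, so earlier letters win ties.
            neighbors = sorted(range(n), key=lambda j: (abs(i - j), j))[:k]
            ones = sum(labels[j] for j in neighbors)
            predictions.add(1 if 2 * ones > k else 0)
        if len(predictions) == 1:
            return k
    return None  # no odd k up to 25 gives a uniform classification


# Hypothetical label sets.
print(smallest_uniform_k("A"))    # 3
print(smallest_uniform_k("ABC"))  # 7
```

With only "A" labeled 0, every 3-neighbor window contains at most one 0, so all letters are classified 1 at \(k = 3\).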

📗 [4 points] Consider an unbiased estimator \(X\) for a parameter \(\theta\). We have \(\mathbb{E}\left[X\right]\) = , \(Var\left[X\right]\) = , \(\mathbb{E}\left[Y\right]\) = , \(Var\left[Y\right]\) = . We would like a modified estimator \(Z = X - Y\) to have a reduced variance compared to \(X\). For what covariance \(Cov\left[X, Y\right]\) can we achieve \(Var\left[Z\right] \leq Var\left[X\right]\)? Note: there are many possible answers, enter only one of them.

📗 Answer: .
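📗 Note: a minimal sketch of the variance identity \(Var\left[X - Y\right] = Var\left[X\right] + Var\left[Y\right] - 2 Cov\left[X, Y\right]\), which gives \(Var\left[Z\right] \leq Var\left[X\right]\) exactly when \(Cov\left[X, Y\right] \geq Var\left[Y\right] / 2\). The upper end of the range comes from the Cauchy-Schwarz bound \(\left|Cov\left[X, Y\right]\right| \leq \sqrt{Var\left[X\right] Var\left[Y\right]}\). The function names and the numbers in the example are hypothetical.

```python
import math


def var_difference(var_x, var_y, cov_xy):
    # Variance of Z = X - Y.
    return var_x + var_y - 2.0 * cov_xy


def covariance_range(var_x, var_y):
    # Lower end: Var[Z] <= Var[X]; upper end: Cauchy-Schwarz bound on the covariance.
    # The range is empty when the lower end exceeds the upper end.
    lower = var_y / 2.0
    upper = math.sqrt(var_x * var_y)
    return lower, upper


# Hypothetical values: Var[X] = 4, Var[Y] = 1.
print(covariance_range(4.0, 1.0))     # (0.5, 2.0)
print(var_difference(4.0, 1.0, 0.5))  # 4.0, equal to Var[X] at the lower end
```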

📗 [3 points] Let a dataset consist of \(n\) = points in \(\mathbb{R}\), specifically, the first \(n - 1\) points are and the last point \(x_{n}\) is unknown. What is the smallest value of \(x_{n}\) above which \(x_{n-1}\) is among \(x_{n}\)'s 3-nearest neighbors, but \(x_{n}\) is NOT among \(x_{n-1}\)'s 3-nearest neighbors? Note that the 3-nearest neighbors of a point in the training set include the point itself.

📗 Answer: .

📗 [3 points] Assume the prior probability of having a female child (girl) is the same as having a male child (boy) and both are 0.5. The Smith family has kids. One day you saw one of the Smith children, and she is a girl. The Wood family has kids, too, and you heard that at least one of them is a girl. What is the chance that the Smith family has a boy? What is the chance that the Wood family has a boy?

📗 Answer (comma separated vector): .
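📗 Note: a minimal sketch that enumerates all equally likely families with exact fractions. It assumes each child is independently a boy or a girl with probability 0.5, and that the Smith child you saw was picked uniformly at random among the siblings. The function name and the family sizes in the example are hypothetical.

```python
from fractions import Fraction
from itertools import product


def chance_of_boy(smith_kids, wood_kids):
    # Smith: weight every (family, observed child) pair equally,
    # then condition on the observed child being a girl.
    weight = Fraction(1, 2) ** smith_kids / smith_kids
    seen_girl = Fraction(0)
    seen_girl_and_boy = Fraction(0)
    for family in product('BG', repeat=smith_kids):
        for child in family:
            if child == 'G':
                seen_girl += weight
                if 'B' in family:
                    seen_girl_and_boy += weight
    smith = seen_girl_and_boy / seen_girl

    # Wood: condition on the family having at least one girl.
    with_girl = [f for f in product('BG', repeat=wood_kids) if 'G' in f]
    wood = Fraction(sum('B' in f for f in with_girl), len(with_girl))
    return smith, wood


# Hypothetical family sizes: two kids each.
print(chance_of_boy(2, 2))  # (Fraction(1, 2), Fraction(2, 3))
```

The two answers differ because seeing a specific child only fixes that child's gender, while "at least one girl" rules out only the all-boy family.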

📗 [1 point] Please enter any comments, including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.

📗 Answer: .

# Grade

* * * * *

* * * * *

# Submission

📗 Please do not modify the content in the above text field: use the "Grade" button to update.

📗 Please wait for the message "Successful submission." to appear after the "Submit" button. Please also save the text in the above text box to a file using the button or copy and paste it into a file yourself and submit it to Canvas Assignment M1A. You could submit multiple times (but please do not submit too often): only the latest submission will be counted.

📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##x: 1A" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.

Last Updated: July 01, 2025 at 1:47 AM