📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit enter key).
📗 You can also load from your saved file and click .
📗 If the questions are not generated correctly, try refresh the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page: , solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 [3 points] (new) Suppose there are two classes \(C\) for "cats" and \(F\) for "flerkens" and \(4\) training items each with a single numerical feature \(\left(x_{i}, y_{i}\right)\) = . What are the minimum and maximum (over the different ways the training set is divided into folds) 2-fold cross validation accuracies of the 1-nearest neighbor classifier on this training set? Enter two numbers between 0 and 1. If the minimum and the maximum are the same, enter the same number twice.
📗 Answer (comma separated vector): .
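The procedure can be sketched in code. Since the actual training items are blank above, the features and labels below are hypothetical placeholders; the sketch enumerates every way to split the four items into two folds of two, trains 1-NN on one fold, tests on the other, and reports the minimum and maximum accuracy.

```python
from itertools import combinations

def one_nn_predict(train_x, train_y, x):
    # predict the label of the nearest training point
    # (ties broken by the earlier training item)
    d = [abs(tx - x) for tx in train_x]
    return train_y[d.index(min(d))]

def two_fold_cv_accuracies(xs, ys):
    n = len(xs)
    accs = []
    # every way to choose fold A; the complement is fold B
    for fold_a in combinations(range(n), n // 2):
        fold_b = [i for i in range(n) if i not in fold_a]
        correct = 0
        for train, test in ((fold_a, fold_b), (fold_b, fold_a)):
            tx = [xs[i] for i in train]
            ty = [ys[i] for i in train]
            for i in test:
                correct += one_nn_predict(tx, ty, xs[i]) == ys[i]
        accs.append(correct / n)
    return min(accs), max(accs)

xs = [1, 2, 3, 4]          # hypothetical features
ys = ["C", "C", "F", "F"]  # hypothetical labels
print(two_fold_cv_accuracies(xs, ys))
```

With these placeholder values the split {1, 2} vs {3, 4} gets everything wrong (accuracy 0) while {1, 4} vs {2, 3} gets everything right (accuracy 1), so the answer would be 0, 1.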
📗 [3 points] (new) When training a recurrent neural network with input features, hidden recurrent units, and output units, during one step of stochastic gradient descent for one training item sequence of length , the values of weights and biases are updated (a weight is considered updated if one gradient descent step is applied, including when the gradient is 0). For another training item sequence of length , how many weights and biases are updated? Enter a single integer: the total number of weights plus biases.
📗 Answer: .
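The key observation is that every parameter receives a gradient step regardless of the sequence length, so the count does not depend on the sequence. A sketch with hypothetical layer sizes (the actual numbers are blank in the question):

```python
def rnn_param_count(d, h, k):
    # weights: input-to-hidden (h*d), hidden-to-hidden (h*h),
    # hidden-to-output (k*h)
    weights = h * d + h * h + k * h
    # biases: one per hidden unit and one per output unit
    biases = h + k
    return weights + biases

# hypothetical sizes: d input features, h hidden units, k outputs
print(rnn_param_count(d=3, h=2, k=1))  # → 15
```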
📗 [4 points] In a simple recurrent neural network with a single hidden recurrent unit and one output unit at the end, suppose all activation functions are LTU (Linear Threshold Unit, or \(g\left(z\right) = 1_{z \geq 0}\)), and the weights are given by \(w^{\left(x\right)}\) = , \(w^{\left(a\right)}\) = , \(w^{\left(y\right)}\) = , \(b^{\left(a\right)}\) = , \(b^{\left(y\right)}\) = . What is the label of a sentence \(\left(x_{1}, x_{2}, ..., x_{T}\right)\) = ?
📗 You can use the activation formulas: \(a_{t} = g\left(w^{\left(x\right)} x_{t} + w^{\left(a\right)} a_{t-1} + b^{\left(a\right)}\right)\) with \(a_{1} = g\left(w^{\left(x\right)} x_{1} + b^{\left(a\right)}\right)\) and \(y = g\left(w^{\left(y\right)} a_{T} + b^{\left(y\right)}\right)\).
📗 Answer:
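The recurrence above can be run directly. The weights, biases, and input sentence below are hypothetical stand-ins for the blanked values:

```python
def ltu(z):
    # Linear Threshold Unit: g(z) = 1 if z >= 0 else 0
    return 1 if z >= 0 else 0

def rnn_label(xs, wx, wa, wy, ba, by):
    # a_1 = g(wx * x_1 + ba); a_t = g(wx * x_t + wa * a_{t-1} + ba)
    a = ltu(wx * xs[0] + ba)
    for x in xs[1:]:
        a = ltu(wx * x + wa * a + ba)
    # y = g(wy * a_T + by)
    return ltu(wy * a + by)

# hypothetical weights and sentence (the actual values are blank above)
print(rnn_label([1, 0, 1], wx=1, wa=-1, wy=1, ba=0, by=-1))
```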
📗 [4 points] Given the following training data, what is the fold cross validation accuracy (i.e. LOOCV, Leave One Out Cross Validation) if the NN (Nearest Neighbor) classifier with Manhattan distance is used? Break the tie (in distance) by using the instance with the smaller index. Enter a number between 0 and 1.
Index | 1 | 2 | 3 | 4 | 5
\(x_{i}\) |  |  |  |  |
\(y_{i}\) |  |  |  |  |
📗 Answer: .
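A sketch of LOOCV with the stated tie-breaking rule; the training set below is hypothetical because the table's values are blank:

```python
def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def loocv_1nn(xs, ys):
    n = len(xs)
    correct = 0
    for i in range(n):
        # leave item i out; ties in distance broken by the smaller index
        best = min((j for j in range(n) if j != i),
                   key=lambda j: (manhattan(xs[i], xs[j]), j))
        correct += ys[best] == ys[i]
    return correct / n

# hypothetical training set (the real values are blank in the table above)
xs = [(1, 1), (1, 2), (3, 3), (4, 4), (4, 5)]
ys = [0, 0, 1, 1, 1]
print(loocv_1nn(xs, ys))
```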
📗 [4 points] What is the convolution between the image and the filter using zero padding? The flipped filter is .
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
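Since the filter is already given flipped, convolution reduces to sliding the flipped filter over the zero-padded image (cross-correlation). A sketch with a hypothetical image and filter, assuming a square odd-sized filter:

```python
def conv2d_same(img, flipped):
    # cross-correlate the *flipped* filter over the zero-padded image,
    # which equals the convolution of the original filter with the image
    n, m = len(img), len(img[0])
    k = len(flipped)  # assume a square, odd-sized filter
    p = k // 2
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0
            for a in range(k):
                for b in range(k):
                    r, c = i + a - p, j + b - p
                    if 0 <= r < n and 0 <= c < m:  # zero padding
                        s += flipped[a][b] * img[r][c]
            out[i][j] = s
    return out

# hypothetical 3x3 image and flipped filter (identity filter for checking)
img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flt = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(conv2d_same(img, flt))  # identity filter returns the image
```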
📗 [4 points] A convolutional neural network has an input image of size x that is connected to a convolutional layer that uses a x filter, zero padding of the image, and a stride of 1. There are activation maps. (Here, zero padding implies that these activation maps have the same size as the input images.) The convolutional layer is then connected to a pooling layer that uses x max pooling with a stride of (non-overlapping, no padding) over the convolutional layer. The pooling layer is then fully connected to an output layer that contains output units. There are no hidden layers between the pooling layer and the output layer. How many different weights must be learned in this whole network, not including any bias?
📗 Answer: .
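The count decomposes into the shared filter weights plus the fully connected weights (max pooling has none). A sketch with hypothetical sizes, assuming a single-channel input and a square image:

```python
def cnn_weight_count(img, f, maps, pool, outputs):
    # conv layer: one f x f filter per activation map (single-channel input);
    # the filter weights are shared across positions
    conv_w = f * f * maps
    # zero padding keeps each map at img x img; non-overlapping pooling
    # with stride `pool` shrinks each side by a factor of `pool`
    pooled = (img // pool) * (img // pool) * maps
    # pooling layer has no weights; fully connected output layer:
    fc_w = pooled * outputs
    return conv_w + fc_w

# hypothetical sizes (the actual numbers are blank in the question)
print(cnn_weight_count(img=8, f=3, maps=2, pool=2, outputs=4))  # → 146
```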
📗 [3 points] Given three documents \(\left\{1, 2, 3\right\}\) with a vocabulary of {"Hello", "World"}, compute the TF-IDF (Term-Frequency-Inverse-Document-Frequency) features of document .
Document | Word | Number of times
1 | "Hello" |
- | "World" |
2 | "Hello" |
- | "World" |
3 | "Hello" |
- | "World" |
📗 Answer (comma separated vector): .
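A sketch of the computation with hypothetical counts (the table's numbers are blank above). It uses one common convention, tf = count / document length and idf = log(n / document frequency); check the course notes for the exact formula used in grading.

```python
import math

def tf_idf(counts, doc):
    # counts[d][w] = number of times word w appears in document d
    n = len(counts)
    vocab = list(counts[0].keys())
    total = sum(counts[doc][w] for w in vocab)  # length of this document
    feats = []
    for w in vocab:
        tf = counts[doc][w] / total
        df = sum(1 for d in range(n) if counts[d][w] > 0)
        idf = math.log(n / df)  # one common convention; may differ per course
        feats.append(tf * idf)
    return feats

# hypothetical counts (the table's numbers are blank above)
counts = [{"Hello": 1, "World": 1},
          {"Hello": 2, "World": 0},
          {"Hello": 0, "World": 3}]
print(tf_idf(counts, doc=1))
```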
📗 [1 points] Consider the AmongUs ඞ example where the joint probability table is summarized in the following table. What is the probability \(\mathbb{P}\){\(y\) = | \(x\) = }?
Number of occurrences | Being suspicious \(x = 1\) | Not being suspicious \(x = 0\)
An assassin \(y = 1\) |  |
Not an assassin \(y = 0\) |  |
📗 Answer: .
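The conditional probability is the count in one cell divided by its column total. A sketch with hypothetical counts for the four cells (the real numbers are blank in the table):

```python
def cond_prob(counts, y, x):
    # counts[(y, x)] = number of occurrences in the joint table
    # P(Y = y | X = x) = #(y, x) / #(x)
    num = counts[(y, x)]
    den = counts[(0, x)] + counts[(1, x)]
    return num / den

# hypothetical counts for the four cells of the table
counts = {(1, 1): 3, (1, 0): 1, (0, 1): 2, (0, 0): 4}
print(cond_prob(counts, y=1, x=1))  # 3 / (3 + 2)
```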
📗 [3 points] A hospital trains a decision tree to predict if any given patient has technophobia or not. The training set consists of patients. There are features. The labels are binary. The decision tree is not pruned. What are the smallest and largest possible training set accuracies of the decision tree? Enter two numbers between 0 and 1. Hint: patients with the same features may have different labels.
📗 Answer (comma separated vector): .
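An unpruned tree can split until every distinct feature vector sits in its own leaf, where it predicts the leaf's majority label, so conflicting labels on identical features are the only source of training error. A sketch with hypothetical patients, two of whom share features but disagree on the label:

```python
from collections import Counter

def full_tree_train_accuracy(features, labels):
    # a fully grown (unpruned) tree can isolate each distinct feature
    # vector in its own leaf and predict that leaf's majority label
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(tuple(f), []).append(y)
    correct = sum(max(Counter(ys).values()) for ys in groups.values())
    return correct / len(labels)

# hypothetical data: the first two patients share features but not labels
features = [[0, 1], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 1, 0]
print(full_tree_train_accuracy(features, labels))  # one leaf must err
```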
📗 [4 points] Say we have a training set consisting of items with label \(0\), and items with label \(1\) where each item has two features and all items have distinct features. What is the classification accuracy of NN (Nearest Neighbor) on the training set (note: this is not k-fold cross validation, meaning all items are used in training)?
📗 Answer: .
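When evaluated on the training set itself, the nearest training point to any item is that item (distance 0, and unique because all features are distinct), so 1-NN reproduces every training label. A sketch with hypothetical items:

```python
def one_nn_train_accuracy(xs, ys):
    # the nearest neighbor of a training item is itself (distance 0),
    # so every prediction matches the item's own label
    correct = 0
    for i, x in enumerate(xs):
        j = min(range(len(xs)),
                key=lambda j: sum(abs(a - b) for a, b in zip(x, xs[j])))
        correct += ys[j] == ys[i]
    return correct / len(xs)

# hypothetical items with distinct features
xs = [(0, 0), (1, 0), (0, 1), (2, 2)]
ys = [0, 0, 1, 1]
print(one_nn_train_accuracy(xs, ys))
```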
📗 [2 points] In a corpus (set of documents) with word types (unique word tokens), the phrase "" appeared times. In particular, "" appeared times and "" appeared times. If we estimate probability by frequency (the maximum likelihood estimate) with Laplace smoothing (add-1 smoothing), what is the estimated probability of \(\mathbb{P}\){ | }?
📗 Answer: .
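With add-1 smoothing, the conditional probability of the second word given the first adds 1 to the bigram count and the vocabulary size to the unigram count. A sketch with hypothetical counts (the actual numbers are blank in the question):

```python
def laplace_bigram(count_pair, count_first, vocab_size):
    # P(w2 | w1) = (c(w1 w2) + 1) / (c(w1) + |V|) with add-1 smoothing
    return (count_pair + 1) / (count_first + vocab_size)

# hypothetical counts: phrase seen 5 times, first word 20 times, |V| = 100
print(laplace_bigram(count_pair=5, count_first=20, vocab_size=100))
```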
📗 [4 points] Given the following transition matrix for a bigram model with words "" and "": . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given the first word is ""?
📗 Answer: .
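Going from the first word to the third takes two transitions, so the answer is an entry of the squared transition matrix: sum over the possible middle words. A sketch with a hypothetical 2x2 matrix:

```python
def two_step_prob(P, start, end):
    # P[i][j] = P(w_t = j | w_{t-1} = i); the probability that the third
    # word is `end` given the first is `start` sums over the middle word,
    # i.e. the (start, end) entry of P squared
    return sum(P[start][k] * P[k][end] for k in range(len(P)))

# hypothetical transition matrix for the two words (indexed 0 and 1)
P = [[0.7, 0.3],
     [0.4, 0.6]]
print(two_step_prob(P, start=0, end=1))  # 0.7*0.3 + 0.3*0.6
```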
📗 [3 points] A tweet is ratioed if at least one reply gets more likes than the tweet. Suppose a tweet has replies, and each one of these replies gets more likes than the tweet with probability if the tweet is bad, and probability if the tweet is good. Given a tweet is ratioed, what is the probability that it is a bad tweet? The prior probability of a bad tweet is .
📗 Answer: .
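"At least one reply" is the complement of "no reply," which gives the likelihoods for Bayes rule. A sketch with hypothetical numbers (the actual values are blank in the question), assuming replies are independent:

```python
def prob_bad_given_ratioed(n, p_bad_reply, p_good_reply, prior_bad):
    # P(ratioed | class) = 1 - P(no reply out-likes the tweet)^... i.e.
    # 1 - (1 - p)^n, assuming the n replies are independent
    ratioed_bad = 1 - (1 - p_bad_reply) ** n
    ratioed_good = 1 - (1 - p_good_reply) ** n
    # Bayes rule
    num = ratioed_bad * prior_bad
    den = num + ratioed_good * (1 - prior_bad)
    return num / den

# hypothetical values: 2 replies, p = 0.5 if bad, 0.1 if good, prior 0.3
print(prob_bad_given_ratioed(n=2, p_bad_reply=0.5, p_good_reply=0.1,
                             prior_bad=0.3))
```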
📗 [4 points] Consider a classification problem with \(n\) = classes \(y \in \left\{1, 2, ..., n\right\}\), and two binary features \(x_{1}, x_{2} \in \left\{0, 1\right\}\). Suppose \(\mathbb{P}\left\{Y = y\right\}\) = , \(\mathbb{P}\left\{X_{1} = 1 | Y = y\right\}\) = , \(\mathbb{P}\left\{X_{2} = 1 | Y = y\right\}\) = . Which class will the naive Bayes classifier predict for a test item with \(X_{1}\) = and \(X_{2}\) = ?
📗 Answer: .
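Naive Bayes picks the class maximizing prior times the class-conditional likelihood of each feature. A sketch with hypothetical distributions for two classes (the real numbers are blank above):

```python
def naive_bayes_predict(prior, p_x1, p_x2, x1, x2):
    # score(y) is proportional to P(Y=y) * P(X1=x1|y) * P(X2=x2|y)
    def lik(p, x):
        return p if x == 1 else 1 - p
    scores = [prior[y] * lik(p_x1[y], x1) * lik(p_x2[y], x2)
              for y in range(len(prior))]
    return scores.index(max(scores)) + 1  # classes are numbered 1..n

# hypothetical distributions for n = 2 classes
prior = [0.5, 0.5]
p_x1 = [0.9, 0.2]  # P(X1 = 1 | Y = y) for y = 1, 2
p_x2 = [0.8, 0.3]  # P(X2 = 1 | Y = y) for y = 1, 2
print(naive_bayes_predict(prior, p_x1, p_x2, x1=1, x2=0))
```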
📗 [1 points] Please enter any comments including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.
📗 Please do not modify the content in the above text field: use the "Grade" button to update.
📗 Please wait for the message "Successful submission." to appear after the "Submit" button. Please also save the text in the above text box to a file using the button or copy and paste it into a file yourself and submit it to Canvas Assignment X2. You could submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##x: 2" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.