Young Wu's Homepage

# Campus Section Midterm - Part 1

📗 Enter your ID (your wisc email ID without @wisc.edu) here and click the button (or hit the Enter key).

📗 You can also load your answers from your saved file: select the file and click the button.

📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.

📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, and then enter all your answers at the end.

📗 Please do not refresh the page: your answers will not be saved.

📗 Please join Zoom for announcements: Link.

# Warning: please enter your ID before you start!

# Question 1

# Question 2

# Question 3

# Question 4

# Question 5

# Question 6

# Question 7

# Question 8

# Question 9

# Question 10

# Question 11

# Question 12

# Question 13

# Question 14

# Question 15

📗 [4 points] Suppose the bigram model with transition matrix is used to generate a document of infinite length, and a unigram model is estimated from the document without smoothing. What are the unigram probabilities? Enter the probabilities, one for each word type, in the same order as the rows of the bigram transition matrix. The probabilities can be rounded to 4 decimal places and do not have to sum to exactly \(1.0000\).

📗 Note: = .

📗 Answer (comma separated vector): .
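For an infinitely long document, the unigram frequencies approach the stationary distribution \(\pi\) of the bigram Markov chain, the vector with \(\pi P = \pi\) whose entries sum to 1. A minimal Python sketch, assuming the chain is irreducible and aperiodic (otherwise power iteration may not converge); the function name `stationary_distribution` is hypothetical:

```python
def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate pi with pi P = pi by power iteration.

    P[i][j] is P{w_t = j | w_{t-1} = i}; assumes the chain is irreducible
    and aperiodic so that the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # one step of the chain: new[j] = sum_i pi[i] * P[i][j]
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

For example, with `P = [[0.5, 0.5], [0.2, 0.8]]` the result is close to `[2/7, 5/7]`.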

📗 [4 points] Given a feature vector reconstructed using the first principal components, \(x'_{i}\) = , and a feature vector reconstructed using the first principal components, \(x''_{i}\) = , what is the principal component? If more information is needed, enter a vector of 0's.

📗 Answer (comma separated vector): .
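Each reconstruction adds up the projections onto the components it uses, so when \(x''_{i}\) uses exactly one more component \(u\) than \(x'_{i}\) (the assumption here), \(x''_{i} - x'_{i} = (u^\top x_{i})\, u\), and normalizing the difference recovers \(u\) up to sign. A sketch under that assumption (the function name is hypothetical); a zero difference means the projection is 0, so \(u\) cannot be recovered:

```python
import math

def next_principal_component(x_k, x_k1):
    """Recover the extra unit component u from two reconstructions.

    x_k uses the first k components and x_k1 the first k + 1, so their
    difference is (u . x) u; normalizing it gives u up to sign.
    """
    diff = [b - a for a, b in zip(x_k, x_k1)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        # projection onto u is 0: more information would be needed
        return [0.0] * len(diff)
    return [d / norm for d in diff]
```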

📗 [3 points] Suppose the bigram probability \(\hat{\mathbb{P}}\left\{A | B\right\}\) estimated with Laplace smoothing (add-1 smoothing) from a document with word tokens is . The vocabulary size is . If the sequences \(AB\) and \(BA\) never appear in the document, how many times does \(B\) appear in the document? If more information is needed, enter \(-1\).

📗 Answer: .
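With add-1 smoothing, \(\hat{\mathbb{P}}\left\{A | B\right\} = \dfrac{c(BA) + 1}{c(B) + m}\), where \(m\) is the vocabulary size. If \(c(BA) = 0\), this gives \(c(B) = 1 / \hat{\mathbb{P}}\left\{A | B\right\} - m\). A minimal sketch of that inversion, assuming \(c(B)\) counts \(B\) as the preceding word and rounding to the nearest integer because the given probability may itself be rounded (the function name is hypothetical):

```python
def count_of_b(p_a_given_b, vocab_size):
    """Invert P{A|B} = (c(BA) + 1) / (c(B) + m) when c(BA) = 0."""
    # round because the probability in the question may be rounded
    return round(1 / p_a_given_b - vocab_size)
```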

📗 [3 points] What is the area of all points within units from \(x\) = if the distance is measured by the \(L_{p}\) norm with \(p\) = ?

📗 Answer: .
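In two dimensions, the set of points within distance \(r\) of \(x\) under the \(L_{p}\) norm has area \(\dfrac{4\,\Gamma(1 + 1/p)^{2}}{\Gamma(1 + 2/p)}\, r^{2}\): \(2r^{2}\) for \(p = 1\) (a diamond), \(\pi r^{2}\) for \(p = 2\) (a disk), and \(4r^{2}\) for \(p = \infty\) (a square). The center \(x\) does not affect the area. A sketch assuming the points lie in the plane (the function name is hypothetical):

```python
import math

def lp_ball_area(r, p):
    """Area of {y in R^2 : ||y - x||_p <= r}; independent of the center x."""
    if math.isinf(p):
        return 4 * r * r  # square with side 2r
    return 4 * math.gamma(1 + 1 / p) ** 2 / math.gamma(1 + 2 / p) * r * r
```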

📗 [2 points] Suppose eigenfaces are computed based on a dataset containing images, each pixels by pixels. The first eigenvectors of the variance matrix are used as eigenfaces. What is the dimension (number of elements) of one eigenface?

📗 Answer: .
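Each eigenface is an eigenvector of the variance matrix of the flattened images, so it has one entry per pixel; neither the number of images nor the number of eigenfaces kept changes its length. A small sketch with hypothetical helper names:

```python
def flatten(image):
    """Stack the rows of a height x width image into one vector."""
    return [pixel for row in image for pixel in row]

def eigenface_dimension(height, width):
    # eigenfaces live in the same space as the flattened images:
    # D = height * width entries, one per pixel
    return height * width
```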

📗 [4 points] Given the following transition matrix for a bigram model with words "", "" and "": . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given the first word is ""?

📗 Answer: .
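Going from the first word to the third takes two transitions, so the answer is an entry of \(P^{2}\): \(\mathbb{P}\left\{w_{3} = j | w_{1} = i\right\} = \sum_{k} P_{ik} P_{kj}\). A sketch with 0-based word indices (an assumption; the question names the words instead), using a hypothetical function name:

```python
def two_step_probability(P, first, third):
    """P{w_3 = third | w_1 = first}, where row = previous word, column = next word."""
    # sum over every possible second word k
    return sum(P[first][k] * P[k][third] for k in range(len(P)))
```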

📗 [2 points] In a corpus (set of documents) with word types (unique word tokens), the phrase "" appeared times. In particular, "" appeared times and "" appeared times. If we estimate probabilities by frequency (the maximum likelihood estimate) with Laplace smoothing (add-1 smoothing), what is the estimated probability of \(\mathbb{P}\){ | }?

📗 Answer: .
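For a phrase "\(w_{1}\) \(w_{2}\)", the add-1 estimate is \(\hat{\mathbb{P}}\left\{w_{2} | w_{1}\right\} = \dfrac{c(w_{1} w_{2}) + 1}{c(w_{1}) + m}\) with \(m\) word types; the count of \(w_{2}\) alone is not needed. A sketch assuming the question asks for the second word given the first (the word labels are missing above), with a hypothetical function name:

```python
def laplace_bigram(count_pair, count_prev, vocab_size):
    """Add-1 estimate of P{next | prev} = (c(prev next) + 1) / (c(prev) + m)."""
    return (count_pair + 1) / (count_prev + vocab_size)
```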

📗 [3 points] A TV series is review bombed if at least one season gets a low audience score (significantly lower than the critic score). Suppose "the Boys" has seasons, and each of these seasons gets a low audience score with probability if the series is bad, and with probability if the series is good. Given that "the Boys" is review bombed, what is the probability that it is a bad series? The prior probability of a bad series is .

📗 Answer: .
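Treating seasons as independent given the quality of the series (an assumption the question implies), \(\mathbb{P}\left\{\text{bombed} | \text{bad}\right\} = 1 - (1 - p_{\text{bad}})^{n}\), and likewise for a good series; Bayes' rule then gives the posterior. A sketch with hypothetical parameter names:

```python
def prob_bad_given_bombed(n_seasons, p_low_bad, p_low_good, prior_bad):
    # P{at least one low season} = 1 - P{no low season}
    bombed_if_bad = 1 - (1 - p_low_bad) ** n_seasons
    bombed_if_good = 1 - (1 - p_low_good) ** n_seasons
    # Bayes' rule: P{bad | bombed}
    num = bombed_if_bad * prior_bad
    return num / (num + bombed_if_good * (1 - prior_bad))
```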

📗 [4 points] Consider a classification problem with \(n\) = classes \(y \in \left\{1, 2, ..., n\right\}\), and two binary features \(x_{1}, x_{2} \in \left\{0, 1\right\}\). Suppose \(\mathbb{P}\left\{Y = y\right\}\) = , \(\mathbb{P}\left\{X_{1} = 1 | Y = y\right\}\) = , \(\mathbb{P}\left\{X_{2} = 1 | Y = y\right\}\) = . Which class will the naive Bayes classifier predict for a test item with \(X_{1}\) = and \(X_{2}\) = ?

📗 Answer: .
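Naive Bayes predicts the class maximizing \(\mathbb{P}\left\{Y = y\right\} \mathbb{P}\left\{X_{1} = x_{1} | Y = y\right\} \mathbb{P}\left\{X_{2} = x_{2} | Y = y\right\}\), using \(1 - \mathbb{P}\left\{X = 1 | Y = y\right\}\) when a feature is 0. A sketch that takes the probabilities as lists indexed by class; ties go to the smallest class label, which is an assumption, and the function name is hypothetical:

```python
def naive_bayes_predict(prior, p_x1, p_x2, x1, x2):
    """Return the 1-indexed class maximizing P{Y=y} P{X1=x1|Y=y} P{X2=x2|Y=y}.

    prior[k], p_x1[k], p_x2[k] hold P{Y=k+1}, P{X1=1|Y=k+1}, P{X2=1|Y=k+1}.
    """
    def lik(p, x):
        # likelihood of a binary feature value given P{X = 1}
        return p if x == 1 else 1 - p

    scores = [prior[k] * lik(p_x1[k], x1) * lik(p_x2[k], x2)
              for k in range(len(prior))]
    return scores.index(max(scores)) + 1  # first maximum wins ties
```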

📗 [4 points] Consider the problem of detecting whether an email message is spam. Say we use three variables to model this problem: a binary label \(S\) indicating whether the message is spam, and two binary features \(C, F\) indicating whether the message contains "Cash" and whether it contains "Free". We use a naive Bayes classifier with the following probabilities estimated from the training set:

| | | |
| --- | --- | --- |
| Prior | \(\mathbb{P}\left\{S = 1\right\}\) = | - |
| Hams | \(\mathbb{P}\left\{C = 1 \mid S = 0\right\}\) = | \(\mathbb{P}\left\{F = 1 \mid S = 0\right\}\) = |
| Spams | \(\mathbb{P}\left\{C = 1 \mid S = 1\right\}\) = | \(\mathbb{P}\left\{F = 1 \mid S = 1\right\}\) = |

Compute the posterior probability that the email is spam given the following features: \(\mathbb{P}\){\(S = 1\) | \(C\) = , \(F\) = }.

📗 Answer: .
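Under the naive Bayes assumption, \(\mathbb{P}\left\{S = 1 | c, f\right\} = \dfrac{\mathbb{P}\left\{S = 1\right\} \mathbb{P}\left\{c | S = 1\right\} \mathbb{P}\left\{f | S = 1\right\}}{\sum_{s \in \left\{0, 1\right\}} \mathbb{P}\left\{S = s\right\} \mathbb{P}\left\{c | S = s\right\} \mathbb{P}\left\{f | S = s\right\}}\), where a feature value of 0 uses one minus the tabulated probability. A sketch with hypothetical parameter names:

```python
def spam_posterior(p_spam, p_c_ham, p_f_ham, p_c_spam, p_f_spam, c, f):
    def lik(p, x):
        # likelihood of a binary feature value given P{X = 1}
        return p if x == 1 else 1 - p

    spam = p_spam * lik(p_c_spam, c) * lik(p_f_spam, f)
    ham = (1 - p_spam) * lik(p_c_ham, c) * lik(p_f_ham, f)
    return spam / (spam + ham)  # normalize over S = 1 and S = 0
```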

📗 [3 points] What is the distance between clusters \(C_{1}\) = {} and \(C_{2}\) = {} using linkage?

📗 Answer: .
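The usual linkages take the minimum (single), maximum (complete), or mean (average) of all pairwise distances between a point in \(C_{1}\) and a point in \(C_{2}\). A sketch assuming Euclidean distance between points given as tuples (`math.dist` needs Python 3.8+); the function name is hypothetical:

```python
import math

def linkage_distance(c1, c2, method="single"):
    """Distance between two clusters of points under the given linkage."""
    dists = [math.dist(a, b) for a in c1 for b in c2]
    if method == "single":
        return min(dists)
    if method == "complete":
        return max(dists)
    if method == "average":
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {method}")
```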

📗 [4 points] You are given the distance table. Consider the next iteration of hierarchical agglomerative clustering (another name for the hierarchical clustering method we covered in the lectures) using linkage. What will the new values be in the resulting distance table corresponding to the new clusters? If you merge two columns (rows), put the new distances in the column (row) with the smaller index. For example, if you merge columns 2 and 4, the new column 2 should contain the new distances and column 4 should be removed, i.e. the columns and rows should be in the order (1), (2 and 4), (3), (5).

\(d\) =

📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
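One iteration merges the closest pair of clusters and recomputes their distance to every other cluster: the minimum of the two old distances for single linkage, the maximum for complete linkage. A sketch of that update that follows the smaller-index rule above; it breaks ties between equally close pairs by taking the first pair found, which is an assumption, and it omits average linkage because that also needs cluster sizes (the function name is hypothetical):

```python
def merge_step(d, method="single"):
    """One agglomerative step on a symmetric distance table d (list of lists).

    The closest pair (i, j) with i < j is merged into row/column i,
    and row/column j is removed.
    """
    if method not in ("single", "complete"):
        raise ValueError("method must be 'single' or 'complete'")
    n = len(d)
    # closest pair of distinct clusters; min keeps the first pair on ties
    i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda ab: d[ab[0]][ab[1]])
    combine = min if method == "single" else max
    new = [row[:] for row in d]
    for k in range(n):
        if k not in (i, j):
            new[i][k] = new[k][i] = combine(d[i][k], d[j][k])
    # drop row j and column j
    return [[v for c, v in enumerate(row) if c != j]
            for r, row in enumerate(new) if r != j]
```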

📗 [4 points] Suppose K-Means with \(K = 2\) is used to cluster the data set and initial cluster centers are \(c_{1}\) = and \(c_{2}\) = \(x\). What is the value of \(x\) if cluster 1 has \(n\) = points initially (before updating the cluster centers). Break ties by assigning the point to cluster 2.

📗 Answer: .
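For a candidate \(x\), counting the points strictly closer to \(c_{1}\) than to \(c_{2}\) checks whether cluster 1 has \(n\) points, with ties going to cluster 2 as the question requires. A sketch with a hypothetical helper name, assuming Euclidean distance and points given as tuples:

```python
import math

def cluster1_size(data, c1, c2):
    """Count points strictly closer to c1 than to c2 (ties go to cluster 2)."""
    return sum(1 for p in data if math.dist(p, c1) < math.dist(p, c2))
```

For example, on the toy data `[(0,), (1,), (2,), (5,), (6,)]` with \(c_{1} = 0\), scanning integers `[x for x in range(-10, 11) if cluster1_size(data, (0,), (x,)) == 3]` gives `[5, 6, 7, 8, 9, 10]`, so several values can satisfy the condition and the scan only helps check a candidate.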

📗 [3 points] Given data and initial k-means cluster centers \(c_{1}\) = and \(c_{2}\) = , what is the initial total distortion (do not take the square root)? Use Euclidean distance. Break ties by assigning points to the first cluster.

📗 Answer: .
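The initial distortion is \(\sum_{i} \min_{k} \left\|x_{i} - c_{k}\right\|_{2}^{2}\). The tie-breaking rule decides which cluster a point joins but not its squared distance, so it does not change this total. A sketch with a hypothetical function name:

```python
def initial_distortion(data, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    total = 0.0
    for p in data:
        sq = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        total += min(sq)  # a tie gives the same value whichever center wins
    return total
```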

📗 [1 point] Please enter any comments including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.

📗 Answer: .

# Grade

* * * * *

# Submission

📗 Please do not modify the content in the above text field: use the "Grade" button to update.

📗 Please wait for the message "Successful submission." to appear after the "Submit" button. Please also save the text in the above text box to a file using the button or copy and paste it into a file yourself and submit it to Canvas Assignment X1. You could submit multiple times (but please do not submit too often): only the latest submission will be counted.

📗 You could load your answers from the text (or txt file) in the text box below using the button. The first two lines should be "##x: 1" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.
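The text format above can be produced and read back with a few lines of code. This is a hypothetical sketch (these helpers are not part of the page); it assumes single-line answers, since how multi-line answers such as the matrix in question 12 are stored is not specified here:

```python
def format_answers(student_id, answers):
    """Build '##x: 1', '##id: ...', then '##k: answer' lines (k starts at 1)."""
    lines = ["##x: 1", f"##id: {student_id}"]
    lines += [f"##{k}: {a}" for k, a in enumerate(answers, start=1)]
    return "\n".join(lines)

def parse_answers(text):
    """Map each '##key: value' line to {key: value}."""
    answers = {}
    for line in text.splitlines():
        if line.startswith("##") and ":" in line:
            key, value = line[2:].split(":", 1)  # split at the first colon only
            answers[key.strip()] = value.strip()
    return answers
```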

Last Updated: July 01, 2025 at 1:49 AM