📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit the enter key)
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 [4 points] If \(\mathbb{P}\left\{A | B\right\}\) is times the value of \(\mathbb{P}\left\{B | A\right\}\), and \(\mathbb{P}\left\{A\right\}\) = . What is \(\mathbb{P}\left\{B\right\}\)?
📗 Answer: .
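The relationship in this question follows from Bayes' rule, \(\mathbb{P}\left\{A | B\right\} \mathbb{P}\left\{B\right\} = \mathbb{P}\left\{B | A\right\} \mathbb{P}\left\{A\right\}\). Below is a minimal sketch, assuming \(\mathbb{P}\left\{B | A\right\} > 0\). The function name and the example numbers are hypothetical and are not the values of this question.

```python
def prob_b_from_ratio(ratio, p_a):
    """Return P(B) when P(A|B) = ratio * P(B|A) and P(A) = p_a.

    Bayes' rule gives P(A|B) * P(B) = P(B|A) * P(A). Substituting
    P(A|B) = ratio * P(B|A) and cancelling P(B|A) (assumed nonzero)
    leaves ratio * P(B) = P(A).
    """
    return p_a / ratio


# Hypothetical example: P(A|B) is 2 times P(B|A) and P(A) = 0.5.
print(prob_b_from_ratio(2, 0.5))
```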
📗 [2 points] Let \(A \in\) and \(B \in\) . What is the least number of probabilities needed to fully specify the conditional probability table of B given A (\(\mathbb{P}\left\{B | A\right\}\))?
📗 Answer: .
📗 [3 points] You have a joint probability table over \(k\) = random variables \(X_{1}, X_{2}, ..., X_{k}\), where each variable takes \(m\) = possible values: \(1, 2, ..., m\). To compute the probability that \(X_{1}\) = , how many cells in the table do you need to access (at most)?
📗 Answer: .
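Marginalizing a joint table means fixing \(X_{1}\) and summing over every combination of the other variables. The sketch below counts the cells read while doing so. The dictionary layout of the table, the function name, and the sizes \(k = 3\), \(m = 2\) are assumptions made for illustration only.

```python
from itertools import product


def marginal_x1(joint, k, m, value):
    """Sum the joint table over X_2..X_k with X_1 fixed to `value`.

    Returns the marginal probability and the number of cells read.
    """
    total = 0.0
    cells_read = 0
    for rest in product(range(1, m + 1), repeat=k - 1):
        total += joint[(value,) + rest]
        cells_read += 1
    return total, cells_read


# Hypothetical uniform joint table over k = 3 variables with m = 2 values each.
k, m = 3, 2
joint = {cell: 1 / m**k for cell in product(range(1, m + 1), repeat=k)}
prob, cells = marginal_x1(joint, k, m, 1)
print(prob, cells)
```

The number of cells read equals the number of assignments to the remaining \(k - 1\) variables.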
📗 [2 points] Given the training data "", with the gram model, what is the probability of observing the new sentence "" given the first word is ? Use MLE (Maximum Likelihood Estimate) without smoothing and do not include the probability of observing the first word.
📗 Answer: .
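As an illustration of MLE without smoothing, here is a minimal sketch for the bigram case only; the n-gram order, the training text, and the sentence below are hypothetical, not the ones in this question. It also assumes the denominator counts only occurrences of the previous word that have a successor in the training data, which may differ from the convention used in the course.

```python
from collections import Counter


def bigram_sentence_prob(train_tokens, sentence_tokens):
    """P(sentence | first word) under an unsmoothed bigram MLE."""
    pair_counts = Counter(zip(train_tokens, train_tokens[1:]))
    # Assumption: count a word as "previous" only when something follows it.
    prev_counts = Counter(train_tokens[:-1])
    prob = 1.0
    for prev, cur in zip(sentence_tokens, sentence_tokens[1:]):
        if prev_counts[prev] == 0:
            return 0.0  # the previous word never appears with a successor
        prob *= pair_counts[(prev, cur)] / prev_counts[prev]
    return prob


# Hypothetical training text and sentence.
train = "a b a b b".split()
print(bigram_sentence_prob(train, "a b b".split()))
```

The probability of the first word is left out of the product, matching the instruction in the question.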
📗 [2 points] Consider the following directed graphical model over binary variables: \(A \to B \leftarrow C\) with the following training set.
A
B
C
0
0
0
0
0
1
0
1
1
0
1
0
1
1
1
1
What is the MLE (Maximum Likelihood Estimate) with Laplace smoothing of the conditional probability that \(\mathbb{P}\){ \(B\) = | \(A\) = , \(C\) = }?
📗 Answer: .
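With Laplace (add-one) smoothing, each count is increased by 1 and the denominator is increased by the number of values the child variable can take. A minimal sketch for \(A \to B \leftarrow C\) with binary \(B\) follows; the rows and the function name are hypothetical and are not the training set above.

```python
def laplace_conditional(rows, b, a, c, num_b_values=2):
    """Estimate P(B = b | A = a, C = c) with add-one smoothing.

    rows is a list of (A, B, C) tuples.
    """
    parent_count = sum(1 for (ra, rb, rc) in rows if ra == a and rc == c)
    joint_count = sum(
        1 for (ra, rb, rc) in rows if ra == a and rb == b and rc == c
    )
    return (joint_count + 1) / (parent_count + num_b_values)


# Hypothetical (A, B, C) rows.
rows = [(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 0)]
print(laplace_conditional(rows, b=1, a=0, c=0))
```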
📗 [2 points] Given the following network \(A \to B \to C\) where A can take on values, B can take on values, C can take on values. Write down the minimum number of conditional probabilities that define the CPTs (Conditional Probability Table).
📗 Answer: .
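Each node needs one free probability per value except the last, for every joint configuration of its parents. The sketch below counts these for any discrete network. The variable sizes are hypothetical, and the function name and the dictionary encoding of the graph are assumptions made for illustration.

```python
def min_cpt_entries(num_values, parents):
    """Count the free parameters of a discrete Bayesian network.

    num_values maps each node to its number of values.
    parents maps a node to the list of its parents (missing means none).
    """
    total = 0
    for node, size in num_values.items():
        parent_configs = 1
        for parent in parents.get(node, []):
            parent_configs *= num_values[parent]
        # The last probability in each row is 1 minus the others.
        total += parent_configs * (size - 1)
    return total


# Hypothetical sizes for the chain A -> B -> C.
sizes = {"A": 2, "B": 3, "C": 4}
chain = {"B": ["A"], "C": ["B"]}
print(min_cpt_entries(sizes, chain))
```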
📗 [4 points] Say we use Naive Bayes in an application where there are features represented by variables, each having possible values, and there are classes. How many probabilities must be stored in the CPTs (Conditional Probability Table) in the Bayesian network for this problem? Do not include probabilities that can be computed from other probabilities.
📗 Answer: .
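In a Naive Bayes network the class is the only parent of every feature, so the count splits into the class prior and one table per feature. A minimal sketch, assuming every feature has the same number of values; the function name and the example sizes are hypothetical.

```python
def naive_bayes_cpt_entries(num_features, values_per_feature, num_classes):
    """Count the probabilities that cannot be derived from the others."""
    prior = num_classes - 1
    # One row per class, and the last value in each row is implied.
    per_feature = num_classes * (values_per_feature - 1)
    return prior + num_features * per_feature


# Hypothetical sizes: 3 binary features and 2 classes.
print(naive_bayes_cpt_entries(3, 2, 2))
```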
📗 [4 points] Given the following transition matrix for a bigram model with words "I" (label 0), "am" (label 1) and "Groot" (label 2): . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). Two uniform random numbers between 0 and 1 are generated to simulate the words after "I", say \(u_{1}\) = and \(u_{2}\) = . Using the CDF (Cumulative Distribution Function) inversion method (inverse transform method), which two words are generated? Enter two integer labels (0, 1, or 2), not strings.
📗 Answer (comma separated vector): .
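CDF inversion picks the first label whose cumulative probability reaches the uniform number, then repeats from the row of the word just generated. The sketch below uses a hypothetical transition matrix and hypothetical uniform numbers, and it assumes a tie (\(u\) exactly equal to a cumulative value) goes to the smaller label.

```python
def invert_cdf(row, u):
    """Return the first label whose cumulative probability is >= u."""
    cumulative = 0.0
    for label, p in enumerate(row):
        cumulative += p
        if u <= cumulative:
            return label
    return len(row) - 1  # guard against rounding when u is close to 1


def generate_words(transition, start, uniforms):
    """Generate one word per uniform number, starting after `start`."""
    words = []
    current = start
    for u in uniforms:
        current = invert_cdf(transition[current], u)
        words.append(current)
    return words


# Hypothetical transition matrix (each row sums to 1) and uniform numbers.
transition = [
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.5, 0.25, 0.25],
]
print(generate_words(transition, 0, [0.5, 0.9]))
```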
📗 [3 points] Given the variance matrix \(\hat{\Sigma}\) = , what is the first principal component?
📗 Answer (comma separated vector):
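The first principal component is the unit eigenvector of \(\hat{\Sigma}\) with the largest eigenvalue. One way to approximate it without a linear algebra library is power iteration, sketched below. This is an illustrative technique rather than the method the question expects. It assumes the largest eigenvalue is strictly larger than the others and that the starting vector is not orthogonal to the answer. The matrix and the function name are hypothetical.

```python
import math


def first_principal_component(cov, iterations=200):
    """Approximate the top unit eigenvector of a symmetric matrix."""
    n = len(cov)
    v = [float(i + 1) for i in range(n)]  # arbitrary nonzero start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical 2 x 2 matrix, not the one in this question.
print(first_principal_component([[2.0, 1.0], [1.0, 2.0]]))
```

The sign of an eigenvector is arbitrary, so the negated vector is an equally valid component.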
📗 [4 points] What is the projected variance of and onto the principal component ? Use the MLE (Maximum Likelihood Estimate) formula for the variance: \(\sigma^{2} = \dfrac{1}{n} \displaystyle\sum_{i=1}^{n} \left(x_{i} - \mu\right)^{2}\) with \(\mu = \dfrac{1}{n} \displaystyle\sum_{i=1}^{n} x_{i}\).
📗 Answer: .
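The projected variance is the MLE variance of the scalar projections \(u^{\top} x_{i}\). In the sketch below the points and the component are hypothetical, and the component is rescaled to unit length first, which is an assumption in case it is not already normalized.

```python
import math


def projected_variance(points, component):
    """MLE variance (divide by n) of the points projected onto component."""
    norm = math.sqrt(sum(c * c for c in component))
    unit = [c / norm for c in component]
    projections = [sum(x * u for x, u in zip(p, unit)) for p in points]
    mean = sum(projections) / len(projections)
    return sum((z - mean) ** 2 for z in projections) / len(projections)


# Hypothetical points and direction.
print(projected_variance([(1, 2), (3, 4)], (3, 4)))
```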
📗 [3 points] What is the distance between clusters \(C_{1}\) = {} and \(C_{2}\) = {} using linkage?
📗 Answer: .
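Single linkage uses the smallest pairwise distance between the two clusters and complete linkage uses the largest. A minimal sketch covering both, assuming Euclidean distance and points given as tuples; the clusters below and the function name are hypothetical.

```python
import math


def cluster_distance(c1, c2, linkage="single"):
    """Distance between two clusters under single or complete linkage."""
    pairwise = [math.dist(p, q) for p in c1 for q in c2]
    if linkage == "single":
        return min(pairwise)
    if linkage == "complete":
        return max(pairwise)
    raise ValueError("linkage must be 'single' or 'complete'")


# Hypothetical clusters.
c1 = [(0, 0), (1, 0)]
c2 = [(3, 0), (5, 0)]
print(cluster_distance(c1, c2, "single"), cluster_distance(c1, c2, "complete"))
```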
📗 [4 points] You are given the distance table. Consider the next iteration of hierarchical clustering using linkage. What will the new values be in the resulting distance table corresponding to the new clusters? If you merge two columns (rows), put the new distances in the column (row) with the smaller index. For example, if you merge columns 2 and 4, the new column 2 should contain the new distances and column 4 should be removed, i.e. the columns and rows should be in the order (1), (2 and 4), (3).
\(d\) =
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
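One iteration of hierarchical clustering merges the closest pair and recomputes its distances to every other cluster. The sketch below follows the convention stated in the question: the merged cluster keeps the smaller index and the larger one is removed. The distance table is hypothetical, the function name is an assumption, and only single and complete linkage are covered.

```python
def merge_closest(d, linkage="single"):
    """Merge the closest pair of clusters and return the new table."""
    n = len(d)
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    i, j = min(pairs, key=lambda p: d[p[0]][p[1]])  # i < j
    combine = min if linkage == "single" else max
    keep = [k for k in range(n) if k != j]  # drop the larger index
    table = []
    for r in keep:
        row = []
        for c in keep:
            if r == c:
                row.append(0)
            elif r == i:
                row.append(combine(d[i][c], d[j][c]))
            elif c == i:
                row.append(combine(d[r][i], d[r][j]))
            else:
                row.append(d[r][c])
        table.append(row)
    return table


# Hypothetical symmetric distance table over four clusters.
d = [
    [0, 4, 1, 6],
    [4, 0, 3, 5],
    [1, 3, 0, 2],
    [6, 5, 2, 0],
]
for row in merge_closest(d, "single"):
    print(row)
```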
📗 [3 points] Given data and initial k-means cluster centers \(c_{1}\) = and \(c_{2}\) = , what is the initial total distortion (do not take the square root)? Use Euclidean distance. Break ties by assigning points to the first cluster.
📗 Answer: .
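Total distortion is the sum, over all points, of the squared Euclidean distance to the assigned center. A minimal sketch with hypothetical one-dimensional data and centers follows; the function name is an assumption. Ties go to the first cluster because `list.index` returns the first matching position.

```python
def total_distortion(points, centers):
    """Return (total squared distance, cluster index for each point)."""
    total = 0
    assignments = []
    for p in points:
        squared = [
            sum((x - c) ** 2 for x, c in zip(p, center)) for center in centers
        ]
        best = min(squared)
        assignments.append(squared.index(best))  # first cluster wins ties
        total += best
    return total, assignments


# Hypothetical data and initial centers.
points = [(0,), (2,), (5,)]
centers = [(1,), (4,)]
print(total_distortion(points, centers))
```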
📗 [3 points] You have a dataset with unique data points which you want to use k-means clustering on. You set up the experiment as follows: you apply k-means with different k's: \(k\) = . Which \(k\) value will minimize the total distortion? Enter -1 if the answer depends on the data points.
📗 Answer: .
📗 [4 points] Given the dataset , the cluster centers are computed by the k-means clustering algorithm with \(k = 2\). The first cluster center is \(x\) and the second cluster center is . What is the imum value of \(x\) such that the second cluster is empty (contains 0 instances)? In case of a tie in distance, the point belongs to cluster 1.
📗 You could save the text in the above text box to a file using the button or copy and paste it into a file yourself.
📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##x: 3" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.