Prev: X2 Next: X4
Back to midterm page: Link, final page: Link

# X3 Past Exam Problems

📗 Enter your ID (your wisc email ID without @wisc.edu) here, then click the button (or hit the enter key).
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.

# Warning: please enter your ID before you start!


# Questions 1 to 50

📗 Questions 1 through 50 are generated for your ID from the problem templates below; enter your ID above to see them.

📗 [2 points] You have a vocabulary with \(n\) = word types. You want to estimate the unigram probability \(p_{w}\) for each word type \(w\) in the vocabulary. In your corpus the total word token count \(\displaystyle\sum_{w} c_{w}\) is , and \(c_{\text{tenet}}\) = . Using add-one smoothing \(\delta\) = (Laplace smoothing), compute \(p_{\text{tenet}}\).
📗 Answer: .
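📗 Note: a generic worked form of this estimate (standard add-\(\delta\) smoothing, with the generated numbers plugged in): \(p_{w} = \displaystyle\frac{c_{w} + \delta}{\sum_{w'} c_{w'} + \delta n}\), applied here with \(w = \text{tenet}\).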
📗 [0 points] To be added.
📗 [2 points] In a corpus with word tokens, the phrase "San Francisco" appeared times. In particular, "San" appeared times and "Francisco" appeared times. If we estimate probability by frequency (the maximum likelihood estimate), what is the estimated probability of P(Francisco | San)?
📗 Answer: .
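📗 Note: the MLE of a bigram conditional is a ratio of counts, \(\hat{\mathbb{P}}\left\{\text{Francisco} | \text{San}\right\} = \displaystyle\frac{c\left(\text{San Francisco}\right)}{c\left(\text{San}\right)}\); the corpus total and the count of "Francisco" alone are not needed.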
📗 [0 points] To be added.
📗 [4 points] Given the following transition matrix for a bigram model with words "", "" and "": . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given the first word is ""?
📗 Answer: .
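📗 A minimal sketch of the two-step computation (the matrix and word indices below are hypothetical stand-ins for the generated values):

```python
import numpy as np

# Hypothetical 3-word transition matrix: row i, column j is P(w_t = j | w_t-1 = i).
T = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

first, third = 0, 2  # 0-based indices of the given first word and queried third word
# The second word is unobserved, so sum over it: P(w3 = j | w1 = i) = sum_k T[i,k] T[k,j],
# which is entry (i, j) of the matrix square.
two_step = T @ T
print(two_step[first, third])
```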
📗 [4 points] Given the counts, find the maximum likelihood estimate of \(\mathbb{P}\left\{A = 1|B + C = s\right\}\), for \(s\) = .
| A | B | C | counts |
|---|---|---|--------|
| 0 | 0 | 0 | |
| 0 | 0 | 1 | |
| 0 | 1 | 0 | |
| 0 | 1 | 1 | |
| 1 | 0 | 0 | |
| 1 | 0 | 1 | |
| 1 | 1 | 0 | |
| 1 | 1 | 1 | |

📗 Answer: .
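📗 Note: the MLE conditions by restricting to the matching rows, \(\hat{\mathbb{P}}\left\{A = 1 | B + C = s\right\} = \displaystyle\frac{\sum_{b + c = s} \text{count}\left(A = 1, B = b, C = c\right)}{\sum_{a \in \left\{0, 1\right\}} \sum_{b + c = s} \text{count}\left(A = a, B = b, C = c\right)}\).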
📗 [3 points] Welcome to the Terrible-Three-Day-Tour! We will visit New York on Day 1. The rules for Day 2 and Day 3 are:
(a) If we were in New York the day before, with probability we will stay in New York, and with probability we will go to Baltimore.
(b) If we were in Baltimore the day before, with probability we will stay in Baltimore, and with probability we will go to Washington D.C.
On average, before you start the tour, what is your chance to visit (on at least one of the two days)?
📗 Answer: .
📗 [2 points] Given the training data "", with the -gram model, what is the probability of observing the new sentence "" given the first word is ? Use the MLE (Maximum Likelihood Estimate) without smoothing and do not include the probability of observing the first word.
📗 Answer: .
📗 [4 points] Suppose the vocabulary is "", "", "", and the training data is "". Write down the transition matrix. Make sure that the sum of the transition probabilities in each row is 1.
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
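📗 A minimal sketch of the estimation (the vocabulary and training sequence below are hypothetical stand-ins for the generated ones):

```python
from collections import Counter

vocab = ["a", "b", "c"]        # hypothetical vocabulary
train = "a b a c a b".split()  # hypothetical training data

# Count bigram transitions c(w1, w2) and the occurrences of each row word.
pairs = Counter(zip(train, train[1:]))
row_totals = Counter(train[:-1])

# MLE transition matrix: row w1, column w2 holds c(w1 w2) / c(w1),
# so each row with a nonzero total sums to 1.
T = [[pairs[(w1, w2)] / row_totals[w1] if row_totals[w1] else 0.0
      for w2 in vocab] for w1 in vocab]
for word, row in zip(vocab, T):
    print(word, row)
```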
📗 [3 points] Suppose the vocabulary is the alphabet plus space (26 letters + 1 space character), what is the (maximum likelihood) estimated trigram probability \(\hat{\mathbb{P}}\left\{a | x, y\right\}\) with Laplace smoothing (add-1 smoothing) if the sequence \(x, y\) never appeared in the training set. The training set has tokens in total. Enter -1 if more information is required to estimate this probability.
📗 Answer: .
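📗 Note: with add-1 smoothing and an unseen context \(x, y\), every continuation count is 0, so \(\hat{\mathbb{P}}\left\{a | x, y\right\} = \displaystyle\frac{0 + 1}{0 + 27} = \frac{1}{27}\) over the 27-symbol vocabulary; the training set size does not enter.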
📗 [3 points] A tweet is ratioed if a reply gets more likes than the tweet. Suppose a tweet has replies, and each one of these replies gets more likes than the tweet with probability if the tweet is bad, and probability if the tweet is good. Given a tweet is ratioed, what is the probability that it is a bad tweet? The prior probability of a bad tweet is .
📗 Answer: .
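📗 Note: assuming the replies are independent, a tweet with \(k\) replies is ratioed with probability \(1 - \left(1 - p\right)^{k}\) for the class-specific \(p\), and Bayes rule gives \(\mathbb{P}\left\{\text{bad} | \text{ratioed}\right\} = \displaystyle\frac{\mathbb{P}\left\{\text{ratioed} | \text{bad}\right\} \mathbb{P}\left\{\text{bad}\right\}}{\mathbb{P}\left\{\text{ratioed} | \text{bad}\right\} \mathbb{P}\left\{\text{bad}\right\} + \mathbb{P}\left\{\text{ratioed} | \text{good}\right\} \left(1 - \mathbb{P}\left\{\text{bad}\right\}\right)}\).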
📗 [3 points] Suppose the cumulative distribution function (CDF) of a discrete random variable \(X \in \left\{0, 1, 2, ...\right\}\) is given in the following table. What is the probability that is observed?

| \(\mathbb{P}\left\{X < 0\right\}\) | \(\mathbb{P}\left\{X \leq 0\right\}\) | \(\mathbb{P}\left\{X \leq 1\right\}\) | \(\mathbb{P}\left\{X \leq 2\right\}\) | \(\mathbb{P}\left\{X \leq 3\right\}\) | \(\mathbb{P}\left\{X \leq 4\right\}\) |
|---|---|---|---|---|---|
| \(0\) | | | | | |

📗 Answer: .
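📗 Note: the probability mass function is recovered from consecutive CDF entries, \(\mathbb{P}\left\{X = k\right\} = \mathbb{P}\left\{X \leq k\right\} - \mathbb{P}\left\{X \leq k - 1\right\}\).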
📗 [3 points] Given an infinite state sequence in which the pattern "" is repeated an infinite number of times, what is the (maximum likelihood) estimated transition probability from state to (without smoothing)?
📗 Answer: .
📗 [3 points] There are two biased coins in my pocket: coin A has \(\mathbb{P}\left\{H | A\right\}\) = , coin B has \(\mathbb{P}\left\{H | B\right\}\) = . I took out a coin from the pocket at random, with probability of A equal to . I flipped it twice and the outcome is . What is the probability that the coin was ?
📗 Answer: .
📗 [2 points] \(C\) is the boolean whether you have COVID-19 or not. \(F\) is the boolean whether you have a fever or not. Let \(\mathbb{P}\left\{F = 1\right\}\) = , \(\mathbb{P}\left\{C = 1\right\}\) = , \(\mathbb{P}\left\{F = 0 | C = 1\right\}\) = . Given that you have COVID-19, what is the probability that you have fever? Note: this question uses random fake data, please refer to CDC for actual data.
📗 Answer: .
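📗 Note: by the complement rule applied within the conditional distribution, \(\mathbb{P}\left\{F = 1 | C = 1\right\} = 1 - \mathbb{P}\left\{F = 0 | C = 1\right\}\); the marginals \(\mathbb{P}\left\{F = 1\right\}\) and \(\mathbb{P}\left\{C = 1\right\}\) are distractors here.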
📗 [2 points] A traffic light repeats the following cycle: green seconds, yellow seconds, red seconds. A driver saw at a random moment. What is the probability that one second later the light became ?
📗 Answer: .
📗 [2 points] Let \(A \in\) and \(B \in\) . What is the least number of probabilities needed to fully specify the conditional probability table of B given A (\(\mathbb{P}\left\{B | A\right\}\))?
📗 Answer: .
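📗 Note: the table has one row per value of \(A\), and each row of \(\left| B \right|\) probabilities must sum to 1, so only \(\left| A \right| \left(\left| B \right| - 1\right)\) numbers are needed.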
📗 [2 points] You have a coin that lands heads with probability . You flip it times and they all happen to be heads. What is the probability that the next flips will contain one or more tails?
📗 Answer: .
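📗 Note: the earlier heads do not affect future flips of an independent coin; for \(k\) further flips with heads probability \(p\), \(\mathbb{P}\left\{\text{one or more tails}\right\} = 1 - p^{k}\).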
📗 [2 points] In your -day vacation, the counts of days are:

| rainy | warm | bighorn (saw sheep) | days |
|---|---|---|---|
| N | N | N | |
| N | N | Y | |
| N | Y | N | |
| N | Y | Y | |
| Y | N | N | |
| Y | N | Y | |
| Y | Y | N | |
| Y | Y | Y | |

Using the maximum likelihood estimate (no smoothing), estimate the probability P(bighorn = | rainy = , warm = ).
📗 Answer: .
📗 [2 points] An n-gram language model computes the probability \(\mathbb{P}\left\{w_{n} | w_{1}, w_{2}, ..., w_{n-1}\right\}\). How many parameters need to be estimated for a -gram language model given a vocabulary size of ?
📗 Answer: .
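📗 Note: an \(n\)-gram model conditions on the previous \(n - 1\) words, so with vocabulary size \(v\) the table has \(v^{n}\) entries, or \(v^{n-1} \left(v - 1\right)\) free parameters once each conditional distribution is constrained to sum to 1; which count is expected depends on the convention used in class.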
📗 [4 points] Some Na'vis don't wear underwear, but they are too embarrassed to admit it. A surveyor wants to estimate that fraction and comes up with the following less-embarrassing scheme: upon being asked "do you wear your underwear", a Na'vi flips a fair coin outside the sight of the surveyor. If the coin comes up heads, the Na'vi agrees to say "Yes"; otherwise the Na'vi agrees to answer the question truthfully. On a very large population, the surveyor hears the answer "Yes" from a fraction of the population. What is the estimated fraction of Na'vis that don't wear underwear? Enter a fraction like 0.01 instead of a percentage 1%.
📗 Answer: .
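📗 Note: if \(q\) is the true fraction who wear underwear and \(f\) the observed "Yes" fraction, the scheme gives \(f = \frac{1}{2} \cdot 1 + \frac{1}{2} q\), so the estimated fraction who don't is \(1 - q = 1 - \left(2 f - 1\right) = 2 - 2 f\).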
📗 [4 points] If \(\mathbb{P}\left\{A | B\right\}\) is times the value of \(\mathbb{P}\left\{B | A\right\}\), and \(\mathbb{P}\left\{A\right\}\) = . What is \(\mathbb{P}\left\{B\right\}\)?
📗 Answer: .
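📗 Note: Bayes rule gives \(\mathbb{P}\left\{A | B\right\} \mathbb{P}\left\{B\right\} = \mathbb{P}\left\{B | A\right\} \mathbb{P}\left\{A\right\}\), so if \(\mathbb{P}\left\{A | B\right\} = k \cdot \mathbb{P}\left\{B | A\right\}\), then \(\mathbb{P}\left\{B\right\} = \mathbb{P}\left\{A\right\} / k\).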
📗 [4 points] Fill in the missing values in the following joint probability table so that A and B are independent.
| - | A = 0 | A = 1 |
|---|---|---|
| B = 0 | | |
| B = 1 | ?? | ?? |

📗 Answer (comma separated vector): .
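📗 Note: independence requires every cell to equal the product of its marginals, \(\mathbb{P}\left\{A = a, B = b\right\} = \mathbb{P}\left\{A = a\right\} \mathbb{P}\left\{B = b\right\}\), and all four cells must sum to 1; the given row determines the marginals and hence the missing row.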
📗 [4 points] John tells his professor that he forgot to submit his homework assignment. From experience, the professor knows that students who finish their homework on time forget to turn it in with probability . She also knows that of the students who have not finished their homework will tell her they forgot to turn it in. She thinks that of the students in this class completed their homework on time. What is the probability that John is telling the truth (i.e. he finished it given that he forgot to submit it)?
📗 Answer: .
📗 [3 points] Given two Boolean random variables, \(A\) and \(B\), where \(\mathbb{P}\left\{A\right\}\) = , \(\mathbb{P}\left\{B\right\}\) = , and \(\mathbb{P}\left\{A| \neg B\right\}\) = , what is \(\mathbb{P}\left\{A|B\right\}\)?
📗 Answer: .
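📗 Note: the law of total probability, \(\mathbb{P}\left\{A\right\} = \mathbb{P}\left\{A | B\right\} \mathbb{P}\left\{B\right\} + \mathbb{P}\left\{A | \neg B\right\} \left(1 - \mathbb{P}\left\{B\right\}\right)\), has \(\mathbb{P}\left\{A | B\right\}\) as its only unknown here, so solve for it directly.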
📗 [3 points] Which of the following values of \(\mathbb{P}\left\{B\right\}\) is possible if \(\mathbb{P}\left\{A\right\} = \mathbb{P}\left\{A, B\right\}\) = ?
📗 Choices:





None of the above
📗 [3 points] Assume the prior probability of having a female child (girl) is the same as having a male child (boy) and both are 0.5. The Smith family has kids. One day you saw one of the Smith children, and she is a girl. The Wood family has kids, too, and you heard that at least one of them is a girl. What is the chance that the Smith family has a boy? What is the chance that the Wood family has a boy?
📗 Answer (comma separated vector): .
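📗 Note: under the usual reading, seeing a specific Smith child leaves the others independent, so with \(k\) kids \(\mathbb{P}\left\{\text{at least one boy}\right\} = 1 - \left(\frac{1}{2}\right)^{k - 1}\); for the Woods, conditioning on "at least one girl" gives \(\mathbb{P}\left\{\text{at least one boy}\right\} = \displaystyle\frac{1 - 2 \left(\frac{1}{2}\right)^{k}}{1 - \left(\frac{1}{2}\right)^{k}}\).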
📗 [3 points] You have a joint probability table over \(k\) = random variables \(X_{1}, X_{2}, ..., X_{k}\), where each variable takes \(m\) = possible values: \(1, 2, ..., m\). To compute the probability that \(X_{1}\) = , how many cells in the table do you need to access (at most)?
📗 Answer: .
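📗 Note: the full table has \(m^{k}\) cells; computing \(\mathbb{P}\left\{X_{1} = x\right\}\) fixes one variable and sums over the remaining \(k - 1\), touching at most \(m^{k - 1}\) cells.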
📗 [4 points] Consider a classification problem with \(n\) = classes \(y \in \left\{1, 2, ..., n\right\}\), and two binary features \(x_{1}, x_{2} \in \left\{0, 1\right\}\). Suppose \(\mathbb{P}\left\{Y = y\right\}\) = , \(\mathbb{P}\left\{X_{1} = 1 | Y = y\right\}\) = , \(\mathbb{P}\left\{X_{2} = 1 | Y = y\right\}\) = . Which class will the naive Bayes classifier produce on a test item with \(X_{1}\) = and \(X_{2}\) = ?
📗 Answer: .
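📗 A minimal sketch of the decision rule (the CPT values and test item below are hypothetical stand-ins for the generated ones):

```python
# Naive Bayes picks the class maximizing P(y) * P(x1|y) * P(x2|y).
# Hypothetical CPTs for n = 3 classes (all values are made up):
prior = {1: 0.2, 2: 0.5, 3: 0.3}   # P(Y = y)
p_x1 = {1: 0.9, 2: 0.4, 3: 0.1}    # P(X1 = 1 | Y = y)
p_x2 = {1: 0.3, 2: 0.8, 3: 0.5}    # P(X2 = 1 | Y = y)

def posterior_score(y, x1, x2):
    # Use P(X = 1 | y) when the feature is 1, and 1 - P(X = 1 | y) when it is 0.
    f1 = p_x1[y] if x1 == 1 else 1 - p_x1[y]
    f2 = p_x2[y] if x2 == 1 else 1 - p_x2[y]
    return prior[y] * f1 * f2  # proportional to P(y | x1, x2)

x1, x2 = 1, 0  # hypothetical test item
print(max(prior, key=lambda y: posterior_score(y, x1, x2)))
```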
📗 [3 points] Consider the following directed graphical model over binary variables: \(A \to  B \leftarrow C\). Given the CPTs (Conditional Probability Table):
| Variable | Probability | Variable | Probability |
|---|---|---|---|
| \(\mathbb{P}\left\{A = 1\right\}\) | | | |
| \(\mathbb{P}\left\{C = 1\right\}\) | | | |
| \(\mathbb{P}\left\{B = 1 \mid A = C = 1\right\}\) | | \(\mathbb{P}\left\{B = 1 \mid A = 0, C = 1\right\}\) | |
| \(\mathbb{P}\left\{B = 1 \mid A = 1, C = 0\right\}\) | | \(\mathbb{P}\left\{B = 1 \mid A = C = 0\right\}\) | |

What is \(\mathbb{P}\){ \(A\) = , \(B\) = , \(C\) = }?
📗 Answer: .
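📗 Note: the structure \(A \to B \leftarrow C\) factorizes the joint as \(\mathbb{P}\left\{A, B, C\right\} = \mathbb{P}\left\{A\right\} \mathbb{P}\left\{C\right\} \mathbb{P}\left\{B | A, C\right\}\), with each factor read from the CPTs (complementing when the queried value is 0).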
📗 [2 points] Given the following network \(A \to  B \to  C\) where A can take on values, B can take on values, C can take on values. Write down the minimum number of conditional probabilities that define the CPTs (Conditional Probability Table).
📗 Answer: .
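📗 Note: if \(A, B, C\) take \(a, b, c\) values respectively, the chain \(A \to B \to C\) needs \(\left(a - 1\right) + a \left(b - 1\right) + b \left(c - 1\right)\) conditional probabilities, since the last entry of every distribution follows from the sum-to-1 constraint.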
📗 [3 points] You roll a 6-sided die times and observe the following counts in the table. Using Laplace smoothing (i.e. add-1 smoothing), estimate the probability of each side. Enter 6 numbers between 0 and 1, comma separated.

| Side | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Count | | | | | | |

📗 Answer (comma separated vector): .
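📗 Note: with \(n\) total rolls, add-1 smoothing estimates side \(i\) as \(\hat{p}_{i} = \displaystyle\frac{c_{i} + 1}{n + 6}\); the six estimates still sum to 1.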
📗 [4 points] Say we use Naive Bayes in an application where there are features represented by variables, each having possible values, and there are classes. How many probabilities must be stored in the CPTs (Conditional Probability Table) in the Bayesian network for this problem? Do not include probabilities that can be computed from other probabilities.
📗 Answer: .
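📗 Note: with \(c\) classes, \(f\) features, and \(v\) values per feature, the network stores \(\left(c - 1\right)\) class priors plus \(f \cdot c \cdot \left(v - 1\right)\) conditional probabilities, excluding the entries recoverable from the sum-to-1 constraints.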
📗 [4 points] Consider the problem of detecting if an email message contains a virus. Say we use four random variables to model this problem: Boolean (binary) class variable \(V\) indicates if the message contains a virus or not, and three Boolean feature variables: \(A, B, C\). We decide to use a Naive Bayes Classifier to solve this problem so we create a Bayesian network with arcs from \(V\) to each of \(A, B, C\). Their associated CPTs (Conditional Probability Table) are created from the following data: \(\mathbb{P}\left\{V = 1\right\}\) = , \(\mathbb{P}\left\{A = 1 | V = 1\right\}\) = , \(\mathbb{P}\left\{A = 1 | V = 0\right\}\) = , \(\mathbb{P}\left\{B = 1 | V = 1\right\}\) = , \(\mathbb{P}\left\{B = 1 | V = 0\right\}\) = , \(\mathbb{P}\left\{C = 1 | V = 1\right\}\) = , \(\mathbb{P}\left\{C = 1 | V = 0\right\}\) = . Compute \(\mathbb{P}\){ \(A\) = , \(B\) = , \(C\) = }.
📗 Answer: .
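📗 Note: the class is unobserved, so sum it out using the naive Bayes factorization, \(\mathbb{P}\left\{A, B, C\right\} = \displaystyle\sum_{v \in \left\{0, 1\right\}} \mathbb{P}\left\{V = v\right\} \mathbb{P}\left\{A | V = v\right\} \mathbb{P}\left\{B | V = v\right\} \mathbb{P}\left\{C | V = v\right\}\), complementing the listed probabilities when the queried values are 0.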
📗 [4 points] Consider the following Bayesian Network containing 5 Boolean random variables. How many numbers must be stored in total in all CPTs (Conditional Probability Table) associated with this network (excluding the numbers that can be calculated from other numbers)?

📗 Answer: .
📗 [0 points] To be added.
📗 [3 points] Suppose the likelihood probabilities of observing "a", "o", "c" in a real movie script are , and the likelihood probabilities of observing "a", "o", "c" in a fake movie script are . Given the prior probabilities, of the scripts are real. How would a Naive Bayes classifier classify a script ""? Enter \(1\) if it is classified as real, enter \(-1\) if it is classified as fake, and enter \(0\) if it is a tie (equally likely to be real and fake).
📗 Answer: .
📗 [3 points] Suppose ( + + ) entries are stored in the conditional probability tables of three binary variables \(X_{1}, X_{2}, X_{3}\). What is the configuration of the Bayesian network? Enter 1 for causal chain (e.g. \(X_{1} \to  X_{2} \to  X_{3}\)), enter 2 for common cause (e.g. \(X_{1} \leftarrow X_{2} \to  X_{3}\)), enter 3 for common effect (e.g. \(X_{1} \to  X_{2} \leftarrow X_{3}\)), and enter -1 if more information is needed or more than one of the previous configurations is possible.
📗 Answer: .
📗 [3 points] If the joint probabilities of the Bayesian network \(X_{1} \to  X_{2} \to  X_{3} \to  ... \to  X_{n}\) with \(n\) = binary variables are stored in a table (instead of the conditional probability tables (CPT)), what is the size of the table?
📗 For example, if the network is \(X_{1} \to  X_{2}\), then the size of the joint probability table is 3, containing entries \(\mathbb{P}\left\{X_{1}, X_{2}\right\}, \mathbb{P}\left\{X_{1}, \neg X_{2}\right\}, \mathbb{P}\left\{\neg X_{1}, X_{2}\right\}\), because the joint probability \(\mathbb{P}\left\{\neg X_{1}, \neg X_{2}\right\} = 1 - \mathbb{P}\left\{X_{1}, X_{2}\right\} - \mathbb{P}\left\{X_{1}, \neg X_{2}\right\} - \mathbb{P}\left\{\neg X_{1}, X_{2}\right\}\) can be computed based on the other entries in the table.
📗 Answer: .
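📗 Note: the joint table over \(n\) binary variables has \(2^{n}\) outcomes, and the sum-to-1 constraint removes one, so the answer is \(2^{n} - 1\); tabulating the full joint ignores the factorization the chain structure would otherwise provide.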
📗 [2 points] Consider the following directed graphical model over binary variables: \(A \to  B \leftarrow C\) with the following training set.

| A | B | C |
|---|---|---|
| 0 | | 0 |
| 0 | | 0 |
| 0 | | 1 |
| 0 | | 1 |
| 1 | | 0 |
| 1 | | 0 |
| 1 | | 1 |
| 1 | | 1 |

What is the MLE (Maximum Likelihood Estimate) with Laplace smoothing of the conditional probability \(\mathbb{P}\){ \(B\) = | \(A\) = , \(C\) = }?
📗 Answer: .
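📗 Note: with Laplace smoothing on a binary \(B\), \(\hat{\mathbb{P}}\left\{B = b | A = a, C = c\right\} = \displaystyle\frac{\text{count}\left(B = b, A = a, C = c\right) + 1}{\text{count}\left(A = a, C = c\right) + 2}\), counting only the training rows that match \(a\) and \(c\).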
📗 [3 points] Given a Bayesian network \(A \to  B \to  C \to  D \to  E\) of 5 binary event variables with the following conditional probability table (CPT), what is the probability that none of the events happen, \(\mathbb{P}\left\{\neg A, \neg B, \neg C, \neg D, \neg E\right\}\)?
| \(\mathbb{P}\left\{A\right\}\) = | \(\mathbb{P}\left\{B \mid A\right\}\) = | \(\mathbb{P}\left\{C \mid B\right\}\) = | \(\mathbb{P}\left\{D \mid C\right\}\) = | \(\mathbb{P}\left\{E \mid D\right\}\) = |
|---|---|---|---|---|
| \(\mathbb{P}\left\{\neg A\right\}\) = | \(\mathbb{P}\left\{B \mid \neg A\right\}\) = | \(\mathbb{P}\left\{C \mid \neg B\right\}\) = | \(\mathbb{P}\left\{D \mid \neg C\right\}\) = | \(\mathbb{P}\left\{E \mid \neg D\right\}\) = |

📗 Answer: .
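📗 Note: the chain factorizes the query into a product of complements, \(\mathbb{P}\left\{\neg A, \neg B, \neg C, \neg D, \neg E\right\} = \left(1 - \mathbb{P}\left\{A\right\}\right) \left(1 - \mathbb{P}\left\{B | \neg A\right\}\right) \left(1 - \mathbb{P}\left\{C | \neg B\right\}\right) \left(1 - \mathbb{P}\left\{D | \neg C\right\}\right) \left(1 - \mathbb{P}\left\{E | \neg D\right\}\right)\).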
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .

# Grade




📗 You could save the text in the above text box to a file using the button, or copy and paste it into a file yourself.
📗 You could load your answers from the text (or txt file) in the text box below using the button. The first two lines should be "##x: 3" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.







Last Updated: April 29, 2024 at 1:11 AM