
# Warning: This is a replica of the homework page for testing purposes; please use M6 for homework submission.


# M6 Written (Math) Problems

📗 Enter your ID (your wisc email ID, without @wisc.edu) in the box and click the button (or hit the enter key).
📗 You can also load your answers from a previously saved file and click the load button.
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID will generate the same set of questions. Your answers are not saved when you close the browser. You can print the page, solve the problems on paper, then enter all your answers at the end.
📗 Please do not refresh the page once you start: your answers will not be saved.

# Warning: Please enter your ID before you start!


# Question 1


📗 [4 points] John tells his professor that he forgot to submit his homework assignment. From experience, the professor knows that students who finish their homework on time forget to turn it in with probability . She also knows that of the students who have not finished their homework will tell her they forgot to turn it in. She thinks that of the students in this class completed their homework on time. What is the probability that John is telling the truth (i.e. he finished it given that he forgot to submit it)?
Hint: See Fall 2019 Final Q18 Q19, Fall 2017 Final Q6. Let \(C\) represent finishing (completing) the homework and \(F\) represent forgetting to turn it in. The question asks for \(\mathbb{P}\left\{C | F\right\} = \dfrac{\mathbb{P}\left\{C, F\right\}}{\mathbb{P}\left\{F\right\}} = \dfrac{\mathbb{P}\left\{F | C\right\} \mathbb{P}\left\{C\right\}}{\mathbb{P}\left\{F | C\right\} \mathbb{P}\left\{C\right\} + \mathbb{P}\left\{F | \neg C\right\} \left(1 - \mathbb{P}\left\{C\right\}\right)}\), where the denominator follows from the law of total probability.
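
As a sanity check, here is a minimal sketch of the computation in Python, with made-up values for the three blank probabilities (the numbers below are illustrative, not your generated ones):

```python
# Illustrative values (assumptions, not the generated numbers):
# P(F|C) = 0.01, P(F|not C) = 0.5, P(C) = 0.9.
p_f_given_c = 0.01     # finished on time, but forgot to turn it in
p_f_given_notc = 0.50  # did not finish, claims to have forgotten
p_c = 0.90             # finished the homework on time

# Bayes rule, with the law of total probability in the denominator.
p_f = p_f_given_c * p_c + p_f_given_notc * (1 - p_c)
print(p_f_given_c * p_c / p_f)  # 0.009 / 0.059, about 0.153
```
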
📗 Answer: .

# Question 2

📗 [4 points] Fill in the missing values in the following joint probability table so that \(A\) and \(B\) are independent.

|       | A = 0 | A = 1 |
|-------|-------|-------|
| B = 0 |       |       |
| B = 1 | ??    | ??    |

Hint: See Fall 2019 Final Q20, Fall 2013 Final Q15, Fall 2011 Final Q4, Fall 2010 Final Q11. Independence requires \(\mathbb{P}\left\{A = a, B = b\right\} = \mathbb{P}\left\{A = a\right\} \mathbb{P}\left\{B = b\right\}\) for every cell, and the four entries must sum to 1.
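
A minimal sketch of how the missing row is pinned down, assuming hypothetical values for the given B = 0 row (both numbers are assumptions):

```python
# Hypothetical given row: P(A=0,B=0) = 0.12, P(A=1,B=0) = 0.18.
p00, p10 = 0.12, 0.18

# Independence forces P(A=a, B=b) = P(A=a) * P(B=b) for every cell.
p_b0 = p00 + p10         # P(B=0) = 0.30
p_a0 = p00 / p_b0        # P(A=0) = 0.40
p_b1 = 1 - p_b0          # P(B=1) = 0.70
p01 = p_a0 * p_b1        # P(A=0,B=1) = 0.28
p11 = (1 - p_a0) * p_b1  # P(A=1,B=1) = 0.42
print(p01, p11, p00 + p10 + p01 + p11)  # the four cells sum to 1
```
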
📗 Answer (comma separated vector): .

# Question 3

📗 [3 points] There are two biased coins in my pocket: coin A has \(\mathbb{P}\left\{H | A\right\}\) = , coin B has \(\mathbb{P}\left\{H | B\right\}\) = . I took a coin out of the pocket at random; the probability of picking A is . I flipped it twice and the outcome is . What is the probability that the coin was ?
Hint: See Spring 2018 Final Q22 Q23, Fall 2018 Midterm Q11, Fall 2017 Final Q20, Spring 2017 Final Q6, Fall 2010 Final Q18. For example, the Bayes rule for the probability that the coin is \(A\) given the outcome \(H T H\) is \(\mathbb{P}\left\{A | H T H\right\} = \dfrac{\mathbb{P}\left\{H T H, A\right\}}{\mathbb{P}\left\{H T H\right\}}\) = \(\dfrac{\mathbb{P}\left\{H T H | A\right\} \mathbb{P}\left\{A\right\}}{\mathbb{P}\left\{H T H | A\right\} \mathbb{P}\left\{A\right\} + \mathbb{P}\left\{H T H | B\right\} \mathbb{P}\left\{B\right\}}\) = \(\dfrac{\mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{T | A\right\} \mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{A\right\}}{\mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{T | A\right\} \mathbb{P}\left\{H | A\right\} \mathbb{P}\left\{A\right\} + \mathbb{P}\left\{H | B\right\} \mathbb{P}\left\{T | B\right\} \mathbb{P}\left\{H | B\right\} \mathbb{P}\left\{B\right\}}\). Note that \(\mathbb{P}\left\{H T H | A\right\}\) can be split into three factors because the flips are independent.
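
A sketch of the same Bayes computation for an arbitrary flip sequence, with made-up coin biases and outcome (all values below are assumptions):

```python
# Assumed values: P(H|A) = 0.8, P(H|B) = 0.3, P(A) = 0.5, outcome "HT".
p_h_a, p_h_b, p_a = 0.8, 0.3, 0.5
outcome = "HT"

def seq_prob(p_h, seq):
    """Probability of a flip sequence given one coin; flips are independent."""
    prob = 1.0
    for flip in seq:
        prob *= p_h if flip == "H" else 1 - p_h
    return prob

num = seq_prob(p_h_a, outcome) * p_a              # P(HT, A)
den = num + seq_prob(p_h_b, outcome) * (1 - p_a)  # P(HT)
print(num / den)  # P(A | HT) = 0.08 / 0.185, about 0.432
```
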
📗 Answer: .

# Question 4

📗 [3 points] You roll a 6-sided die  times and observe the counts in the following table. Use Laplace smoothing (i.e. add-1 smoothing) to estimate the probability of each side. Enter 6 numbers between 0 and 1, comma separated.

| Side  | 1 | 2 | 3 | 4 | 5 | 6 |
|-------|---|---|---|---|---|---|
| Count |   |   |   |   |   |   |

Hint: See Spring 2018 Final Q21, Fall 2016 Final Q4, Fall 2011 Midterm Q16. The Laplace-smoothed (add-\(\delta\)) estimate of \(\mathbb{P}\left\{A = i\right\}\) is \(\dfrac{n_{i} + \delta}{\displaystyle\sum_{i'=1}^{6} n_{i'} + 6 \cdot \delta}\), with \(\delta = 1\) for add-1 smoothing.
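
A one-line check of the smoothed estimate, with hypothetical counts (the counts below are assumptions, not your generated table):

```python
counts = [3, 0, 5, 2, 4, 1]  # hypothetical counts for sides 1..6
delta = 1                    # add-1 (Laplace) smoothing
total = sum(counts) + 6 * delta
probs = [(n + delta) / total for n in counts]
print(probs, sum(probs))  # six probabilities that sum to 1
```
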
📗 Answer (comma separated vector): .

# Question 5

📗 [4 points] Consider a classification problem with \(n\) =  classes \(y \in \left\{1, 2, ..., n\right\}\), and two binary features \(x_{1}, x_{2} \in \left\{0, 1\right\}\). Suppose \(\mathbb{P}\left\{Y = y\right\}\) = , \(\mathbb{P}\left\{X_{1} = 1 | Y = y\right\}\) = , \(\mathbb{P}\left\{X_{2} = 1 | Y = y\right\}\) = . Which class will the naive Bayes classifier predict on a test item with \(X_{1}\) =  and \(X_{2}\) = ?
Hint: See Fall 2016 Final Q18, Fall 2011 Midterm Q20. Use the Bayes rule: \(\mathbb{P}\left\{Y = y | X_{1} = x_{1}, X_{2} = x_{2}\right\} = \dfrac{\mathbb{P}\left\{X_{1} = x_{1}, X_{2} = x_{2} | Y = y\right\} \mathbb{P}\left\{Y = y\right\}}{\displaystyle\sum_{y'=1}^{n} \mathbb{P}\left\{X_{1} = x_{1}, X_{2} = x_{2} | Y = y'\right\} \mathbb{P}\left\{Y = y'\right\}}\), which equals \(\dfrac{\mathbb{P}\left\{X_{1} = x_{1} | Y = y\right\} \mathbb{P}\left\{X_{2} = x_{2} | Y = y\right\} \mathbb{P}\left\{Y = y\right\}}{\displaystyle\sum_{y'=1}^{n} \mathbb{P}\left\{X_{1} = x_{1} | Y = y'\right\} \mathbb{P}\left\{X_{2} = x_{2} | Y = y'\right\} \mathbb{P}\left\{Y = y'\right\}}\) due to the independence assumption of naive Bayes. For Bayesian networks that are not naive, the second equality does not hold. The naive Bayes classifier selects the \(y\) that maximizes \(\mathbb{P}\left\{Y = y | X_{1} = x_{1}, X_{2} = x_{2}\right\}\): since the denominators of these probabilities are the same and the prior probability is constant, the classifier effectively selects the \(y\) that maximizes \(\mathbb{P}\left\{X_{1} = x_{1} | Y = y\right\} \mathbb{P}\left\{X_{2} = x_{2} | Y = y\right\}\), which is a function of \(y\). You can try different values of \(y\) to find the maximizer, or use the first-derivative condition if the number of classes is large (i.e., compare the integers near the places where the first derivative is zero, and the end points).
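
A minimal sketch of the argmax, assuming hypothetical CPTs that depend on \(y\) (the class count, functional forms, and test item below are all assumptions for illustration):

```python
# Assumptions: n = 4 classes, uniform prior P(Y=y) = 1/n,
# P(X1=1|Y=y) = y/5, P(X2=1|Y=y) = 1/y, test item x1 = 1, x2 = 0.
n = 4
x1, x2 = 1, 0

def cond(p_one, x):
    """P(X=x | Y=y) from P(X=1 | Y=y) for a binary feature."""
    return p_one if x == 1 else 1 - p_one

# The evidence (denominator) is the same for every y, so comparing the
# unnormalized products is enough.
scores = {y: cond(y / 5, x1) * cond(1 / y, x2) * (1 / n)
          for y in range(1, n + 1)}
print(max(scores, key=scores.get))  # predicted class, here y = 4
```
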
📗 Answer: .

# Question 6

📗 [4 points] Consider the problem of detecting whether an email message contains a virus. Say we use four random variables to model this problem: a Boolean (binary) class variable \(V\) indicates whether the message contains a virus, and three Boolean feature variables \(A, B, C\). We decide to use a naive Bayes classifier to solve this problem, so we create a Bayesian network with arcs from \(V\) to each of \(A, B, C\). Their associated CPTs (Conditional Probability Tables) are created from the following data: \(\mathbb{P}\left\{V = 1\right\}\) = , \(\mathbb{P}\left\{A = 1 | V = 1\right\}\) = , \(\mathbb{P}\left\{A = 1 | V = 0\right\}\) = , \(\mathbb{P}\left\{B = 1 | V = 1\right\}\) = , \(\mathbb{P}\left\{B = 1 | V = 0\right\}\) = , \(\mathbb{P}\left\{C = 1 | V = 1\right\}\) = , \(\mathbb{P}\left\{C = 1 | V = 0\right\}\) = . Compute \(\mathbb{P}\){ \(A\) = , \(B\) = , \(C\) = }.
Hint: See Spring 2017 Final Q7. Naive Bayes is a special, simple Bayesian network, so the joint probabilities are computed the same way (as a product of conditional probabilities given the parents): \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a, B = b, C = c, V = 0\right\} + \mathbb{P}\left\{A = a, B = b, C = c, V = 1\right\}\) and \(\mathbb{P}\left\{A = a, B = b, C = c, V = v\right\} = \mathbb{P}\left\{A = a | V = v\right\} \mathbb{P}\left\{B = b | V = v\right\} \mathbb{P}\left\{C = c | V = v\right\} \mathbb{P}\left\{V = v\right\}\).
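
The marginalization over \(V\) is just two terms; a sketch with assumed CPT values and an assumed query (all numbers below are illustrative):

```python
# Assumed CPTs: P(V=1), and P(feature=1 | V=v) for each feature.
p_v1 = 0.2
p_a1 = {1: 0.9, 0: 0.3}
p_b1 = {1: 0.6, 0: 0.1}
p_c1 = {1: 0.7, 0: 0.4}

def cond(table, x, v):
    """P(X=x | V=v) from the stored P(X=1 | V=v)."""
    return table[v] if x == 1 else 1 - table[v]

a, b, c = 1, 0, 1  # assumed query values
total = sum(
    cond(p_a1, a, v) * cond(p_b1, b, v) * cond(p_c1, c, v)
    * (p_v1 if v == 1 else 1 - p_v1)
    for v in (0, 1)
)
print(total)  # P(A=1, B=0, C=1)
```
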
📗 Answer: .

# Question 7

📗 [5 points] Consider the following directed graphical model over binary variables: \(A \to B \to C\). Given the CPTs (Conditional Probability Tables):

- \(\mathbb{P}\left\{A = 1\right\}\) =
- \(\mathbb{P}\left\{B = 1 | A = 1\right\}\) = , \(\mathbb{P}\left\{B = 1 | A = 0\right\}\) =
- \(\mathbb{P}\left\{C = 1 | B = 1\right\}\) = , \(\mathbb{P}\left\{C = 1 | B = 0\right\}\) =

What is \(\mathbb{P}\){ \(A\) = \(|\) \(C\) = }?
Hint: See Fall 2019 Final Q22 Q23 Q24 Q25, Spring 2018 Final Q24 Q25, Fall 2014 Final Q9, Fall 2006 Final Q20, Fall 2005 Final Q20. For any type of network, one way (brute force, not very efficient) is to use the marginal distributions: \(\mathbb{P}\left\{A = a | C = c\right\} = \dfrac{\mathbb{P}\left\{A = a, C = c\right\}}{\mathbb{P}\left\{C = c\right\}} = \dfrac{\displaystyle\sum_{b'} \mathbb{P}\left\{A = a, B = b', C = c\right\}}{\displaystyle\sum_{a', b'} \mathbb{P}\left\{A = a', B = b', C = c\right\}}\). The joint probabilities can be calculated the same way as in the previous question.
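
A brute-force sketch of this marginalization for the chain, with assumed CPT values (illustrative only):

```python
# Assumed CPTs for A -> B -> C.
p_a1 = 0.4
p_b1_given_a = {1: 0.7, 0: 0.2}  # P(B=1 | A=a)
p_c1_given_b = {1: 0.9, 0: 0.5}  # P(C=1 | B=b)

def joint(a, b, c):
    """Chain rule for this network: P(a) * P(b|a) * P(c|b)."""
    pa = p_a1 if a == 1 else 1 - p_a1
    pb = p_b1_given_a[a] if b == 1 else 1 - p_b1_given_a[a]
    pc = p_c1_given_b[b] if c == 1 else 1 - p_c1_given_b[b]
    return pa * pb * pc

num = sum(joint(1, b, 1) for b in (0, 1))                  # P(A=1, C=1)
den = sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))  # P(C=1)
print(num / den)                                           # P(A=1 | C=1)
```
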
📗 Answer: .

# Question 8

📗 [5 points] Consider the following directed graphical model over binary variables: \(A \leftarrow B \to C\). Given the CPTs (Conditional Probability Tables):

- \(\mathbb{P}\left\{B = 1\right\}\) =
- \(\mathbb{P}\left\{C = 1 | B = 1\right\}\) = , \(\mathbb{P}\left\{C = 1 | B = 0\right\}\) =
- \(\mathbb{P}\left\{A = 1 | B = 1\right\}\) = , \(\mathbb{P}\left\{A = 1 | B = 0\right\}\) =

What is \(\mathbb{P}\){ \(A\) = \(|\) \(C\) = }?
Hint: See Fall 2019 Final Q22 Q23 Q24 Q25, Spring 2018 Final Q24 Q25, Fall 2014 Final Q9, Fall 2006 Final Q20, Fall 2005 Final Q20. For any type of network, one way (brute force, not very efficient) is to use the marginal distributions: \(\mathbb{P}\left\{A = a | C = c\right\} = \dfrac{\mathbb{P}\left\{A = a, C = c\right\}}{\mathbb{P}\left\{C = c\right\}} = \dfrac{\displaystyle\sum_{b'} \mathbb{P}\left\{A = a, B = b', C = c\right\}}{\displaystyle\sum_{a', b'} \mathbb{P}\left\{A = a', B = b', C = c\right\}}\). The joint probabilities can be calculated the same way as in the previous questions.
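
The same brute force works here; only the factorization changes, because \(B\) is now the root (the values below are assumptions):

```python
# Assumed CPTs for A <- B -> C.
p_b1 = 0.3
p_a1_given_b = {1: 0.8, 0: 0.1}  # P(A=1 | B=b)
p_c1_given_b = {1: 0.6, 0: 0.4}  # P(C=1 | B=b)

def joint(a, b, c):
    """Product of conditionals given parents: P(b) * P(a|b) * P(c|b)."""
    pb = p_b1 if b == 1 else 1 - p_b1
    pa = p_a1_given_b[b] if a == 1 else 1 - p_a1_given_b[b]
    pc = p_c1_given_b[b] if c == 1 else 1 - p_c1_given_b[b]
    return pb * pa * pc

num = sum(joint(1, b, 1) for b in (0, 1))                  # P(A=1, C=1)
den = sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))  # P(C=1)
print(num / den)                                           # P(A=1 | C=1)
```
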
📗 Answer: .

# Question 9

📗 [4 points] Consider the following Bayesian network containing 5 Boolean random variables. How many numbers must be stored in total in all CPTs (Conditional Probability Tables) associated with this network (excluding the numbers that can be calculated from other numbers)?

(Network diagram generated on the page; not reproduced here.)
Hint: See Fall 2019 Final Q21, Spring 2017 Final Q8, Fall 2011 Midterm Q15. A node with no parent, \(\mathbb{P}\left\{A\right\}\), requires storing \(n_{A} - 1\) probabilities; a node with one parent, \(\mathbb{P}\left\{B | A\right\}\), requires storing \(\left(n_{B} - 1\right) n_{A}\) probabilities; a node with two parents, \(\mathbb{P}\left\{C | A, B\right\}\), requires storing \(\left(n_{C} - 1\right) n_{A} n_{B}\) probabilities; and so on, where \(n_{X}\) is the number of values variable \(X\) can take (here \(n_{X} = 2\) for every Boolean variable).
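
A small helper makes the count mechanical; the network shape below is an assumption, since the actual diagram is generated per student:

```python
def cpt_numbers(num_parents, arity=2):
    """Free parameters in one CPT: (arity - 1) * arity ** num_parents."""
    return (arity - 1) * arity ** num_parents

# Assumed shape: one root, two one-parent nodes, one two-parent node,
# and one three-parent node (5 Boolean variables).
parents_per_node = [0, 1, 1, 2, 3]
print(sum(cpt_numbers(k) for k in parents_per_node))  # 1 + 2 + 2 + 4 + 8 = 17
```
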
📗 Answer: .

# Question 10

📗 [4 points] Given the following transition matrix for a bigram model with words "", "" and "": . Row \(i\), column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given that the first word is ""?
Hint: See Fall 2019 Final Q30. Sum over all possible values of the second word: \(\mathbb{P}\left\{w_{3} = j | w_{1} = i\right\} = \displaystyle\sum_{k=1}^{3} \mathbb{P}\left\{w_{3} = j | w_{2} = k\right\} \mathbb{P}\left\{w_{2} = k | w_{1} = i\right\}\), where the \(\mathbb{P}\left\{w_{t} | w_{t-1}\right\}\) probabilities are the entries of the transition matrix (the bigram model's Markov assumption makes the third word independent of the first given the second).
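
This sum is one entry of the squared transition matrix; a sketch with an assumed matrix (all values illustrative):

```python
# Assumed transition matrix: row i, column j is P(w_t = j | w_{t-1} = i),
# so every row sums to 1.
T = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]

i, j = 0, 2  # first word is word 1; ask whether the third word is word 3
prob = sum(T[i][k] * T[k][j] for k in range(3))  # sum over the middle word
print(prob)  # 0.5*0.2 + 0.3*0.3 + 0.2*0.2 = 0.23
```
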
📗 Answer: .

# Question 11

📗 [1 point] Please enter any comments and suggestions, including possible mistakes and bugs with the questions and the auto-grading, and materials relevant to solving the questions that you think are not covered well during the lectures. If you have no comments, please enter "None"; do not leave it blank.
📗 Answer: .

# Grade




📗 You can save the text in the text box above to a file using the button, or copy and paste it into a file yourself.
📗 You can load your answers from text (or a txt file) pasted into the text box below using the button. The first two lines should be "##m: 6" and "##id: your id", and each remaining line should be "##1: your answer to question 1", then "##2: your answer to question 2", and so on, one answer per line. Please make sure that your answers are loaded correctly before submitting them.
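
For illustration, a saved answer file would begin like this (the id and answer values are placeholders):

```
##m: 6
##id: your_id
##1: 0.15
##2: 0.1,0.2,0.3,0.4
```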








Last Updated: April 29, 2024 at 1:11 AM