
# M7 Written (Math) Problems

📗 Enter your ID (your wisc email ID, without @wisc.edu) here: and click (or hit the Enter key).
📗 The official deadline is July 11, but you can submit or resubmit without penalty until July 18.
📗 The same ID will generate the same set of questions. Your answers are not saved when you close the browser. You can print the page: , solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 Please report any bugs on Piazza.

# Warning: please enter your ID before you start!


# Question 1



# Question 2



# Question 3



# Question 4



# Question 5



# Question 6



# Question 7



# Question 8



# Question 9



# Question 10



📗 [4 points] Consider a classification problem with \(n\) = classes \(y \in \left\{1, 2, ..., n\right\}\) and two binary features \(x_{1}, x_{2} \in \left\{0, 1\right\}\). Suppose \(\mathbb{P}\left\{Y = y\right\}\) = , \(\mathbb{P}\left\{X_{1} = 1 | Y = y\right\}\) = , \(\mathbb{P}\left\{X_{2} = 1 | Y = y\right\}\) = . Which class will the naive Bayes classifier predict for a test item with \(X_{1}\) = and \(X_{2}\) = ?
Hint See Fall 2016 Final Q18, Fall 2011 Midterm Q20. Use Bayes rule: \(\mathbb{P}\left\{Y = y | X_{1} = x_{1}, X_{2} = x_{2}\right\} = \dfrac{\mathbb{P}\left\{X_{1} = x_{1}, X_{2} = x_{2} | Y = y\right\} \mathbb{P}\left\{Y = y\right\}}{\displaystyle\sum_{y'=1}^{n} \mathbb{P}\left\{X_{1} = x_{1}, X_{2} = x_{2} | Y = y'\right\} \mathbb{P}\left\{Y = y'\right\}}\), which equals \(\dfrac{\mathbb{P}\left\{X_{1} = x_{1} | Y = y\right\} \mathbb{P}\left\{X_{2} = x_{2} | Y = y\right\} \mathbb{P}\left\{Y = y\right\}}{\displaystyle\sum_{y'=1}^{n} \mathbb{P}\left\{X_{1} = x_{1} | Y = y'\right\} \mathbb{P}\left\{X_{2} = x_{2} | Y = y'\right\} \mathbb{P}\left\{Y = y'\right\}}\) due to the independence assumption of naive Bayes. For Bayesian networks that are not naive, the second equality does not hold. The naive Bayes classifier selects the \(y\) that maximizes \(\mathbb{P}\left\{Y = y | X_{1} = x_{1}, X_{2} = x_{2}\right\}\): since the denominator is the same for every \(y\) and the prior probability is constant across classes, the classifier effectively selects the \(y\) that maximizes \(\mathbb{P}\left\{X_{1} = x_{1} | Y = y\right\} \mathbb{P}\left\{X_{2} = x_{2} | Y = y\right\}\), which is a function of \(y\). You can try each value of \(y\) to find the maximizer, or, if the number of classes is large, use the first derivative condition (i.e., compare the integers near the places where the first derivative is zero, as well as the endpoints).
📗 Answer: .
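📗 As a sanity check, the argmax in the hint can be computed directly. Below is a minimal Python sketch; all probability values are made-up placeholders, since the actual numbers are generated from your ID.

```python
# Placeholder CPTs for a hypothetical 4-class problem.
prior = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}   # P{Y = y}, assumed uniform here
p_x1 = {1: 0.1, 2: 0.3, 3: 0.5, 4: 0.7}        # P{X1 = 1 | Y = y}
p_x2 = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}        # P{X2 = 1 | Y = y}
x1, x2 = 1, 0                                   # observed test item

def score(y):
    # Numerator of Bayes rule: P{X1 = x1 | y} * P{X2 = x2 | y} * P{Y = y}.
    l1 = p_x1[y] if x1 == 1 else 1 - p_x1[y]
    l2 = p_x2[y] if x2 == 1 else 1 - p_x2[y]
    return l1 * l2 * prior[y]

# The denominator is the same for every y, so picking the y with the
# largest numerator gives the naive Bayes prediction.
y_hat = max(prior, key=score)
print(y_hat)   # 3 for these placeholder values
```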
📗 [3 points] Consider the following directed graphical model over binary variables: \(A \to  B \leftarrow C\). Given the CPTs (Conditional Probability Tables):
| Variable | Probability | Variable | Probability |
|---|---|---|---|
| \(\mathbb{P}\left\{A = 1\right\}\) |  |  |  |
| \(\mathbb{P}\left\{C = 1\right\}\) |  |  |  |
| \(\mathbb{P}\left\{B = 1 \mid A = C = 1\right\}\) |  | \(\mathbb{P}\left\{B = 1 \mid A = 0, C = 1\right\}\) |  |
| \(\mathbb{P}\left\{B = 1 \mid A = 1, C = 0\right\}\) |  | \(\mathbb{P}\left\{B = 1 \mid A = C = 0\right\}\) |  |

What is the probability \(\mathbb{P}\){ \(A\) = , \(B\) = , \(C\) = }?
Hint See Fall 2019 Final Q22 Q23 Q24 Q25, Spring 2018 Final Q24 Q25, Fall 2014 Final Q9, Fall 2006 Final Q20, Fall 2005 Final Q20. For any Bayes net, the joint probability can always be computed as the product of the conditional probabilities (each variable conditioned on its parents). For a causal chain \(A \to  B \to  C\), \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a\right\} \mathbb{P}\left\{B = b | A = a\right\} \mathbb{P}\left\{C = c | B = b\right\}\). For a common cause \(A \leftarrow B \to  C\), \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a | B = b\right\} \mathbb{P}\left\{B = b\right\} \mathbb{P}\left\{C = c | B = b\right\}\). For a common effect \(A \to  B \leftarrow C\), \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a\right\} \mathbb{P}\left\{B = b | A = a, C = c\right\} \mathbb{P}\left\{C = c\right\}\).
📗 Answer: .
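📗 A minimal sketch of the common-effect factorization in the hint, with made-up placeholder CPT values (the actual values are generated from your ID):

```python
# Placeholder CPTs for the common-effect network A -> B <- C.
p_a1 = 0.3                                   # P{A = 1}
p_c1 = 0.6                                   # P{C = 1}
p_b1 = {(1, 1): 0.9, (0, 1): 0.5,            # P{B = 1 | A = a, C = c}
        (1, 0): 0.4, (0, 0): 0.1}

def joint(a, b, c):
    # P{A = a} * P{B = b | A = a, C = c} * P{C = c}
    pa = p_a1 if a == 1 else 1 - p_a1
    pc = p_c1 if c == 1 else 1 - p_c1
    pb = p_b1[(a, c)] if b == 1 else 1 - p_b1[(a, c)]
    return pa * pb * pc

print(joint(1, 0, 1))   # 0.3 * (1 - 0.9) * 0.6 = 0.018 for these values
```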
📗 [5 points] Consider the following directed graphical model over binary variables: \(A \to  B \to  C\). Given the CPTs (Conditional Probability Tables):
| Variable | Probability | Variable | Probability |
|---|---|---|---|
| \(\mathbb{P}\left\{A = 1\right\}\) |  |  |  |
| \(\mathbb{P}\left\{B = 1 \mid A = 1\right\}\) |  | \(\mathbb{P}\left\{B = 1 \mid A = 0\right\}\) |  |
| \(\mathbb{P}\left\{C = 1 \mid B = 1\right\}\) |  | \(\mathbb{P}\left\{C = 1 \mid B = 0\right\}\) |  |

What is the probability \(\mathbb{P}\){ \(A\) = \(|\) \(C\) = }?
Hint See Fall 2019 Final Q22 Q23 Q24 Q25, Spring 2018 Final Q24 Q25, Fall 2014 Final Q9, Fall 2006 Final Q20, Fall 2005 Final Q20. For any type of network, one way (brute force, not really efficient) is to use the marginal distributions: \(\mathbb{P}\left\{A = a | C = c\right\} = \dfrac{\mathbb{P}\left\{A = a, C = c\right\}}{\mathbb{P}\left\{C = c\right\}} = \dfrac{\displaystyle\sum_{b'} \mathbb{P}\left\{A = a, B = b', C = c\right\}}{\displaystyle\sum_{a', b'} \mathbb{P}\left\{A = a', B = b', C = c\right\}}\). The joint probabilities can be calculated the same way as the previous question.
📗 Answer: .
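📗 The brute-force marginalization in the hint can be written out directly. A minimal sketch, with made-up placeholder CPT values:

```python
from itertools import product

# Placeholder CPTs for the chain A -> B -> C.
p_a1 = 0.4                       # P{A = 1}
p_b1 = {1: 0.7, 0: 0.2}          # P{B = 1 | A = a}
p_c1 = {1: 0.8, 0: 0.3}          # P{C = 1 | B = b}

def joint(a, b, c):
    # P{A = a} * P{B = b | A = a} * P{C = c | B = b}
    pa = p_a1 if a == 1 else 1 - p_a1
    pb = p_b1[a] if b == 1 else 1 - p_b1[a]
    pc = p_c1[b] if c == 1 else 1 - p_c1[b]
    return pa * pb * pc

def conditional(a, c):
    # P{A = a | C = c} = sum_b P{a, b, c} / sum_{a', b'} P{a', b', c}
    num = sum(joint(a, b, c) for b in (0, 1))
    den = sum(joint(ap, bp, c) for ap, bp in product((0, 1), repeat=2))
    return num / den

print(conditional(1, 1))   # 0.52 for these placeholder values
```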
📗 [5 points] Consider the following directed graphical model over binary variables: \(A \leftarrow B \to  C\). Given the CPTs (Conditional Probability Tables):
| Variable | Probability | Variable | Probability |
|---|---|---|---|
| \(\mathbb{P}\left\{B = 1\right\}\) |  |  |  |
| \(\mathbb{P}\left\{C = 1 \mid B = 1\right\}\) |  | \(\mathbb{P}\left\{C = 1 \mid B = 0\right\}\) |  |
| \(\mathbb{P}\left\{A = 1 \mid B = 1\right\}\) |  | \(\mathbb{P}\left\{A = 1 \mid B = 0\right\}\) |  |

What is the probability \(\mathbb{P}\){ \(A\) = \(|\) \(C\) = }?
Hint See Fall 2019 Final Q22 Q23 Q24 Q25, Spring 2018 Final Q24 Q25, Fall 2014 Final Q9, Fall 2006 Final Q20, Fall 2005 Final Q20. For any type of network, one way (brute force, not really efficient) is to use the marginal distributions: \(\mathbb{P}\left\{A = a | C = c\right\} = \dfrac{\mathbb{P}\left\{A = a, C = c\right\}}{\mathbb{P}\left\{C = c\right\}} = \dfrac{\displaystyle\sum_{b'} \mathbb{P}\left\{A = a, B = b', C = c\right\}}{\displaystyle\sum_{a', b'} \mathbb{P}\left\{A = a', B = b', C = c\right\}}\). The joint probabilities can be calculated the same way as the previous question.
📗 Answer: .
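📗 The same brute-force marginalization works here; only the joint factorization changes to match the common-cause structure. A minimal sketch with made-up placeholder values (plug this `joint` into the `conditional` function from the previous sketch):

```python
# Placeholder CPTs for the common-cause network A <- B -> C.
p_b1 = 0.5                       # P{B = 1}
p_a1 = {1: 0.6, 0: 0.1}          # P{A = 1 | B = b}
p_c1 = {1: 0.7, 0: 0.2}          # P{C = 1 | B = b}

def joint(a, b, c):
    # P{A = a | B = b} * P{B = b} * P{C = c | B = b}
    pb = p_b1 if b == 1 else 1 - p_b1
    pa = p_a1[b] if a == 1 else 1 - p_a1[b]
    pc = p_c1[b] if c == 1 else 1 - p_c1[b]
    return pa * pb * pc

print(joint(1, 0, 1))   # 0.1 * 0.5 * 0.2 = 0.01 for these values
```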
📗 [4 points] Consider the following Bayesian Network containing 5 Boolean random variables. How many numbers must be stored in total in all the CPTs (Conditional Probability Tables) associated with this network (excluding numbers that can be calculated from other numbers)?

Hint See Fall 2019 Final Q21, Spring 2017 Final Q8, Fall 2011 Midterm Q15. A node with one parent \(\mathbb{P}\left\{B | A\right\}\) requires storing \(\left(n_{B} - 1\right) n_{A}\) probabilities; a node with no parent \(\mathbb{P}\left\{A\right\}\) requires storing \(n_{A} - 1\) probabilities; a node with two parents \(\mathbb{P}\left\{C | A, B\right\}\) requires storing \(\left(n_{C} - 1\right) n_{A} n_{B}\) probabilities, and so on (here \(n_{X}\) is the number of values variable \(X\) can take; \(n_{X} = 2\) for Boolean variables).
📗 Answer: .
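📗 The counting rule in the hint is easy to automate. A minimal sketch, using a made-up example structure (not the network in the question):

```python
from math import prod

# A node X with parents Pa needs (n_X - 1) * prod(n_P for P in Pa) numbers.
arity = {"A": 2, "B": 2, "C": 2, "D": 2, "E": 2}   # all Boolean
parents = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["C"], "E": ["C"]}

total = sum((arity[x] - 1) * prod(arity[p] for p in parents[x])
            for x in arity)
print(total)   # 1 + 2 + 4 + 2 + 2 = 11 for this example structure
```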
📗 [3 points] You roll a 6-sided die times and observe the counts in the table below. Use Laplace smoothing (i.e., add-1 smoothing) to estimate the probability of each side. Enter 6 numbers between 0 and 1, comma separated.
| Side | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Count |  |  |  |  |  |  |

Hint See Spring 2018 Final Q21, Fall 2016 Final Q4, Fall 2011 Midterm Q16. The Laplace-smoothed estimate of \(\mathbb{P}\left\{A = i\right\}\) is \(\dfrac{n_{i} + \delta}{\displaystyle\sum_{i'=1}^{6} n_{i'} + 6 \cdot \delta}\), with \(\delta = 1\) for add-1 smoothing (the maximum likelihood estimate corresponds to \(\delta = 0\)).
📗 Answer (comma separated vector):
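📗 A minimal sketch of the smoothed estimate, with made-up counts (the actual counts are generated from your ID):

```python
# Add-1 (Laplace) smoothing over a 6-sided die.
counts = [3, 0, 2, 5, 1, 4]      # placeholder counts for sides 1..6
delta = 1                         # add-1 smoothing
total = sum(counts) + 6 * delta
probs = [(n + delta) / total for n in counts]
print(probs)                      # six numbers that sum to 1
```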
📗 [2 points] We have a biased coin with probability of producing Heads. We create a predictor as follows: generate a random number uniformly distributed in (0, 1); if the random number is less than , we predict Heads; otherwise, we predict Tails. What is this predictor's (expected) accuracy in predicting the coin's outcome?
Hint See Fall 2010 Final Q19. Suppose the probability of Heads is \(p\) and the probability of predicting Heads is \(q\); then the probability that the prediction is Heads and correct is \(p q\), and the probability that the prediction is Tails and correct is \(\left(1 - p\right)\left(1 - q\right)\). The accuracy is the sum of these two cases. Note that the probability that a Head is predicted as a Tail is \(p \left(1 - q\right)\) and the probability that a Tail is predicted as a Head is \(q \left(1 - p\right)\); these four probabilities sum to \(1\).
📗 Answer: .
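📗 A minimal sketch of the accuracy formula in the hint, with made-up values of \(p\) and \(q\):

```python
# Expected accuracy of the threshold predictor: p*q + (1-p)*(1-q).
p = 0.7   # placeholder P{coin shows Heads}
q = 0.6   # placeholder P{we predict Heads}
accuracy = p * q + (1 - p) * (1 - q)
print(accuracy)   # 0.7*0.6 + 0.3*0.4 = 0.54 for these values
```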
📗 [4 points] Consider the problem of detecting whether an email message contains a virus. Say we use four random variables to model this problem: a Boolean (binary) class variable \(V\) indicating whether the message contains a virus, and three Boolean feature variables \(A, B, C\). We decide to use a Naive Bayes classifier to solve this problem, so we create a Bayesian network with arcs from \(V\) to each of \(A, B, C\). Their associated CPTs (Conditional Probability Tables) are created from the following data: \(\mathbb{P}\left\{V = 1\right\}\) = , \(\mathbb{P}\left\{A = 1 | V = 1\right\}\) = , \(\mathbb{P}\left\{A = 1 | V = 0\right\}\) = , \(\mathbb{P}\left\{B = 1 | V = 1\right\}\) = , \(\mathbb{P}\left\{B = 1 | V = 0\right\}\) = , \(\mathbb{P}\left\{C = 1 | V = 1\right\}\) = , \(\mathbb{P}\left\{C = 1 | V = 0\right\}\) = . Compute \(\mathbb{P}\){ \(A\) = , \(B\) = , \(C\) = }.
Hint See Spring 2017 Final Q7. Naive Bayes is a simple special case of a Bayesian network, so the joint probabilities are computed the same way (as a product of conditional probabilities given the parents): \(\mathbb{P}\left\{A = a, B = b, C = c\right\} = \mathbb{P}\left\{A = a, B = b, C = c, V = 0\right\} + \mathbb{P}\left\{A = a, B = b, C = c, V = 1\right\}\) and \(\mathbb{P}\left\{A = a, B = b, C = c, V = v\right\} = \mathbb{P}\left\{A = a | V = v\right\} \mathbb{P}\left\{B = b | V = v\right\} \mathbb{P}\left\{C = c | V = v\right\} \mathbb{P}\left\{V = v\right\}\).
📗 Answer: .
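📗 A minimal sketch of the marginalization in the hint, with made-up placeholder CPT values:

```python
# Placeholder CPTs for the naive Bayes network V -> {A, B, C}.
p_v1 = 0.2                                  # P{V = 1}
p_feat = {                                  # P{X = 1 | V = v} for X in A, B, C
    "A": {1: 0.8, 0: 0.1},
    "B": {1: 0.7, 0: 0.2},
    "C": {1: 0.9, 0: 0.3},
}

def joint(a, b, c):
    # Sum over v of P{V = v} * product of P{feature | V = v}.
    total = 0.0
    for v in (0, 1):
        term = p_v1 if v == 1 else 1 - p_v1
        for name, x in (("A", a), ("B", b), ("C", c)):
            p1 = p_feat[name][v]
            term *= p1 if x == 1 else 1 - p1
        total += term
    return total

print(joint(1, 0, 1))   # 0.0432 + 0.0192 = 0.0624 for these values
```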
📗 [1 point] Please enter any comments or suggestions, including possible mistakes or bugs in the questions and the auto-grading, and any material relevant to solving the questions that you think was not covered well in the lectures. If you have no comments, please enter "None": do not leave it blank.
📗 Answer: .

# Grade



# Submission


📗 Please do not modify the content in the above text field: use the "Grade" button to update.


📗 Please wait for the message "Successful submission." to appear after the "Submit" button. If there is an error message or no message appears after 10 seconds, please save the text in the above text box to a file using the button, or copy and paste it into a file yourself, and submit it to Canvas Assignment M7. You can submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 You can load your answers from the text (or a txt file) in the text box below using the button. The first two lines should be "##m: 7" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.



# Solutions

📗 Some of the past exams referenced in the Hints can be found on Professor Zhu's and Professor Dyer's websites: Link and Link.
📗 Some of the questions are from last year, and I recorded videos going through them; the links are at the bottom of the Week 1 to Week 8 pages, for example: W4 and W8.
📗 Links to the solutions that students volunteered to share on Piazza will be collected in this post around the official deadline: Link.





Last Updated: April 29, 2024 at 1:11 AM