
# M5 Written (Math) Problems

📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit the enter key).
📗 The official deadline is July 4, but you can submit or resubmit without penalty until July 18.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You can print the page: , solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 Please report any bugs on Piazza.

# Warning: please enter your ID before you start!



# Question 1

📗 [3 points] What is the city-block distance (also known as the L1 distance or Manhattan distance) between two points and ?

📗 Note: the Manhattan distance is the sum of the lengths of the red lines in the figure, not the length of the blue line (that is the L2 or Euclidean distance).
Hint See Fall 2014 Midterm Q6. The Manhattan distance is the sum \(\displaystyle\sum_{j=1}^{m} \left| x_{1 j} - x_{2 j} \right|^{l}\) with \(l = 1\).
📗 Answer: .
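📗 Note: a minimal Python sketch for checking the arithmetic; the two points below are made-up placeholders, since the actual values are generated from your ID.

```python
# Minimal sketch: Manhattan (L1) distance between two points.
# The points are placeholders; substitute the values from your question.
def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

print(manhattan([1, 2], [4, 6]))  # |1 - 4| + |2 - 6| = 7
```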
# Question 2

📗 [3 points] Consider binary classification in 2D where the intended label of a point \(x = \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}\) is positive (1) if \(x_{1} > x_{2}\) and negative (0) otherwise. Let the training set be all points of the form \(x\) = where \(a, b\) are integers. Each training item has the correct label that follows the rule above. With a 1NN (Nearest Neighbor) classifier (Euclidean distance), which of the following points are labeled positive? The drawing is not graded.


Hint See Fall 2013 Final Q5, Fall 2011 Midterm Q3. If multiple instances have the same distance to the new point, use the instances with larger x values. This question should not be solved by finding the nearest training instance for each choice: you should draw the decision boundary and check which side the points are on.
📗 Choices:




None of the above
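📗 Note: a minimal Python sketch of 1NN with Euclidean distance, for checking intuition only; the finite training grid and the query point are made-up placeholders (the actual training set is all integer points, and the choices are generated from your ID).

```python
# Minimal sketch: 1NN with Euclidean distance on a small integer grid.
# Placeholder grid: integer points labeled 1 if x1 > x2, else 0.
import math

train = [((a, b), 1 if a > b else 0) for a in range(-2, 3) for b in range(-2, 3)]

def predict_1nn(query):
    # Break distance ties toward larger coordinate values, as the hint suggests.
    nearest = min(train, key=lambda t: (math.dist(t[0], query), -t[0][0], -t[0][1]))
    return nearest[1]

print(predict_1nn((1.9, 0.1)))  # nearest grid point is (2, 0), which is labeled 1
```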
# Question 3

📗 [3 points] Consider points in 2D and binary labels. Given the training data in the table, and using Manhattan distance with 1NN (Nearest Neighbor), which of the following points in 2D are classified as 1? Answer the question by first drawing the decision boundaries. The drawing is not graded.

| index | \(x_{1}\) | \(x_{2}\) | label |
|-------|-----------|-----------|-------|
| 1     | -1        | -1        |       |
| 2     | -1        | 1         |       |
| 3     | 1         | -1        |       |
| 4     | 1         | 1         |       |


Hint See Spring 2018 Question 7, Fall 2014 Midterm Q2, Fall 2012 Final Q4. As discussed in the lectures, if multiple instances have the same distance to the new point, use the ones with smaller indices. This question should not be solved by finding the nearest training instance for each choice: you should draw the decision boundary and check which side the points are on.
📗 Choices:




None of the above
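📗 Note: a minimal Python sketch of 1NN with Manhattan distance and the smaller-index tie-breaking rule; the labels below are placeholders, since the label column is generated from your ID.

```python
# Minimal sketch: 1NN with Manhattan distance, ties broken by smaller index.
# The labels are placeholders; fill in the label column from your table.
train = [
    (1, (-1, -1), 0),
    (2, (-1,  1), 0),
    (3, ( 1, -1), 1),
    (4, ( 1,  1), 1),
]

def predict_1nn_manhattan(query):
    def key(item):
        index, (x1, x2), _ = item
        return (abs(x1 - query[0]) + abs(x2 - query[1]), index)
    return min(train, key=key)[2]

print(predict_1nn_manhattan((0.5, 0.0)))  # points 3 and 4 tie at distance 1.5; index 3 wins
```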
# Question 4

📗 [3 points] Consider a training set with 8 items. The values of the first dimension of their feature vectors are: . However, this dimension is continuous (i.e. it is a real number). To build a decision tree, one may ask questions of the form "Is \(x_{1} \geq \theta\)?" where \(\theta\) is a threshold value. Ideally, what is the maximum number of different \(\theta\) values we should consider for the first dimension \(x_{1}\)? Count the values of \(\theta\) such that all instances belong to one class.

Hint See Fall 2016 Final Q11. At most one threshold between two consecutive distinct values is needed. For example, if the possible values are \([-1, 0, 1]\), at most one threshold less than \(-1\), one threshold between \(-1\) and \(0\), one threshold between \(0\) and \(1\), and one threshold larger than \(1\) are needed (four in total).
📗 Answer: .
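📗 Note: a minimal Python sketch of the threshold count described in the hint; the eight feature values are made-up placeholders.

```python
# Minimal sketch: count candidate thresholds for one continuous feature.
# The feature values are placeholders; use the 8 values from your question.
values = [2.1, 3.0, 3.0, 4.5, 4.5, 5.2, 6.0, 7.3]
distinct = sorted(set(values))
# One threshold below the minimum, one between each pair of consecutive
# distinct values, and one above the maximum.
print(len(distinct) + 1)  # 6 distinct values -> 7 candidate thresholds
```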
# Question 5

📗 [3 points] A decision tree has depth \(d\) = (a decision tree where the root is a leaf node has \(d\) = 0). All of its internal nodes have \(b\) = children. The tree is also complete, meaning all leaf nodes are at depth \(d\). If we require each leaf node to contain at least training examples, what is the minimum size of the training set?
Hint See Fall 2014 Midterm Q9, Fall 2012 Final Q6. The total number of leaf nodes in a complete tree is \(b^{d}\), and if at least \(n\) training examples are needed in each one of them, since the same training example cannot appear in multiple subtrees, there must be at least \(n b^{d}\) training examples in total.
📗 Answer: .
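📗 Note: a one-line check of the \(n b^{d}\) formula from the hint; the values of \(b\), \(d\), and \(n\) below are placeholders.

```python
# Minimal sketch: minimum training-set size for a complete tree of depth d
# with branching factor b and at least n examples per leaf. Placeholder values.
b, d, n = 3, 2, 5
print(n * b ** d)  # b**d leaves, each needing n examples: 5 * 9 = 45
```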
# Question 6

📗 [3 points] A bag contains \(n\) = different colored balls. Randomly draw a ball from the bag with equal probability. What is the entropy of the outcome? Reminder that the log base 2 of x can be found by log(x) / log(2) or log2(x).
Hint See Fall 2014 Midterm Q10. The entropy formula is \(H = -\displaystyle\sum_{i=1}^{n} p_{i} \log_{2}\left(p_{i}\right)\). Here, since the probability of drawing each of the \(n\) balls is the same, \(p_{i} = \dfrac{1}{n}\) for each \(i\).
📗 Answer: .
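📗 Note: a minimal Python sketch of the entropy computation; the value of \(n\) below is a placeholder.

```python
# Minimal sketch: entropy of drawing one of n equally likely balls.
# n is a placeholder; substitute the value from your question.
import math

n = 8
print(-sum((1 / n) * math.log2(1 / n) for _ in range(n)))  # prints 3.0 = log2(8)
```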
# Question 7

📗 [3 points] Statistically, December 18 is the cloudiest day of the year in Madison, Wisconsin. Your professor (not me, this is Professor Jerry Zhu's question) is not making this up. On that day, the sky is overcast, mostly cloudy, or partly cloudy of the time (C = 0), and clear or mostly clear of the time (C = 1). What is the entropy of the binary random variable C? Reminder that the log base 2 of x can be found by log(x) / log(2).
Hint See Fall 2014 Midterm Q10, Fall 2006 Final Q11, Fall 2005 Final Q11. The entropy formula is \(H = -p_{1} \log_{2}\left(p_{1}\right) - p_{2} \log_{2}\left(p_{2}\right)\).
📗 Answer: .
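📗 Note: a minimal Python sketch of the binary entropy formula; the probability below is a placeholder for the fraction given in your question.

```python
# Minimal sketch: binary entropy H = -p log2(p) - (1 - p) log2(1 - p).
import math

p = 0.7  # placeholder for P(C = 0)
print(-p * math.log2(p) - (1 - p) * math.log2(1 - p))  # about 0.881
```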
# Question 8

📗 [3 points] The RDA Corporation has a prison with many cells. Without justification, you are about to be randomly thrown into a cell with equal probability. Cells to have Toruks that eat prisoners. Cells to are safe. With a sufficient bribe, the warden will answer your question "Will I be in cell 1?" What is the mutual information (we call it information gain) between the warden's answer and your encounter with the Toruks? (I didn't write the stories in these questions, so I don't know the reference either.)
Hint See Fall 2012 Final Q5, Fall 2011 Midterm Q5. Compute the information gain from the entropy of the Toruk variable (call it \(Y\), where \(Y = 1\) is the event that there is a Toruk in your cell) and the conditional entropy of \(Y\) given whether you are in cell 1 (call it \(Y | X\), where \(X = 1\) is the event that you are in cell 1). The information gain is \(I = H\left(Y\right) - H\left(Y | X\right)\), where \(H\left(Y\right) = -\mathbb{P}\left\{Y = 0\right\} \log_{2}\left(\mathbb{P}\left\{Y = 0\right\}\right) - \mathbb{P}\left\{Y = 1\right\} \log_{2}\left(\mathbb{P}\left\{Y = 1\right\}\right)\) and \(H\left(Y | X\right) = \mathbb{P}\left\{X = 0\right\} H\left(Y | X = 0\right) + \mathbb{P}\left\{X = 1\right\} H\left(Y | X = 1\right)\), where \(H\left(Y | X = 0\right) = -\mathbb{P}\left\{Y = 0 | X = 0\right\} \log_{2}\left(\mathbb{P}\left\{Y = 0 | X = 0\right\}\right) - \mathbb{P}\left\{Y = 1 | X = 0\right\} \log_{2}\left(\mathbb{P}\left\{Y = 1 | X = 0\right\}\right)\) and \(H\left(Y | X = 1\right) = -\mathbb{P}\left\{Y = 0 | X = 1\right\} \log_{2}\left(\mathbb{P}\left\{Y = 0 | X = 1\right\}\right) - \mathbb{P}\left\{Y = 1 | X = 1\right\} \log_{2}\left(\mathbb{P}\left\{Y = 1 | X = 1\right\}\right)\). Here, \(\mathbb{P}\left\{Y = 1\right\}\) is the probability that there is a Toruk in your cell (i.e. the number of Toruk cells divided by the number of cells), \(\mathbb{P}\left\{X = 1\right\}\) is the probability that you are in cell 1 (i.e. 1 divided by the number of cells), \(\mathbb{P}\left\{Y = 1 | X = 1\right\}\) is the probability that there is a Toruk given you are in cell 1 (which is always 1), and \(\mathbb{P}\left\{Y = 1 | X = 0\right\}\) is the probability that there is a Toruk given you are not in cell 1 (i.e. the number of Toruk cells other than cell 1 divided by the number of cells other than cell 1).
📗 Answer: .
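📗 Note: a minimal Python sketch of the information-gain computation in the hint; the cell counts below are made-up placeholders.

```python
# Minimal sketch: information gain I(Y; X) = H(Y) - H(Y | X).
# The cell counts are placeholders; use the numbers from your question.
import math

def entropy(p):
    # Binary entropy, with the convention 0 * log2(0) = 0.
    return sum(-q * math.log2(q) for q in (p, 1 - p) if 0 < q < 1)

cells, toruk_cells = 8, 3              # placeholder: cells 1 to 3 have Toruks
p_y = toruk_cells / cells              # P(Y = 1)
p_x = 1 / cells                        # P(X = 1): you are in cell 1
p_y_given_x1 = 1.0                     # cell 1 always has a Toruk
p_y_given_x0 = (toruk_cells - 1) / (cells - 1)
gain = entropy(p_y) - (p_x * entropy(p_y_given_x1) + (1 - p_x) * entropy(p_y_given_x0))
print(gain)
```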
# Question 9

📗 [4 points] You are given a training set of five points and their 2-class classifications (+ or -): (, +), (, +), (, -), (, -), (, -). What is the decision boundary associated with this training set using 3NN (3 Nearest Neighbor)?
Hint See Spring 2017 Midterm Q6. The decision boundary is the threshold such that all points to its left are classified as positive, and all points to its right are classified as negative. The threshold should be equidistant from the first and fourth points (i.e. the midpoint between the first and fourth points).
📗 Answer: .
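📗 Note: a minimal Python sketch of 3NN in 1D, useful for checking which side of the boundary a point falls on; the five training points are made-up placeholders.

```python
# Minimal sketch: 3NN classification in 1D with majority vote.
# The training points are placeholders; use the five values from your question.
train = [(1.0, "+"), (2.0, "+"), (4.0, "-"), (6.0, "-"), (7.0, "-")]

def predict_3nn(query):
    nearest = sorted(train, key=lambda t: abs(t[0] - query))[:3]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Per the hint, the boundary is the midpoint of the first and fourth points:
# (1.0 + 6.0) / 2 = 3.5.
print(predict_3nn(3.4), predict_3nn(3.6))  # prints: + -
```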
# Question 10

📗 [1 point] Please enter any comments and suggestions, including possible mistakes and bugs with the questions and the auto-grading, and materials relevant to solving the questions that you think are not covered well during the lectures. If you have no comments, please enter "None": do not leave it blank.
📗 Answer: .

# Grade



# Submission


📗 Please do not modify the content in the above text field: use the "Grade" button to update.


📗 Please wait for the message "Successful submission." to appear after clicking the "Submit" button. If there is an error message or no message appears after 10 seconds, please save the text in the text box above to a file using the button , or copy and paste it into a file yourself, and submit the file to Canvas Assignment M5. You can submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 You can load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##m: 5" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.



# Solutions

📗 Some of the past exams referenced in the Hints can be found on Professor Zhu's and Professor Dyer's websites: Link and Link.
📗 Some of the questions are from last year, and I recorded videos going through them; the links are at the bottom of the Week 1 to Week 8 pages, for example: W4 and W8.
📗 Links to the solutions that students volunteered to share on Piazza will be collected in this post around the official deadline: Link.





Last Updated: April 29, 2024 at 1:11 AM