# M1B Midterm Part 1 Version B

📗 Enter your ID (the wisc email ID without @wisc.edu) and click the button (or hit the enter key).
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.
📗 In case the questions are not generated correctly, try (1) refreshing the page, (2) clearing the browser cache, (3) switching to incognito/private browsing mode, (4) switching to another browser, (5) using a different ID. If none of these work, please post a private message on Piazza with your ID.
📗 Join Zoom if you have questions: Zoom Link
📗 Please do not refresh the page (after you start): your answers will not be saved.

# ID: test


# Question 1

📗 [4 points] Consider a Linear Threshold Unit (LTU) perceptron with initial weights w = [0.2] and bias b = −0.3, trained using the Perceptron Algorithm. Given a new input x = [1] with label y = 0 and learning rate α = 0.5, compute the updated weights and bias, w′, b′ = [w′1, b′]:
📗 Answer (comma separated vector): .
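📗 A minimal Python sketch of one perceptron update, assuming the common convention that the LTU outputs 1 when w·x + b ≥ 0 and the update rule w ← w − α(a − y)x, b ← b − α(a − y) (the convention is an assumption, not stated in the question):

```python
# One LTU perceptron update (sketch; the >= 0 threshold convention is assumed).
w, b = [0.2], -0.3      # initial weight vector and bias
x, y = [1.0], 0         # new input and its label
alpha = 0.5             # learning rate

a = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0  # LTU output
w = [wi - alpha * (a - y) * xi for wi, xi in zip(w, x)]        # weight update
b = b - alpha * (a - y)                                        # bias update
print(w, b)  # the updated [w'1] and b'
```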

# Question 2

📗 [2 points] Consider a rectified linear unit (ReLU) with input x and a bias term. The output can be written as y = max(0, −3 x + 2). Here, the weight is −3 and the bias is 2. Write down the smallest input value x that produces the output y = 0.

📗 The red curve is a plot of the activation function; given the y-value of the green point, the question asks for its x-value.
📗 Answer: .
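📗 A short sketch of the reasoning: max(0, wx + b) is 0 exactly when wx + b ≤ 0, and with a negative weight the smallest such x solves wx + b = 0.

```python
# Smallest x with max(0, w*x + b) == 0 when w < 0: solve w*x + b = 0.
w, b = -3.0, 2.0
x_smallest = -b / w
print(x_smallest)  # = 2/3
```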

# Question 3

📗 [3 points] Statistically, cats are often hungry around 6:00 am (I am making this up). At that time, a cat is hungry 2/3 of the time (C = 1), and not hungry 1/3 of the time (C = 0). What is the entropy of the binary random variable C? Reminder: log base 2 of x can be computed as log(x) / log(2) or log2(x).
📗 Answer: .
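📗 A minimal sketch of the binary entropy computation:

```python
import math

# Binary entropy H(C) = -p log2(p) - (1 - p) log2(1 - p), with p = P(C = 1).
p = 2 / 3
entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(entropy)  # in bits
```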

# Question 4

📗 [4 points] List the English letters from A to Z: ABCDEFGHIJKLMNOPQRSTUVWXYZ. Define the distance between two letters in the natural way, that is, d(A, A) = 0, d(A, B) = 1, d(A, C) = 2, and so on. Each letter has a label: I, M, P, Q, T, U are labeled 0, and the others are labeled 1. This is your training data. Now classify each letter using kNN (k Nearest Neighbor) for odd k = 1, 3, 5, 7, .... What is the smallest k for which all letters are classified the same (i.e. either all labels are 0 or all labels are 1)? Break ties by preferring the earlier letters in the alphabet. Hint: the nearest neighbor of a letter is the letter itself.
📗 Answer: .
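📗 A brute-force sketch of the check, assuming distance ties are broken toward earlier letters by sorting on (distance, letter):

```python
# Classify every letter with kNN for odd k until all predictions agree.
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
labels = {c: 0 if c in "IMPQTU" else 1 for c in letters}

def knn_label(c, k):
    # Sort by (distance, letter) so equidistant earlier letters win ties.
    neighbors = sorted(letters, key=lambda o: (abs(ord(o) - ord(c)), o))[:k]
    votes = sum(labels[o] for o in neighbors)
    return 1 if 2 * votes > k else 0  # majority vote (k is odd)

for k in range(1, 26, 2):
    if len({knn_label(c, k) for c in letters}) == 1:
        print(k)  # smallest odd k where all letters get the same label
        break
```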

# Question 5

📗 [2 points] Given the training data "am am Groot am am", with the unigram model, what is the probability of observing the new sentence "Groot Groot Groot" given the first word is Groot? Use MLE (Maximum Likelihood Estimate) without smoothing and do not include the probability of observing the first word.
📗 Answer: .
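📗 A minimal sketch of the unigram MLE computation (the first word is given, so its probability is excluded):

```python
from collections import Counter

# Unigram MLE: P(word) = count(word) / total token count.
counts = Counter("am am Groot am am".split())
total = sum(counts.values())

sentence = "Groot Groot Groot".split()
prob = 1.0
for word in sentence[1:]:  # skip the given first word
    prob *= counts[word] / total
print(prob)
```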

# Question 6

📗 [3 points] There are two biased coins in my pocket: coin A has P{H|A} = 2/3, coin B has P{H|B} = 1/2. I took out a coin from the pocket at random, with probability 3/5 of it being A. I flipped it three times (independently) and the outcome is HTT. What is the probability that the coin was B?
📗 Answer: .
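📗 A minimal sketch of the Bayes rule computation, P{B | HTT} = P{HTT | B} P{B} / (P{HTT | A} P{A} + P{HTT | B} P{B}):

```python
# Posterior probability that the coin was B after observing HTT.
p_a, p_b = 3 / 5, 2 / 5          # priors
p_h_a, p_h_b = 2 / 3, 1 / 2      # P{H | A} and P{H | B}

def likelihood(p_h, outcome="HTT"):
    prob = 1.0
    for flip in outcome:
        prob *= p_h if flip == "H" else 1 - p_h
    return prob

numerator = likelihood(p_h_b) * p_b
denominator = likelihood(p_h_a) * p_a + numerator
print(numerator / denominator)
```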

# Question 7

📗 [2 points] In a corpus with 180 word tokens, the phrase "Fort Night" appeared 27 times (not Fortnite). In particular, "Fort" appeared 90 times and "Night" appeared 67 times. If we estimate probability by frequency (the maximum likelihood estimate) without smoothing, what is the estimated probability of P(Night | Fort)?
📗 Answer: .
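📗 A one-line sketch of the MLE: the conditional probability is the bigram count over the count of the conditioning word.

```python
# P(Night | Fort) = count("Fort Night") / count("Fort"), no smoothing.
print(27 / 90)
```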

# Question 8

📗 [4 points] Given the following training set, add one instance [x1, x2] with y = 1 so that all instances are support vectors for the Hard Margin SVM (Support Vector Machine) trained on the new training set.

| x1 | x2 | y |
| --- | --- | --- |
| −7 | 2 | 0 |
| −5 | 2 | 0 |
| −3 | 2 | 0 |
| 9 | −2 | 1 |
| 6 | −2 | 1 |
| 10 | −2 | 1 |


📗 Note: in the diagram, the two support vectors are currently connected by the grey line, and the black line represents the SVM classification boundary. After adding one point, you should be able to make all seven points support vectors, with the classification boundary given by the green line.
📗 Answer (comma separated vector): .
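📗 A hedged verification sketch, assuming scikit-learn is available; a hard-margin SVM is approximated by a linear SVC with a very large C, and the candidate point below is a placeholder, not the answer:

```python
import numpy as np
from sklearn.svm import SVC

# Original training set.
X = np.array([[-7, 2], [-5, 2], [-3, 2], [9, -2], [6, -2], [10, -2]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

candidate = [0.0, 0.0]  # placeholder [x1, x2]: substitute your answer here
X_new = np.vstack([X, [candidate]])
y_new = np.append(y, 1)

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e10).fit(X_new, y_new)
print(sorted(clf.support_))  # lists all 7 indices if every point is a support vector
```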

# Question 9

📗 [2 points] Given the following image gradient, suppose gradient vectors are put into one of four bins according to the gradient direction: bin 1: (0, π/2], bin 2: (π/2, π], bin 3: [−π/2, −π), bin 4: [0, −π/2). Which bin does the gradient of the center element (pixel) fall into?
∇x = [[2, −3, 6], [−3, 0, 3], [8, 0, 9]], ∇y = [[8, −10, −9], [9, 0, 0], [2, −3, −6]].
Enter the bin number (1, 2, 3, or 4), not the direction.
📗 Hint: you can use the function atan2(y, x).
📗 Answer: .
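📗 A minimal sketch of the binning, using the half-open intervals above (atan2 returns a direction in (−π, π]); the center entries below are read off the matrices above, so double-check them against your generated version:

```python
import math

gx = 0  # center entry of the x-gradient matrix above
gy = 0  # center entry of the y-gradient matrix above
theta = math.atan2(gy, gx)

if 0 < theta <= math.pi / 2:
    bin_number = 1          # (0, pi/2]
elif theta > math.pi / 2:
    bin_number = 2          # (pi/2, pi]
elif theta <= -math.pi / 2:
    bin_number = 3          # [-pi/2, -pi)
else:
    bin_number = 4          # [0, -pi/2)
print(bin_number)
```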

# Question 10

📗 [4 points] Given the following transition matrix for a bigram model with words "I" (label 0), "am" (label 1) and "Groot" (label 2): [[0.35, 0.22, 0.43], [0.25, 0.34, 0.41], [0.38, 0.23, 0.39]]. Row i, column j is P{wt = j | wt−1 = i}. Two uniform random numbers between 0 and 1 are generated to simulate the words after "I", say u1 = 0.87 and u2 = 0.04. Using the CDF (Cumulative Distribution Function) inversion method (inverse transform method), which two words are generated? Enter two integer labels (0, 1, or 2), not strings.
📗 Answer (comma separated vector): .
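📗 A minimal sketch of CDF inversion for the bigram chain, assuming the convention that u selects the smallest label whose cumulative probability reaches u:

```python
# Transition matrix: row i, column j is P{w_t = j | w_{t-1} = i}.
P = [
    [0.35, 0.22, 0.43],  # from "I" (label 0)
    [0.25, 0.34, 0.41],  # from "am" (label 1)
    [0.38, 0.23, 0.39],  # from "Groot" (label 2)
]

def invert_cdf(row, u):
    cdf = 0.0
    for label, p in enumerate(row):
        cdf += p
        if u <= cdf:
            return label
    return len(row) - 1

state = 0  # start from "I"
for u in (0.87, 0.04):
    state = invert_cdf(P[state], u)
    print(state)  # label of the generated word
```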

# Question 11

📗 [2 points] There are a total of 16 balls in a bag, each either red or green. How many red balls and how many green balls should there be so that the entropy of the color of a randomly selected ball is maximized?
📗 Answer (comma separated vector): .
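📗 A brute-force sketch: compute the binary entropy for every possible split of the 16 balls and take the maximizer.

```python
import math

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Try every count of red balls from 0 to 16.
best_red = max(range(17), key=lambda r: binary_entropy(r / 16))
print(best_red, 16 - best_red)  # red and green counts at maximum entropy
```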

# Question 12

📗 [4 points] Consider an unbiased estimator X for a parameter θ. We have E[X] = 2, Var[X] = 9, E[Y] = −3, Var[Y] = 5. We would like a modified estimator Z = X − Y to have a reduced variance compared to X. For what covariance Cov[X, Y] can we achieve Var[Z] ≤ Var[X]? Note: there are many possible answers; enter only one of them.
📗 Answer: .
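📗 A minimal sketch using the identity Var[X − Y] = Var[X] + Var[Y] − 2 Cov[X, Y]; the candidate covariance below is an arbitrary value to test, not an asserted answer:

```python
var_x, var_y = 9.0, 5.0
cov_xy = 4.0  # candidate: substitute your own answer here

var_z = var_x + var_y - 2 * cov_xy  # Var[X - Y]
print(var_z, var_z <= var_x)        # True if the candidate works
```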

# Question 13

📗 [3 points] Let a dataset consist of n = 6 points in ℝ; specifically, the first n − 1 points are [−3, 0, 2, 3, 6] and the last point xn is unknown. What is the smallest value of xn above which xn−1 is among xn's 3-nearest neighbors, but xn is NOT among xn−1's 3-nearest neighbors? Note that the 3-nearest neighbors of a point in the training set include the point itself.
📗 Answer: .
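📗 A brute-force sketch that scans candidate values of xn (step 0.01) and tests both 3-nearest-neighbor conditions; behavior exactly at the boundary depends on how distance ties are broken (here the stable sort prefers earlier training points):

```python
points = [-3, 0, 2, 3, 6]  # the first n - 1 points; points[-1] is x_{n-1}

def three_nn(target, data):
    # 3 nearest neighbors of target within data (includes target itself).
    return sorted(data, key=lambda p: abs(p - target))[:3]

for i in range(601, 3000):
    xn = i / 100
    data = points + [xn]
    in_xn_nn = points[-1] in three_nn(xn, data)         # x_{n-1} in x_n's 3-NN
    not_in_prev = xn not in three_nn(points[-1], data)  # x_n not in x_{n-1}'s 3-NN
    if in_xn_nn and not_in_prev:
        print(xn)  # smallest scanned value satisfying both conditions
        break
```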

# Question 14

📗 [3 points] Assume the prior probability of having a female child (girl) is the same as having a male child (boy) and both are 0.5. The Smith family has 4 kids. One day you saw one of the Smith children, and she is a girl. The Wood family has 4 kids, too, and you heard that at least one of them is a girl. What is the chance that the Smith family has a boy? What is the chance that the Wood family has a boy?
📗 Answer (comma separated vector): .
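📗 A minimal enumeration sketch: for the Smiths, the observed child's sex only constrains that one child, so the other three remain random; for the Woods, condition on the event that at least one of the four is a girl.

```python
from itertools import product

# Smith: one specific child is a girl; the other 3 kids are unconstrained.
others = list(product("GB", repeat=3))
p_smith_boy = sum("B" in f for f in others) / len(others)

# Wood: condition on "at least one girl" among all 4 kids.
families = [f for f in product("GB", repeat=4) if "G" in f]
p_wood_boy = sum("B" in f for f in families) / len(families)

print(p_smith_boy, p_wood_boy)
```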

# Question 15

📗 [1 point] Please enter any comments, including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.
📗 Answer: .


# Grade



# Submission


📗 Please do not modify the content in the above text field: use the "Grade" button to update.


📗 Please wait for the message "Successful submission." to appear after clicking the "Submit" button. Please also save the text in the above text box to a file using the button, or copy and paste it into a file yourself, and submit it to Canvas Assignment M1B. You could submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 You could load your answers from the text (or txt file) in the text box below using the button. The first two lines should be "##x: 1B" and "##id: your id", and each remaining line should be of the form "##1: your answer to question 1", "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.







Last Updated: April 09, 2025 at 11:28 PM