📗 [2 points] The Perceptron algorithm does not terminate (cannot converge) for any learning rate on the following training set. Give an example of \(x_{1}\). If there are multiple possible answers, enter only one, and if no such \(x_{1}\) exists, enter \(-1\).
\(i\) \(x_{i}\) \(y_{i}\)
1 \(x_{1}\)

📗 Answer: .
📗 [4 points] Suppose the partial derivative of a cost function \(C\) with respect to some weight \(w\) is given by \(\dfrac{\partial C}{\partial w} = \displaystyle\sum_{i=1}^{n} \dfrac{\partial C_{i}}{\partial w} = \displaystyle\sum_{i=1}^{n} w x_{i}\). Given a data set \(x\) = {} and initial weight \(w\) = , compute and compare the updated weight after 1 step of batch gradient descent and steps of stochastic gradient descent (start with the same initial weight, then use data point 1 in step 1, data point 2 in step 2, ...). Enter two numbers, comma separated, batch gradient descent first. Use the learning rate \(\alpha\) = .
📗 Answer (comma separated vector): .
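📗 Note: the difference between the two update rules can be sketched as below. This is only an illustration: the data set `[1.0, 2.0]`, the initial weight `1.0`, and the learning rate `0.5` are hypothetical values (they are not the ones in this question), and the sketch assumes stochastic gradient descent makes one step per data point, in order.

```python
def batch_step(w, xs, alpha):
    # One batch step uses the full gradient: sum_i w * x_i.
    return w - alpha * sum(w * x for x in xs)


def sgd_steps(w, xs, alpha):
    # One stochastic step per data point, in order.
    # Each step uses only the single-item gradient w * x_i,
    # evaluated at the weight produced by the previous step.
    for x in xs:
        w = w - alpha * w * x
    return w


# Hypothetical values, chosen only to make the arithmetic easy to follow.
xs, w0, alpha = [1.0, 2.0], 1.0, 0.5
print(batch_step(w0, xs, alpha))  # -0.5
print(sgd_steps(w0, xs, alpha))   # 0.0
```

The two results differ because each stochastic step sees the weight already changed by the earlier steps, while the batch step evaluates every term at the initial weight.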
📗 [3 points] Suppose there are three classifiers \(f_{1}, f_{2}, f_{3}\) to choose from (i.e. the hypothesis space has three elements), and the activation values from these classifiers based on a training set of three items are listed below. Which classifier is the best one if loss is used for comparison? (Enter a number 1 or 2 or 3).
📗 Reminder: zero-one loss means \(\displaystyle\sum_{i=1}^{n} 1_{\left\{a_{i} \neq y_{i}\right\}}\), square loss means \(\displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)^{2}\), cross entropy loss means \(-\displaystyle\sum_{i=1}^{n} \left[y_{i} \log\left(a_{i}\right) + \left(1 - y_{i}\right) \log\left(1 - a_{i}\right)\right]\).
Items 1 2 3

📗 Answer: .
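📗 Note: a minimal sketch of comparing classifiers by total loss is given below. The labels and activation values are hypothetical (the table for this question is not reproduced here), and cross entropy is taken to be the usual negative log-likelihood, which requires activations strictly between 0 and 1.

```python
import math


def zero_one_loss(a, y):
    # Counts the items whose activation differs from the label.
    return sum(1 for ai, yi in zip(a, y) if ai != yi)


def square_loss(a, y):
    return sum((ai - yi) ** 2 for ai, yi in zip(a, y))


def cross_entropy_loss(a, y):
    # Negative log-likelihood; needs 0 < a_i < 1 for every item.
    return -sum(yi * math.log(ai) + (1 - yi) * math.log(1 - ai)
                for ai, yi in zip(a, y))


def best_classifier(activations, y, loss):
    # Returns the 1-based index of the classifier with the smallest loss.
    losses = [loss(a, y) for a in activations]
    return losses.index(min(losses)) + 1


# Hypothetical labels and activations for three classifiers.
y = [1, 0, 1]
activations = [
    [0.5, 0.5, 0.5],
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 1.0],
]
print(best_classifier(activations, y, square_loss))  # 2
```

Different losses can rank the same classifiers differently, so the loss named in the question determines the answer.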
📗 [4 points] Given the vocabulary \(a, b\) and the following probabilities, compute the bigram (Markov) transition matrix, row (column) 1 corresponding to \(a\) and row (column) 2 corresponding to \(b\). Note: in the table \(\mathbb{P}\left\{a b\right\}\) means the probability of \(b\) after \(a\).
\(\mathbb{P}\left\{a b\right\}\) \(\mathbb{P}\left\{a a\right\}\) \(\mathbb{P}\left\{b a\right\}\) \(\mathbb{P}\left\{b b\right\}\)

📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
📗 [4 points] Consider a Linear Threshold Unit (LTU) perceptron with initial weights \(w\) = and bias \(b\) = trained using the Perceptron Algorithm. Given a new input \(x\) = and \(y\) = . Let the learning rate be \(\alpha\) = , compute the updated weights, \(w', b'\) = :
📗 Answer (comma separated vector): .
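📗 Note: one Perceptron Algorithm step can be sketched as below. The sketch assumes labels in \(\left\{0, 1\right\}\) and the activation convention \(a = 1\) if \(w^\top x + b \geq 0\), otherwise \(a = 0\); if the course uses a different convention, adjust the threshold line. All numbers in the example call are hypothetical.

```python
def perceptron_update(w, b, x, y, alpha):
    """One Perceptron Algorithm step for an LTU with labels in {0, 1}."""
    # Assumed activation convention: a = 1 if w.x + b >= 0, else 0.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    a = 1 if z >= 0 else 0
    # When a == y the factor (a - y) is zero and nothing changes.
    new_w = [wi - alpha * (a - y) * xi for wi, xi in zip(w, x)]
    new_b = b - alpha * (a - y)
    return new_w, new_b


# Hypothetical values: w.x + b = 1 - 2 + 0 = -1, so a = 0 while y = 1.
print(perceptron_update([1, -1], 0, [1, 2], 1, 1))  # ([2, 1], 1)
```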
📗 [2 points] Consider a single sigmoid perceptron with bias weight \(w_{0}\) = , a single input \(x_{1}\) with weight \(w_{1}\) = , and the sigmoid activation function \(g\left(z\right) = \dfrac{1}{1 + \exp\left(-z\right)}\). For what input \(x_{1}\) does the perceptron output value \(a\) = .
📗 Note: Math.js does not accept "ln(...)", please use "log(...)" instead.
📗 Answer: .
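📗 Note: inverting the sigmoid gives \(w_{0} + w_{1} x_{1} = \log\left(\dfrac{a}{1 - a}\right)\), so \(x_{1} = \dfrac{\log\left(a / \left(1 - a\right)\right) - w_{0}}{w_{1}}\) whenever \(0 < a < 1\) and \(w_{1} \neq 0\). A small sketch with hypothetical weights and target output:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve_input(w0, w1, a):
    # Solves sigmoid(w0 + w1 * x1) = a for x1.
    # Requires 0 < a < 1 and w1 != 0.
    logit = math.log(a / (1.0 - a))
    return (logit - w0) / w1


# Hypothetical values: an output of 0.5 needs w0 + w1 * x1 = 0.
x1 = solve_input(1.0, 2.0, 0.5)
print(x1)                       # -0.5
print(sigmoid(1.0 + 2.0 * x1))  # 0.5
```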
📗 [3 points] Suppose you are given a neural network with hidden layers, input units, output units, and hidden units. In one backpropagation step when computing the gradient of the cost (for example, squared loss) with respect to \(w^{\left(1\right)}_{11}\), the weight in layer \(1\) connecting input \(1\) and hidden unit \(1\), how many weights (including \(w^{\left(1\right)}_{11}\) itself, and including biases) are used in the backpropagation step of \(\dfrac{\partial C}{\partial w^{\left(1\right)}_{11}}\)?
📗 Note: the backpropagation step assumes the activations in all layers are already known, so do not count the weights and biases used in the forward step that computes the activations.
📗 Answer: .
📗 [4 points] If \(K\left(x, x'\right)\) is a kernel with induced feature representation \(\varphi\left(x_{0}\right)\) = , and \(G\left(x, x'\right)\) is another kernel with induced feature representation \(\theta\left(x_{0}\right)\) = , then it is known that \(H\left(x, x'\right) = a K\left(x, x'\right) + b G\left(x, x'\right)\), \(a\) = , \(b\) = is also a kernel. What is the induced feature representation of \(H\) for this \(x_{0}\)?
📗 Answer (comma separated vector): .
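📗 Note: for \(a, b \geq 0\), one valid feature representation of \(H\) is the concatenation of \(\sqrt{a} \varphi\left(x\right)\) and \(\sqrt{b} \theta\left(x\right)\), since the dot product of two such vectors equals \(a K + b G\). The sketch below uses hypothetical feature vectors and coefficients; the representation is not unique (for example, the two blocks could be swapped).

```python
import math


def combined_feature(phi, theta, a, b):
    # Feature map for H = a*K + b*G; valid only for a, b >= 0.
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    return [math.sqrt(a) * v for v in phi] + [math.sqrt(b) * v for v in theta]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical values.
phi, theta, a, b = [1.0, 2.0], [3.0], 4.0, 9.0
h = combined_feature(phi, theta, a, b)
print(h)  # [2.0, 4.0, 9.0]
# Check: H(x0, x0) should equal a*K(x0, x0) + b*G(x0, x0).
print(dot(h, h), a * dot(phi, phi) + b * dot(theta, theta))  # 101.0 101.0
```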
📗 [4 points] Consider a linear model \(a_{i} = w^\top x_{i} + b\), with the hinge cost function . The initial weight is \(\begin{bmatrix} w \\ b \end{bmatrix}\) = . What is the updated weight and bias after one stochastic (sub)gradient descent step if the chosen training data is \(x\) = , \(y\) = ? The learning rate is .
📗 Answer (comma separated vector): .
📗 [4 points] Consider a classification problem with \(n\) = classes \(y \in \left\{1, 2, ..., n\right\}\), and two binary features \(x_{1}, x_{2} \in \left\{0, 1\right\}\). Suppose \(\mathbb{P}\left\{Y = y\right\}\) = , \(\mathbb{P}\left\{X_{1} = 1 | Y = y\right\}\) = , \(\mathbb{P}\left\{X_{2} = 1 | Y = y\right\}\) = . Which class will the naive Bayes classifier predict for a test item with \(X_{1}\) = and \(X_{2}\) = ?
📗 Answer: .
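📗 Note: naive Bayes picks the class \(y\) that maximizes \(\mathbb{P}\left\{Y = y\right\} \mathbb{P}\left\{X_{1} = x_{1} | Y = y\right\} \mathbb{P}\left\{X_{2} = x_{2} | Y = y\right\}\). A minimal sketch follows, with hypothetical probabilities for two classes given as plain lists (entry \(k\) belongs to class \(k + 1\)); ties are broken in favor of the smaller class index, which is an assumption of this sketch.

```python
def naive_bayes_predict(prior, p_x1, p_x2, x1, x2):
    # p_x1[k] and p_x2[k] are P(X1 = 1 | Y = k+1) and P(X2 = 1 | Y = k+1).
    def bernoulli(p, x):
        return p if x == 1 else 1 - p

    scores = [
        prior[k] * bernoulli(p_x1[k], x1) * bernoulli(p_x2[k], x2)
        for k in range(len(prior))
    ]
    # Classes are numbered from 1; the first maximum wins on ties.
    return scores.index(max(scores)) + 1


# Hypothetical probabilities for a two-class problem.
prior = [0.5, 0.5]
p_x1 = [0.75, 0.25]
p_x2 = [0.5, 0.5]
print(naive_bayes_predict(prior, p_x1, p_x2, 1, 0))  # 1
print(naive_bayes_predict(prior, p_x1, p_x2, 0, 0))  # 2
```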
📗 [4 points] You are given a training set of six points and their 2-class classifications (+ or -): (, +), (, +), (, +), (, -), (, -), (, -). What is the decision boundary associated with this training set using 3NN (3 Nearest Neighbor)? Note: there is one more point compared to the question from the homework.
📗 Answer: .
📗 [2 points] There is a total of red or green balls in a bag. How many red balls and how many green balls are there so that the entropy of the color of a randomly selected ball is imized?
📗 Answer (comma separated vector): .
📗 [4 points] Given the following transition matrix for a bigram model with words "" and "": . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given the first word is ""?
📗 Answer: .
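📗 Note: the third word depends on the first only through the second, so \(\mathbb{P}\left\{w_{3} = j | w_{1} = i\right\} = \displaystyle\sum_{k} \mathbb{P}\left\{w_{2} = k | w_{1} = i\right\} \mathbb{P}\left\{w_{3} = j | w_{2} = k\right\}\), which is entry \(\left(i, j\right)\) of the squared transition matrix. The sketch below uses a hypothetical matrix and 0-based indices.

```python
def two_step_probability(P, first, third):
    # Sums over every possible second word k (indices are 0-based).
    return sum(P[first][k] * P[k][third] for k in range(len(P)))


# Hypothetical transition matrix; each row sums to 1.
P = [
    [0.5, 0.5],
    [0.25, 0.75],
]
print(two_step_probability(P, 0, 1))  # 0.625
```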
📗 [4 points] John tells his professor that he forgot to submit his homework assignment. From experience, the professor knows that students who finish their homework on time forget to turn it in with probability . She also knows that of the students who have not finished their homework will tell her they forgot to turn it in. She thinks that of the students in this class completed their homework on time. What is the probability that John is telling the truth (i.e. he finished it given that he forgot to submit it)?
📗 Answer: .
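📗 Note: this is an application of Bayes' rule with two ways a student can end up saying "I forgot": having finished and truly forgotten, or not having finished and claiming to have forgotten. A sketch with hypothetical probabilities (the parameter names are illustrative and not from the question):

```python
def prob_finished_given_forgot(p_forgot_if_finished,
                               p_claim_if_unfinished,
                               p_finished):
    # Numerator: finished the homework and forgot to submit it.
    truthful = p_forgot_if_finished * p_finished
    # Other route: did not finish, but says it was forgotten.
    untruthful = p_claim_if_unfinished * (1 - p_finished)
    return truthful / (truthful + untruthful)


# Hypothetical values: 0.125 / (0.125 + 0.25) = 1/3.
print(prob_finished_given_forgot(0.25, 0.5, 0.5))
```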
📗 [1 point] Please enter any comments including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.
📗 Answer: .

