
# X1 Past Exam Problems

📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit the enter key).
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID always generates the same set of questions. Your answers are not saved when you close the browser. You could print the page: , solve the problems on paper, then enter all your answers at the end. 
📗 Please do not refresh the page: your answers will not be saved. 

# Warning: please enter your ID before you start!


# Question Templates

📗 The 50 questions are generated from the following templates: the blanks are filled in with values derived from your ID.

📗 [4 points] Consider a Linear Threshold Unit (LTU) perceptron with initial weights \(w\) = and bias \(b\) = , trained using the Perceptron Algorithm. Given a new input \(x\) = with label \(y\) = and learning rate \(\alpha\) = , compute the updated weights and bias \(w', b'\) = :
📗 Answer (comma separated vector): .
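📗 Note: the numbers here are ID-generated, so below is a minimal sketch of one Perceptron Algorithm update with hypothetical values, assuming the LTU prediction \(a = 1_{\left\{w^\top x + b \geq 0\right\}}\) and the update rule \(w' = w - \alpha\left(a - y\right)x\), \(b' = b - \alpha\left(a - y\right)\):

```python
# One LTU perceptron update step (all values below are hypothetical).
import numpy as np

def perceptron_update(w, b, x, y, alpha):
    a = 1.0 if np.dot(w, x) + b >= 0 else 0.0  # LTU prediction
    w_new = w - alpha * (a - y) * x            # weight update
    b_new = b - alpha * (a - y)                # bias update (its input is fixed at 1)
    return w_new, b_new

w, b = np.array([1.0, -1.0]), 0.5              # hypothetical initial weights and bias
x, y, alpha = np.array([1.0, 0.0]), 0, 0.1     # hypothetical new item and learning rate
print(perceptron_update(w, b, x, y, alpha))    # (array([ 0.9, -1. ]), 0.4)
```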
📗 [3 points] Let \(g\left(z\right) = \dfrac{1}{1 + \exp\left(-z\right)}, z = w^\top x = w_{1} x_{1} + w_{2} x_{2} + ... + w_{d} x_{d}\), \(d\) = be a sigmoid perceptron with inputs \(x_{1} = ... = x_{d}\) = and weights \(w_{1} = ... = w_{d}\) = . There is no bias term. If the desired output is \(y\) = , and the sigmoid perceptron update rule has a learning rate of \(\alpha\) = , what will happen after one step of update? Each \(w_{i}\) will change by (enter a number, positive for increase and negative for decrease).
📗 Answer: .
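📗 Note: as a sketch, the squared-loss gradient through the sigmoid gives the change \(\Delta w_{i} = -\alpha\left(a - y\right) a \left(1 - a\right) x_{i}\); the values below are hypothetical:

```python
# Change in each weight of a sigmoid perceptron after one update (hypothetical values).
import math

def weight_change(d, x, w, y, alpha):
    z = d * w * x                                # all d inputs and weights are equal
    a = 1.0 / (1.0 + math.exp(-z))               # sigmoid activation
    return -alpha * (a - y) * a * (1.0 - a) * x  # change in each w_i

print(weight_change(d=3, x=1.0, w=0.5, y=1, alpha=0.1))  # ~0.0027, an increase toward y = 1
```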
📗 [6 points] With a linear threshold unit perceptron, implement the following function. That is, you should write down the weights \(w_{0}, w_{A}, w_{B}\). Enter the bias first, then the weights on A and B.
| A | B | function |
|---|---|----------|
| 0 | 0 | |
| 0 | 1 | |
| 1 | 0 | |
| 1 | 1 | |


📗 Answer (comma separated vector): .
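📗 Note: the target function is ID-generated; as an illustration, this sketch checks whether a candidate \(\left(w_{0}, w_{A}, w_{B}\right)\) implements a given truth table (the AND table and weights below are hypothetical):

```python
# An LTU outputs 1 exactly when w0 + wA*A + wB*B >= 0;
# check the candidate weights against every row of the truth table.
def implements(w0, wA, wB, table):
    return all((1 if w0 + wA * A + wB * B >= 0 else 0) == y
               for (A, B), y in table.items())

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}  # hypothetical target function
print(implements(-1.5, 1, 1, AND))  # True: the unit fires only when A = B = 1
```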
📗 [3 points] What is the minimum number of training items that must be removed so that a Perceptron can learn the remaining training set (with 100 percent accuracy)?
| \(x_{1}\) | \(x_{2}\) | \(x_{3}\) | \(y\) |
|---|---|---|---|
| 0 | 0 | 0 | |
| 0 | 0 | 1 | |
| 0 | 1 | 0 | |
| 0 | 1 | 1 | |
| 1 | 0 | 0 | |
| 1 | 0 | 1 | |
| 1 | 1 | 0 | |
| 1 | 1 | 1 | |


📗 Answer: .
📗 [3 points] In one iteration of the Perceptron Algorithm, \(x\) = , \(y\) = , and the predicted label is \(\hat{y} = a\) = . The learning rate is \(\alpha = 1\). After the iteration, how many of the weights (including the bias \(b\)) are increased (i.e., change by strictly more than 0)? If it is impossible to tell from the given information, enter -1.
📗 Answer: .
📗 [3 points] In one iteration of the Perceptron Algorithm, the initial weights are \(w\) = and \(b\) = , with \(x\) = , \(y \in \left\{0, 1\right\}\), and learning rate \(\alpha = 1\). After the iteration, the weights remain unchanged. What is the correct label \(y\)? The LTU perceptron classifier is \(1_{\left\{w x + b \geq 0\right\}}\).
📗 Answer: .
📗 [2 points] Consider a rectified linear unit (ReLU) with input \(x\) and a bias term. The output can be written as \(y\) = . Here, the weight is and the bias is . Write down the input value \(x\) that produces a specific output \(y\) = .

📗 The red curve is a plot of the activation function; given the y-value of the green point, the question asks for its x-value.
📗 Answer: .
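📗 Note: with weight \(w\) and bias \(b\), the ReLU output is \(y = \max\left(0, w x + b\right)\); for \(y > 0\) the input is \(x = \dfrac{y - b}{w}\). A sketch with hypothetical values:

```python
# Solve max(0, w*x + b) = y for x, assuming y > 0 and w != 0 (hypothetical values).
def relu_inverse(w, b, y):
    assert y > 0, "for y = 0 the preimage is a whole half-line, not a single point"
    return (y - b) / w

print(relu_inverse(w=2.0, b=-1.0, y=3.0))  # 2.0, since max(0, 2*2 - 1) = 3
```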
📗 [2 points] Consider a single sigmoid perceptron with bias weight \(w_{0}\) = , a single input \(x_{1}\) with weight \(w_{1}\) = , and the sigmoid activation function \(g\left(z\right) = \dfrac{1}{1 + \exp\left(-z\right)}\). For what input \(x_{1}\) does the perceptron output the value \(a\) = ?

📗 The red curve is a plot of the activation function; given the y-value of the green point, the question asks for its x-value.
📗 Note: Math.js does not accept "ln(...)", please use "log(...)" instead.
📗 Answer: .
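📗 Note: solving \(a = g\left(w_{0} + w_{1} x_{1}\right)\) gives \(x_{1} = \dfrac{\log\left(a / \left(1 - a\right)\right) - w_{0}}{w_{1}}\) with the natural log; a sketch with hypothetical values:

```python
# Invert the sigmoid: find x1 with g(w0 + w1*x1) = a (hypothetical values).
import math

def sigmoid_inverse(w0, w1, a):
    z = math.log(a / (1.0 - a))  # the logit: the unique z with g(z) = a
    return (z - w0) / w1

print(sigmoid_inverse(w0=1.0, w1=2.0, a=0.5))  # -0.5, since g(0) = 0.5
```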
📗 [4 points] Suppose the squared loss is used to do stochastic gradient descent for logistic regression, i.e. \(C = \dfrac{1}{2} \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)^{2}\) where \(a_{i} = \dfrac{1}{1 + e^{- w x_{i} - b}}\). Given the current weight \(w\) = and bias \(b\) = , with \(x_{i}\) = , \(y_{i}\) = , \(a_{i}\) = (no need to recompute this value), with learning rate \(\alpha\) = . What is the updated after the iteration? Enter a single number.
📗 Answer: .
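📗 Note: a sketch of this step with hypothetical values; by the chain rule, \(\dfrac{\partial C}{\partial w} = \left(a_{i} - y_{i}\right) a_{i} \left(1 - a_{i}\right) x_{i}\) and \(\dfrac{\partial C}{\partial b} = \left(a_{i} - y_{i}\right) a_{i} \left(1 - a_{i}\right)\):

```python
# One squared-loss SGD step for logistic regression (hypothetical values;
# a is given, as in the question, so there is no need to recompute it).
def squared_loss_step(w, b, x, y, a, lr):
    g = (a - y) * a * (1.0 - a)        # shared chain-rule factor
    return w - lr * g * x, b - lr * g

print(squared_loss_step(w=1.0, b=0.0, x=2.0, y=1.0, a=0.88, lr=0.1))
# ~(1.0025, 0.0013)
```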
📗 [3 points] We use gradient descent to find the minimum of the function \(f\left(x\right)\) = with step size \(\eta > 0\). If we start from the point \(x_{0}\) = , how small should \(\eta\) be so that we make progress in the first iteration? Enter the largest value of \(\eta\) below which we make progress. For example, if we make progress whenever \(\eta < 0.01\), enter \(0.01\).
📗 Answer: .
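📗 Note: the function is ID-generated; as a hypothetical example, for \(f\left(x\right) = x^{2}\) the step gives \(x_{1} = x_{0} - \eta \cdot 2 x_{0} = \left(1 - 2\eta\right) x_{0}\), so \(f\left(x_{1}\right) < f\left(x_{0}\right)\) exactly when \(0 < \eta < 1\), and the answer would be \(1\). A numeric check:

```python
# Progress check for one gradient descent step on f(x) = x**2 (hypothetical example).
def makes_progress(x0, eta):
    x1 = x0 - eta * 2 * x0    # f'(x) = 2x
    return x1 ** 2 < x0 ** 2  # did f decrease?

print(makes_progress(3.0, 0.99), makes_progress(3.0, 1.01))  # True False
```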
📗 [3 points] Let \(x = \left(x_{1}, x_{2}, x_{3}\right)\). We want to minimize the objective function \(f\left(x\right)\) = using gradient descent. Let the stepsize \(\eta\) = . If we start at the vector \(x^{\left(0\right)}\) = , what is the next vector \(x^{\left(1\right)}\) produced by gradient descent?
📗 Answer (comma separated vector): .
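📗 Note: one gradient descent step is \(x^{\left(1\right)} = x^{\left(0\right)} - \eta \nabla f\left(x^{\left(0\right)}\right)\); a sketch with the hypothetical objective \(f\left(x\right) = x_{1}^{2} + x_{2}^{2} + x_{3}^{2}\):

```python
# One gradient descent step on a vector (hypothetical objective and step size).
import numpy as np

def gd_step(x0, grad_f, eta):
    return x0 - eta * grad_f(x0)

grad_f = lambda x: 2 * x                 # gradient of f(x) = x1^2 + x2^2 + x3^2
x0 = np.array([1.0, -2.0, 0.5])
print(gd_step(x0, grad_f, eta=0.1))      # [ 0.8 -1.6  0.4]
```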
📗 [2 points] Alice, Bob and Cindy go to the same school and live on a straight street lined with evenly spaced telephone poles. Alice's house is at the pole , Bob's is at the pole , Cindy's is at the pole . Where should the school set up a school bus stop so that the sum of distances (from house to bus stop) walked by the three students is minimized?
📗 Answer: .
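📗 Note: the sum of absolute distances is minimized at the median of the three pole positions; a brute-force check with hypothetical poles:

```python
# The bus stop minimizing total walking distance is the median pole (hypothetical poles).
poles = [2, 7, 11]
best = min(range(min(poles), max(poles) + 1),
           key=lambda s: sum(abs(s - p) for p in poles))
print(best, sorted(poles)[1])  # 7 7: brute force agrees with the median
```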
📗 [3 points] Which functions are (weakly) convex on \(\mathbb{R}\)?
📗 Choices:





None of the above
📗 [1 point] A binary classifier is trained on a training set; the resulting classifier is \(\hat{y} = 1\) if \(a x_{1} + b x_{2} + c \geq 0\) and \(\hat{y} = 0\) otherwise, and its performance is tested on a separate test set. The accuracy of the classifier is . What is the accuracy if the flipped classifier (\(\hat{y} = 1\) if \(a x_{1} + b x_{2} + c < 0\) and \(\hat{y} = 0\) otherwise) is used?
📗 Enter a fraction to represent the accuracy, for example, enter 0.5 if the accuracy is 50 percent and enter 1 if the accuracy is 100 percent.
📗 Answer: .
📗 [2 points] A test set \(\left(x_{1}, y_{1}\right), ..., \left(x_{100}, y_{100}\right)\) contains labels \(y_{i}\) = for \(i = 1, ..., 100\). A classifier simply predicts all the time (the labels are +1 and -1). What is this classifier's test accuracy?
📗 Enter a fraction to represent the accuracy, for example, enter 0.5 if the accuracy is 50 percent and enter 1 if the accuracy is 100 percent.
📗 Answer: .
📗 [3 points] What is the minimum zero-one cost of a binary (y is either 0 or 1) linear (threshold) classifier (for example, LTU perceptron) on the following data set?
| \(x_{i}\) | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| \(y_{i}\) | | | | | | |


📗 A linear classifier here is a vertical line that separates the two classes: you want to draw the line so that the fewest mistakes (i.e., the zero-one cost) are made.
📗 Answer: .
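📗 Note: the labels are ID-generated; this sketch brute-forces the best threshold (vertical line) over hypothetical labels, trying both orientations of the classifier:

```python
# Minimum zero-one cost of a 1D threshold classifier (hypothetical labels).
def min_zero_one_cost(xs, ys):
    thresholds = [min(xs) - 1] + [x + 0.5 for x in xs]  # one cut per gap, plus "all right"
    best = len(ys)
    for t in thresholds:
        for right in (0, 1):  # the label predicted on the right side of the line
            cost = sum((right if x > t else 1 - right) != y for x, y in zip(xs, ys))
            best = min(best, cost)
    return best

print(min_zero_one_cost([1, 2, 3, 4, 5, 6], [0, 0, 1, 0, 1, 1]))  # 1
```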
📗 [3 points] Which ones of the following functions are equal to the squared error for deterministic binary classification? \(C = \displaystyle\sum_{i=1}^{n} \left(f\left(x_{i}\right) - y_{i}\right)^{2}, f\left(x_{i}\right) \in \left\{0, 1\right\}, y_{i} \in \left\{0, 1\right\}\). Note: \(I_{S}\) is the indicator notation on \(S\).
📗 Note: the question is asking for the functions that are identical in values.
📗 Choices:
\(\displaystyle\sum_{i=1}^{n}\)
\(\displaystyle\sum_{i=1}^{n}\)
\(\displaystyle\sum_{i=1}^{n}\)
\(\displaystyle\sum_{i=1}^{n}\)
\(\displaystyle\sum_{i=1}^{n}\)
None of the above
📗 [3 points] Let \(f\) be a continuously differentiable function on \(\mathbb{R}\). Suppose the derivative \(f'\left(x\right)\) 0 at \(x\) = . Which values of \(x'\) are possible in the next step of gradient descent if we start at \(x\) = ? You can assume the learning rate is 1.
📗 Choices:





None of the above
📗 [3 points] Suppose there is a single integer input \(x \in\) {\(0\), \(1\), ..., }, and the label is binary, \(y \in\) {\(0\), \(1\)}. Let \(\mathcal{H}\) be a hypothesis space containing all possible linear classifiers. How many unique classifiers are there in \(\mathcal{H}\)? For example, the three linear classifiers \(1_{\left\{x < 0.4\right\}}\), \(1_{\left\{x \leq 0.4\right\}}\) and \(1_{\left\{x < 0.6\right\}}\) are considered the same classifier since they classify all possible data sets the same way.
📗 Answer: .
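📗 Note: the maximum input value is ID-generated; with a hypothetical maximum \(m\), this sketch enumerates all thresholds and both orientations and counts the distinct labelings of \(\left\{0, 1, \ldots, m\right\}\):

```python
# Count unique 1D threshold classifiers on inputs {0, 1, ..., m} (hypothetical m):
# two classifiers are the same if they label every possible input identically.
def count_classifiers(m):
    xs = range(m + 1)
    labelings = set()
    for t in [x - 0.5 for x in xs] + [m + 0.5]:  # cuts between consecutive integers
        labelings.add(tuple(1 if x > t else 0 for x in xs))
        labelings.add(tuple(0 if x > t else 1 for x in xs))
    return len(labelings)

print(count_classifiers(5))  # 12, which matches 2 * (m + 1) = 2m + 2 for m = 5
```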
📗 [0 points] To be added.
📗 [2 points] Let the input \(x \in \mathbb{R}\). Thus the input layer has a single \(x\) input. The network has 5 hidden layers. Each hidden layer has 10 units. The output layer has a single unit and outputs \(y \in \mathbb{R}\). Between layers, the network is fully connected. All units in the network have a bias input. All units are linear units, namely the activation function is the identity function \(a = g\left(z\right) = z\), while \(z = w^\top x + b\) is a linear combination of all inputs to that unit (including the bias). Which functions can this network compute?
📗 Choices:





None of the above
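📗 Note: the key fact behind this question is that a composition of linear (identity-activation) layers is itself linear, so the whole network can only compute functions of the form \(y = c x + d\). A sketch that collapses hypothetical random layers:

```python
# A deep network of identity-activation units collapses to a single linear function.
import numpy as np

rng = np.random.default_rng(0)  # hypothetical random weights
W = [rng.normal(size=(10, 1))] + [rng.normal(size=(10, 10)) for _ in range(4)] \
    + [rng.normal(size=(1, 10))]
b = [rng.normal(size=(10,)) for _ in range(5)] + [rng.normal(size=(1,))]

def forward(x):
    a = np.array([x])
    for Wi, bi in zip(W, b):
        a = Wi @ a + bi          # z = w^T a + b with identity activation
    return a[0]

d = forward(0.0)                 # collapsed intercept
c = forward(1.0) - d             # collapsed slope
for x in (-2.0, 0.5, 3.0):       # the network agrees with y = c*x + d everywhere
    assert np.isclose(forward(x), c * x + d)
print(c, d)
```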
📗 [2 points] In a three-layer (fully connected) neural network, the first layer contains sigmoid units, the second layer contains units, and the output layer contains units. The input is dimensional. How many weights plus biases does this neural network have? Enter one number.
📗 Answer: .
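📗 Note: the layer sizes are ID-generated; the counting rule is that each fully connected layer contributes (number of inputs × number of units) weights plus one bias per unit. A sketch with hypothetical sizes:

```python
# Total weights plus biases of a fully connected network (hypothetical sizes).
def num_parameters(sizes):
    # sizes = [input_dim, hidden_1, hidden_2, ..., output]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(num_parameters([4, 3, 2, 1]))  # (4*3 + 3) + (3*2 + 2) + (2*1 + 1) = 26
```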
📗 [4 points] Fill in the missing weight below so that the network computes the following function. All inputs take values 0 or 1, and the perceptrons are linear threshold units. The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias .
| \(x_{1}\) | \(x_{2}\) | \(y\) or \(o_{1}\) |
|---|---|---|
| 0 | 0 | |
| 0 | 1 | |
| 1 | 0 | |
| 1 | 1 | |


📗 Note: if the weights are not shown clearly, you could move the nodes around with mouse or touch.
📗 Answer: .
📗 [4 points] Fill in the missing weight below so that the network computes the following function. All inputs take values 0 or 1, and the perceptrons are linear threshold units. The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias .
| \(x_{1}\) | \(x_{2}\) | \(y\) or \(o_{1}\) |
|---|---|---|
| 0 | 0 | |
| 0 | 1 | |
| 1 | 0 | |
| 1 | 1 | |


📗 Note: if the weights are not shown clearly, you could move the nodes around with mouse or touch.
📗 Answer: .
📗 [2 points] We have a biased coin with probability of producing Heads. We create a predictor as follows: generate a random number uniformly distributed in (0, 1). If the random number is less than we predict Heads, otherwise, we predict Tails. What is this predictor's (expected) accuracy in predicting the coin's outcome?
📗 Answer: .
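📗 Note: if the coin lands Heads with probability \(p\) and the predictor says Heads with probability \(q\) (both ID-generated here), the expected accuracy is \(p q + \left(1 - p\right)\left(1 - q\right)\). A check with hypothetical values:

```python
# Expected accuracy of a randomized predictor against a biased coin (hypothetical p, q).
import random

def expected_accuracy(p, q):
    return p * q + (1 - p) * (1 - q)  # agree on Heads, or agree on Tails

p, q = 0.7, 0.7
print(expected_accuracy(p, q))  # 0.58

random.seed(0)                  # Monte Carlo sanity check
n = 100_000
hits = sum((random.random() < p) == (random.random() < q) for _ in range(n))
print(hits / n)                 # close to 0.58
```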
📗 [1 point] You want to design a neural network with sigmoid units to predict a person's academic role from their webpage. Possible roles are "professor" (label 0), "student" (label 1), and "staff" (label 2). Suppose each person can take on only one of these roles at a time. The neural network uses one-hot encoding: label 0 is encoded by \(\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\), label 1 is encoded by \(\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\), and label 2 is encoded by \(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\). What is the role (enter a label, not a string) if the output is ?
📗 Answer: .
📗 [4 points] The following neural network classifies all the training instances correctly. What are the labels (0 or 1) of the training data? The activation functions are LTU for all units: \(1_{\left\{z \geq 0\right\}}\). The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias .
| \(x_{i1}\) | \(x_{i2}\) | \(y_{i}\) or \(o_{1}\) |
|---|---|---|
| 0 | 0 | ? |
| 0 | 1 | ? |
| 1 | 0 | ? |
| 1 | 1 | ? |


📗 Note: if the weights are not shown clearly, you could move the nodes around with mouse or touch.
📗 Answer (comma separated vector): .
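📗 Note: the weights live in the (elided) diagram; this sketch runs the forward pass of a two-layer LTU network on all four inputs, using hypothetical weights that happen to implement XOR:

```python
# Forward pass of a two-layer LTU network on all four Boolean inputs
# (hypothetical weights that implement XOR).
import numpy as np

W1 = np.array([[1.0, 1.0],     # hidden unit 1 acts like OR
               [1.0, 1.0]])    # hidden unit 2 acts like AND
b1 = np.array([-0.5, -1.5])
w2 = np.array([1.0, -1.0])     # output: OR minus AND
b2 = -0.5

def predict(x):
    h = (W1 @ x + b1 >= 0).astype(float)  # first-layer LTUs
    return int(w2 @ h + b2 >= 0)          # output LTU

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, predict(np.array(x, dtype=float)))  # labels: 0, 1, 1, 0
```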
📗 [3 points] Suppose you are given a neural network with hidden layers, input units, output units, and hidden units. In one backpropagation step when computing the gradient of the cost (for example, squared loss) with respect to \(w^{\left(1\right)}_{11}\), the weight in layer \(1\) connecting input \(1\) and hidden unit \(1\), how many weights (including \(w^{\left(1\right)}_{11}\) itself, and including biases) are used in the backpropagation step of \(\dfrac{\partial C}{\partial w^{\left(1\right)}_{11}}\)?
📗 Note: the backpropagation step assumes the activations in all layers are already known, so do not count the weights and biases used in the forward step computing the activations.
📗 Answer: .
📗 [4 points] Consider a linear model \(a_{i} = w^\top x_{i} + b\), with the cross entropy cost function \(C\) = . The initial weight is \(\begin{bmatrix} w \\ b \end{bmatrix}\) = . What is the updated weight and bias after one (stochastic) gradient descent step if the chosen training data is \(x\) = , \(y\) = ? The learning rate is .
📗 Answer (comma separated vector): .
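📗 Note: the cost expression is elided above; assuming the standard logistic setup where the prediction is \(\sigma\left(a_{i}\right)\) and \(C = -y_{i} \log \sigma\left(a_{i}\right) - \left(1 - y_{i}\right) \log\left(1 - \sigma\left(a_{i}\right)\right)\), the gradient simplifies to \(\left(\sigma\left(a_{i}\right) - y_{i}\right) x_{i}\) for \(w\) and \(\sigma\left(a_{i}\right) - y_{i}\) for \(b\). A sketch with hypothetical values:

```python
# One cross-entropy SGD step for a logistic model (hypothetical values;
# the sigmoid-cross-entropy cancellation gives gradient (sigma(a) - y) * x).
import math

def sgd_step(w, b, x, y, lr):
    a = w * x + b
    s = 1.0 / (1.0 + math.exp(-a))  # sigma(a)
    return w - lr * (s - y) * x, b - lr * (s - y)

print(sgd_step(w=1.0, b=0.0, x=2.0, y=1.0, lr=0.1))  # ~(1.0238, 0.0119)
```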
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .
📗 [1 point] Blank.
📗 Answer: .

# Grade




📗 You could save the text in the above text box to a file using the button, or copy and paste it into a file yourself.
📗 You could load your answers from the text (or txt file) in the text box below using the button. The first two lines should be "##x: 1" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.







Last Updated: April 29, 2024 at 1:11 AM