Young Wu's Homepage


Official Due Date: May 31

# Written (Math) Problems

📗 Enter your ID here: and click

📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.

📗 Some of the referenced past exams can be found on Professor Zhu's and Professor Dyer's websites: Link and Link.

📗 Please do not refresh the page: your answers will not be saved. You can save and load your answers (only fill-in-the-blank questions) using the buttons at the bottom of the page.

📗 Please report any bugs on Piazza.

# Warning: please enter your ID before you start!

# Question 1 [2]

📗 (Fall 2017 Final Q19) Let the input \(x \in \mathbb{R}\). Thus the input layer has a single \(x\) input. The network has 5 hidden layers. Each hidden layer has 10 units. The output layer has a single unit and outputs \(y \in \mathbb{R}\). Between layers, the network is fully connected. All units in the network have a bias 1 input. All units are linear units, namely the activation function is the identity function \(v = f\left(u\right) = u\), while \(u\) is a linear combination of all inputs to that unit (including bias 1). Which functions can this network compute?

📗 Choices:

None of the above
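📗 One way to explore this question numerically is to build such a network with random weights and check what kind of function it computes. The sketch below is only an illustration: the helper names `affine_layer` and `forward`, the random weights, and the test points are all assumptions, not part of the original exam. It checks whether the output agrees with a single line \(y = m x + c\) fitted from two evaluations.

```python
import random

def affine_layer(n_in, n_out, rng):
    """Return (W, b) with random entries; W has n_out rows and n_in columns."""
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1, 1) for _ in range(n_out)]
    return W, b

def forward(layers, x):
    """Pass a scalar input through layers whose activation is the identity."""
    v = [x]
    for W, b in layers:
        # Each unit outputs u = w . v + bias, and v = f(u) = u.
        v = [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]
    return v[0]

rng = random.Random(0)
# One input, 5 hidden layers of 10 units each, one output (as in the question).
sizes = [1, 10, 10, 10, 10, 10, 1]
layers = [affine_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# Fit a line through the outputs at x = 0 and x = 1.
c = forward(layers, 0.0)
m = forward(layers, 1.0) - c

# Compare the network with the fitted line at a few other (arbitrary) inputs.
for x in (-3.0, 0.5, 7.0):
    assert abs(forward(layers, x) - (m * x + c)) < 1e-9
```

The agreement follows from the fact that a composition of affine maps is itself affine, so the depth and width of the network do not enlarge the set of functions it can represent.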

# Question 2 [1]

📗 (Fall 2017 Final Q21) A binary classifier is trained on a training set, with \(\hat{y} = 1\) if \(a x_{1} + b x_{2} + c \geq 0\) and \(\hat{y} = 0\) otherwise, and its performance is tested on a separate test set. The accuracy of the classifier is . What is the accuracy if the flipped classifier (\(\hat{y} = 1\) if \(a x_{1} + b x_{2} + c < 0\) and \(\hat{y} = 0\) otherwise) is used?

📗 Hint: accuracy means: number of correct predictions divided by the total number of test instances.
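📗 The definition in the hint can be sketched in a few lines of Python. The test points, labels, and coefficients below are made up for illustration and are not the values generated for your ID; `predict` and `accuracy` are hypothetical helper names.

```python
def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the number of test instances."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def predict(points, a, b, c, flipped=False):
    """Predict 1 if a*x1 + b*x2 + c >= 0 (or < 0 when flipped), else 0."""
    preds = []
    for x1, x2 in points:
        positive = a * x1 + b * x2 + c >= 0
        preds.append(int(positive != flipped))
    return preds

# Hypothetical test set and classifier coefficients.
points = [(1, 2), (-1, 0), (0, -3), (2, 2), (-2, -1)]
labels = [1, 0, 1, 1, 0]
a, b, c = 1.0, 1.0, 0.0

acc = accuracy(labels, predict(points, a, b, c))
acc_flipped = accuracy(labels, predict(points, a, b, c, flipped=True))

# Every instance is classified correctly by exactly one of the two classifiers.
assert abs(acc + acc_flipped - 1.0) < 1e-12
```

Because the flipped classifier reverses every prediction, each test instance is counted as correct by exactly one of the two classifiers, which is why the two accuracies sum to 1.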

📗 Answer: .

# Question 3 [2]

📗 (Fall 2014 Midterm Q7) A test set \(\left(x_{1}, y_{1}\right), ..., \left(x_{100}, y_{100}\right)\) contains labels \(y_{i}\) = for \(i = 1, ..., 100\). A classifier simply predicts all the time (the labels are +1 and -1). What is this classifier's test accuracy?

📗 Answer: .

# Question 4 [3]

📗 (Fall 2014 Final Q16) The sigmoid function in a neural network is defined as \(g\left(x\right) = \dfrac{1}{1 + e^{-x}}\). There is another activation function defined as \(h\left(x\right)\) = . If \(h\left(x\right) = a \cdot g\left(b \cdot x\right) + c\), write down the values of \(a, b, c\).
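📗 As an illustration of the form \(a \cdot g\left(b \cdot x\right) + c\), the sketch below uses the hyperbolic tangent as an assumed example of \(h\); the function generated for your ID may be different. It checks numerically the known identity \(\tanh\left(x\right) = 2 g\left(2 x\right) - 1\). The helper name `scaled_sigmoid` is hypothetical.

```python
import math

def g(x):
    """Sigmoid function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def scaled_sigmoid(x, a, b, c):
    """Evaluate a * g(b * x) + c."""
    return a * g(b * x) + c

# Assumed example: h = tanh, which corresponds to a = 2, b = 2, c = -1.
a, b, c = 2.0, 2.0, -1.0
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert math.isclose(math.tanh(x), scaled_sigmoid(x, a, b, c), abs_tol=1e-12)
```

A candidate \(\left(a, b, c\right)\) for any other \(h\) can be checked the same way by replacing `math.tanh` with that function.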

📗 Answer (comma separated vector): .

# Question 5 [2]

📗 (Fall 2013 Final Q8, Fall 2019 Final Q14) In a three-layer neural network, the first layer contains sigmoid units, the second layer contains units, and the output layer contains units. The input is dimensional. How many weights and biases does this neural network have?

📗 Hint: there are 2 hidden layers, and the output layer having k units means it is used for k-class classification (for example, using softmax activations).
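📗 The count can be sketched as follows, assuming a fully connected network in which every non-input unit has one bias. The layer sizes in the example are made up and are not the values generated for your ID; `count_parameters` is a hypothetical helper name.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, starting with the
    input dimension and ending with the output layer.
    """
    # One weight per connection between consecutive layers.
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # One bias per unit in every layer except the input layer.
    biases = sum(layer_sizes[1:])
    return weights + biases

# Hypothetical example: 4-dimensional input, hidden layers of 3 and 2 units,
# and 2 output units: (4*3 + 3) + (3*2 + 2) + (2*2 + 2) = 29.
assert count_parameters([4, 3, 2, 2]) == 29
```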

📗 Answer: .

# Question 6 [4]

📗 (Fall 2010 Final Q17) Fill in the missing weight below so that the network computes the following function. All inputs take the value 0 or 1, and the perceptrons are linear threshold units.

\(x_{1}\)	\(x_{2}\)	\(y\) or \(o_{1}\)
0	0
0	1
1	0
1	1

📗 Hint: if the weights are not shown clearly, you could move the nodes around with mouse or touch.
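📗 To check a candidate weight, it can help to evaluate a linear threshold unit on all four inputs and compare the result with the target table. The sketch below assumes the convention that a unit outputs 1 when its weighted sum plus bias is at least 0; the weights shown are the textbook AND and OR units, not the ones in this question, and `ltu` and `truth_table` are hypothetical helper names.

```python
def ltu(inputs, weights, bias):
    """Linear threshold unit: 1 if w . x + bias >= 0, else 0 (assumed convention)."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias >= 0 else 0

def truth_table(weights, bias):
    """Outputs for (x1, x2) = (0,0), (0,1), (1,0), (1,1), in that order."""
    return [ltu((x1, x2), weights, bias) for x1 in (0, 1) for x2 in (0, 1)]

# Illustrative units (not the network from this question).
assert truth_table((1, 1), -1.5) == [0, 0, 0, 1]  # AND
assert truth_table((1, 1), -0.5) == [0, 1, 1, 1]  # OR
```

For a network with a hidden layer, the same `ltu` function can be applied to the hidden unit outputs to obtain \(y\).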

📗 Answer: .

# Question 7 [4]

📗 (Fall 2010 Final Q17) Fill in the missing weight below so that the network computes the following function. All inputs take the value 0 or 1, and the perceptrons are linear threshold units.

\(x_{1}\)	\(x_{2}\)	\(y\) or \(o_{1}\)
0	0
0	1
1	0
1	1

📗 Hint: if the weights are not shown clearly, you could move the nodes around with mouse or touch.

📗 Answer: .

# Question 8 [1]

📗 (Fall 2011 Midterm Q12) You want to design a neural network with sigmoid units to predict a person's academic role from their webpage. Possible roles are "professor" (label 0), "student" (label 1), "staff" (label 2). Suppose each person can take on only one of these roles at a time. The neural network uses one-hot encoding: label 0 is encoded by \(\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\), label 1 is encoded by \(\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\), and label 2 is encoded by \(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\). What is the role (enter a label, not a string) if the output is ?
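📗 A common way to decode a one-hot style output is to pick the index of the largest component, since that is the one-hot vector the output is closest to. The sketch below assumes this decoding rule; the output vector in the example is made up and is not the one generated for your ID, and `decode_label` is a hypothetical helper name.

```python
def decode_label(output):
    """Return the index of the largest output component (argmax)."""
    return max(range(len(output)), key=lambda i: output[i])

# Hypothetical network output: the second component is largest, so label 1.
assert decode_label([0.1, 0.7, 0.2]) == 1
```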

📗 Answer: .

# Question 9 [2]

📗 Compare the ReLU, Tanh, Sigmoid, and Linear activation functions on the page: Link. Compare them using different learning rates and without regularization on different datasets. Also try different network structures. Discuss the differences on Piazza: Link.

📗 My favorite activation function is: and I have participated in the discussion on Piazza.

# Question 10 [1]

📗 Please enter any comments and suggestions including possible mistakes and bugs with the questions and the auto-grading, and materials relevant to solving the questions that you think are not covered well during the lectures. If you have no comments, please enter "None": do not leave it blank.

📗 Answer: .

# Grade

* * * * *

* * * * *

📗 Please copy and paste the text between the *s (not including the *s) and submit it on Canvas, M3.

📗 You could save the text as a text file using the button or just copy and paste it into a text file.

📗 Warning: the load button does not function properly for all questions, so please recheck everything after you load. You could load your answers using the button from the text field:

Last Updated: July 14, 2024 at 9:37 PM