📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit enter key).
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page: , solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 [4 points] Suppose the partial derivative of a cost function \(C\) with respect to some weight \(w\) is given by \(\dfrac{\partial C}{\partial w} = \displaystyle\sum_{i=1}^{n} \dfrac{\partial C_{i}}{\partial w} = \displaystyle\sum_{i=1}^{n} w x_{i}\). Given a data set \(x\) = {} and initial weight \(w\) = , compute and compare the updated weight after 1 step of batch gradient descent and steps of stochastic gradient descent (start with the same initial weight, then use data point 1 in step 1, data point 2 in step 2, ...). Enter two numbers, comma separated, batch gradient descent first. Use the learning rate \(\alpha\) = .
📗 Answer (comma separated vector): .
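Since the data set, initial weight, and learning rate are blank above, here is a sketch of the two update rules with made-up stand-in values, assuming the per-point gradient \(\partial C_i / \partial w = w x_i\) as stated:

```python
# One batch-GD step vs. sequential SGD steps for the gradient
# dC_i/dw = w * x_i. All numbers below are hypothetical stand-ins
# for the blanked values in the question.
x = [1.0, 2.0, 3.0]   # hypothetical data set
w0 = 0.5              # hypothetical initial weight
alpha = 0.1           # hypothetical learning rate

# Batch gradient descent: one step using the summed gradient,
# evaluated entirely at the initial weight.
w_batch = w0 - alpha * sum(w0 * xi for xi in x)

# Stochastic gradient descent: one step per data point, in order,
# recomputing the gradient with the current weight each time.
w_sgd = w0
for xi in x:
    w_sgd -= alpha * w_sgd * xi

print(w_batch, w_sgd)
```

The key difference: batch GD evaluates every gradient term at the same (initial) weight, while SGD lets the weight change between data points, so the two answers generally differ.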
📗 [4 points] Given a neural network with 1 hidden layer with hidden units, suppose the current hidden layer weights are \(w^{\left(1\right)}\) = = , and the output layer weights are \(w^{\left(2\right)}\) = = . Given an instance (item) \(x\) = and \(y\) = , the activation values are \(a^{\left(1\right)}\) = = and \(a^{\left(2\right)}\) = . What is the updated weight \(w^{\left(1\right)}_{21}\) after one step of stochastic gradient descent based on \(x\) with learning rate \(\alpha\) = ? The activation functions are all and the cost is square loss.
📗 Reminder: logistic activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = a_{i} \left(1 - a_{i}\right)\), tanh activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = 1 - a_{i}^{2}\), ReLU activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = 1_{\left\{a_{i} \geq 0\right\}}\), and square cost has gradient \(\dfrac{\partial C_{i}}{\partial a_{i}} = a_{i} - y_{i}\).
📗 Answer: .
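Since the network's weights and activations are blank above, the chain-rule computation can be sketched with hypothetical values, assuming logistic activations, square loss, two inputs, two hidden units, no bias terms, and the convention that \(w^{(1)}_{21}\) connects input 1 to hidden unit 2:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical stand-ins for the blanked quantities.
x = [1.0, -1.0]                       # input instance
y = 1.0                               # label
w1 = [[0.2, 0.4], [0.3, -0.1]]        # w1[j][k]: input k+1 -> hidden j+1
w2 = [0.5, -0.6]                      # hidden j+1 -> output
alpha = 0.1                           # learning rate

# Forward pass.
a1 = [sigmoid(sum(w1[j][k] * x[k] for k in range(2))) for j in range(2)]
a2 = sigmoid(sum(w2[j] * a1[j] for j in range(2)))

# Backward pass for w1[1][0], i.e. w^{(1)}_{21}, by the chain rule:
# dC/da2 = a2 - y, da2/dz2 = a2 (1 - a2), dz2/da1_2 = w2[1],
# da1_2/dz1_2 = a1[1] (1 - a1[1]), dz1_2/dw^{(1)}_{21} = x[0].
grad = (a2 - y) * a2 * (1 - a2) * w2[1] * a1[1] * (1 - a1[1]) * x[0]
w1_21_new = w1[1][0] - alpha * grad
print(w1_21_new)
```

Each factor in `grad` is one link of the chain from the cost back to the chosen weight; for tanh or ReLU activations, swap in the corresponding derivative from the reminder above.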
📗 [3 points] Which functions are (weakly) convex on \(\mathbb{R}\)?
📗 Choices:
None of the above
📗 [3 points] Which functions are (weakly) convex on \(\mathbb{R}\)?
📗 You can plot an expression of x: using from to .
📗 Choices:
None of the above
📗 [3 points] Let \(x = \left(x_{1}, x_{2}, x_{3}\right)\). We want to minimize the objective function \(f\left(x\right)\) = using gradient descent. Let the stepsize \(\eta\) = . If we start at the vector \(x^{\left(0\right)}\) = , what is the next vector \(x^{\left(1\right)}\) produced by gradient descent?
📗 Answer (comma separated vector): .
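With the objective and stepsize blank above, the update \(x^{(1)} = x^{(0)} - \eta \nabla f(x^{(0)})\) can be illustrated with a hypothetical objective \(f(x) = x_1^2 + 2x_2^2 + 3x_3^2\):

```python
# One gradient descent step on a hypothetical objective
# f(x) = x1^2 + 2 x2^2 + 3 x3^2 (the actual f, eta, and x^(0)
# are blank in the question above).
def grad_f(x):
    # Gradient of the hypothetical f: (2 x1, 4 x2, 6 x3).
    return [2 * x[0], 4 * x[1], 6 * x[2]]

eta = 0.1                 # hypothetical stepsize
x0 = [1.0, 1.0, 1.0]      # hypothetical starting vector

# x^{(1)} = x^{(0)} - eta * grad f(x^{(0)}), applied component-wise.
x1 = [xi - eta * gi for xi, gi in zip(x0, grad_f(x0))]
print(x1)
```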
📗 [3 points] Which one of the following is the gradient descent step for \(w\) if the activation function is and the cost function is ?
📗 Choices:
\(w = w - \alpha \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)\)
\(w = w - \alpha \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)\)
\(w = w - \alpha \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)\)
\(w = w - \alpha \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)\)
\(w = w - \alpha \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)\)
None of the above
📗 [4 points] Consider a linear model \(a_{i} = w^\top x_{i} + b\), with the cross entropy cost function \(C\) = . The initial weight is \(\begin{bmatrix} w \\ b \end{bmatrix}\) = . What is the updated weight and bias after one (stochastic) gradient descent step if the chosen training data is \(x\) = , \(y\) = ? The learning rate is .
📗 Answer (comma separated vector): .
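Since the weights, training point, and learning rate are blank above, here is a sketch of one such step with hypothetical values, using the standard simplification that for a logistic activation with cross entropy cost, \(\partial C / \partial w = (a - y)x\) and \(\partial C / \partial b = a - y\):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One SGD step for logistic regression with cross-entropy cost.
# For a = sigmoid(w.x + b), the gradients simplify to
# dC/dw = (a - y) x and dC/db = (a - y).
# All numbers are hypothetical stand-ins for the blanked values.
w = [0.1, -0.2]
b = 0.0
x = [1.0, 2.0]
y = 1.0
alpha = 0.5

a = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
w = [wi - alpha * (a - y) * xi for wi, xi in zip(w, x)]
b = b - alpha * (a - y)
print(w, b)
```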
📗 [3 points] Let \(f\) be a continuously differentiable function in \(\mathbb{R}\). If the derivative \(f'\left(x\right)\) 0 at \(x\) = , which values of \(x'\) are possible in the next step of gradient descent if we start at \(x\) = ? You can assume the learning rate is 1.
📗 Choices:
None of the above
📗 [4 points] Suppose the squared loss is used to do stochastic gradient descent for logistic regression, i.e. \(C = \dfrac{1}{2} \displaystyle\sum_{i=1}^{n} \left(a_{i} - y_{i}\right)^{2}\) where \(a_{i} = \dfrac{1}{1 + e^{- w x_{i} - b}}\). Given the current weight \(w\) = and bias \(b\) = , with \(x_{i}\) = , \(y_{i}\) = , \(a_{i}\) = (no need to recompute this value), with learning rate \(\alpha\) = . What is the updated after the iteration? Enter a single number.
📗 Answer: .
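With the weight, bias, training point, and learning rate blank above, the update can be sketched with hypothetical values, using the chain rule through the square cost and the logistic activation (note the extra \(a_i(1-a_i)\) factor compared with cross entropy):

```python
# One SGD update for logistic regression trained with squared loss:
# dC_i/dw = (a_i - y_i) * a_i * (1 - a_i) * x_i.
# All numbers are hypothetical stand-ins for the blanked values.
w, b = 0.2, -0.1       # current weight and bias
x_i, y_i = 3.0, 1.0    # chosen training point
a_i = 0.6              # given activation (taken as-is, not recomputed)
alpha = 0.1

grad_w = (a_i - y_i) * a_i * (1 - a_i) * x_i
w_new = w - alpha * grad_w
print(w_new)
```

The bias update is the same expression without the trailing \(x_i\) factor.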
📗 [3 points] We use gradient descent to find the minimum of the function \(f\left(x\right)\) = with step size \(\eta > 0\). If we start from the point \(x_{0}\) = , how small should \(\eta\) be so we make progress in the first iteration? Enter the largest value of \(\eta\) below which we make progress. For example, if we make progress when \(\eta < 0.01\), enter \(0.01\).
📗 Answer: .
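Since \(f\) and \(x_0\) are blank above, here is the idea for a hypothetical quadratic \(f(x) = x^2\): one step gives \(x_1 = x_0 - \eta \cdot 2x_0 = (1 - 2\eta)x_0\), so \(|x_1| < |x_0|\) exactly when \(|1 - 2\eta| < 1\), i.e. \(0 < \eta < 1\), and the threshold to enter would be \(1\):

```python
# Progress check for a hypothetical f(x) = x^2 with f'(x) = 2x.
# "Progress" here means the iterate moves strictly closer to the
# minimizer at 0.
def makes_progress(eta, x0=2.0):
    x1 = x0 - eta * 2 * x0   # one gradient descent step
    return abs(x1) < abs(x0)

print(makes_progress(0.9), makes_progress(1.0), makes_progress(1.1))
```

At \(\eta = 1\) the iterate jumps to \(-x_0\) (no closer), and beyond it the iterates diverge.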
📗 [3 points] We use gradient descent to find the minimum of the function \(f\left(x\right)\) = with step size \(\eta > 0\). If we start from the point \(x_{0}\) = , how small should \(\eta\) be so we make progress in the first iteration? Check all values of \(\eta\) that make progress.
\(\eta\) =
📗 The green point is the current \(x_{0}\). You can change the \(\eta\) values using the slider and see the x value in the next iteration as the red point and check whether it gets closer to the minimum.
📗 You could save the text in the above text box to a file using the button, or copy and paste it into a file yourself.
📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##m: 17" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.
📗 You can find videos going through the questions on Link.