📗 Enter your ID (your wisc email ID without @wisc.edu) here, then click the button (or hit the Enter key) to generate your questions.
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems on paper, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 [4 points] Given a neural network with 1 hidden layer with ___ hidden units, suppose the current hidden layer weights are \(w^{\left(1\right)}\) = ___ and the output layer weights are \(w^{\left(2\right)}\) = ___. Given an instance (item) \(x\) = ___ with label \(y\) = ___, the activation values are \(a^{\left(1\right)}\) = ___ and \(a^{\left(2\right)}\) = ___. What is the updated weight \(w^{\left(1\right)}_{21}\) after one step of stochastic gradient descent based on \(x\) with learning rate \(\alpha\) = ___? The activation functions are all ___ and the cost is the square loss.
📗 Reminder: logistic activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = a_{i} \left(1 - a_{i}\right)\), tanh activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = 1 - a_{i}^{2}\), ReLU activation has gradient \(\dfrac{\partial a_{i}}{\partial z_{i}} = 1_{\left\{z_{i} \geq 0\right\}}\), and the square cost has gradient \(\dfrac{\partial C_{i}}{\partial a_{i}} = a_{i} - y_{i}\).
📗 Answer: .
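📗 A minimal sketch of this update for practice, using made-up placeholder numbers (the actual values are generated per ID) and assuming a 2-input, 2-hidden-unit, 1-output network with logistic activations and no bias terms:

```python
import numpy as np

# Placeholder values standing in for the generated blanks (all assumed):
x = np.array([1.0, 2.0])            # input instance
y = 1.0                             # label
W1 = np.array([[0.5, -0.3],         # W1[j, k] = w^(1)_{jk}: hidden unit j, input k
               [0.2,  0.4]])
W2 = np.array([0.1, -0.2])          # output weights w^(2)
alpha = 0.1                         # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
a1 = sigmoid(W1 @ x)                # hidden activations a^(1)
a2 = sigmoid(W2 @ a1)               # output activation a^(2)

# Backward pass for w^(1)_{21} (hidden unit 2, input 1), chaining the
# gradients from the reminder above:
# dC/da2 = a2 - y, da2/dz2 = a2(1 - a2), dz2/da1_2 = w^(2)_2,
# da1_2/dz1_2 = a1_2(1 - a1_2), dz1_2/dw^(1)_21 = x_1.
delta2 = (a2 - y) * a2 * (1 - a2)
grad_w21 = delta2 * W2[1] * a1[1] * (1 - a1[1]) * x[0]

W1[1, 0] -= alpha * grad_w21        # one stochastic gradient descent step
print(f"updated w^(1)_21 = {W1[1, 0]:.6f}")
```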
📗 [4 points] Suppose the partial derivative of a cost function \(C\) with respect to some weight \(w\) is given by \(\dfrac{\partial C}{\partial w} = \displaystyle\sum_{i=1}^{n} \dfrac{\partial C_{i}}{\partial w} = \displaystyle\sum_{i=1}^{n} w x_{i}\). Given a data set \(x\) = {___} and initial weight \(w\) = ___, compute and compare the updated weight after 1 step of batch gradient descent and \(n\) steps of stochastic gradient descent (start with the same initial weight, then use data point 1 in step 1, data point 2 in step 2, ...). Enter two numbers, comma separated, batch gradient descent first. Use the learning rate \(\alpha\) = ___.
📗 Answer (comma separated vector): .
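📗 The key difference is that batch gradient descent evaluates the full-sum gradient once at the initial weight, while stochastic gradient descent re-evaluates \(w x_{i}\) at the current weight before each step. A sketch with made-up placeholder values (the actual data set, initial weight, and learning rate are generated per ID):

```python
import numpy as np

# Placeholder values standing in for the generated blanks (all assumed):
x = np.array([1.0, 2.0, 3.0])   # data set
w0 = 1.0                         # initial weight
alpha = 0.1                      # learning rate

# Batch gradient descent: one step using the full gradient sum_i w0 * x_i,
# so w_batch = w0 * (1 - alpha * sum_i x_i).
w_batch = w0 - alpha * w0 * x.sum()

# Stochastic gradient descent: n steps, one data point per step, with the
# gradient w * x_i recomputed at the current weight each time, so
# w_sgd = w0 * prod_i (1 - alpha * x_i).
w_sgd = w0
for xi in x:
    w_sgd -= alpha * w_sgd * xi

print(f"batch: {w_batch:.6f}, sgd: {w_sgd:.6f}")
```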
📗 [4 points] Given two training points ___ and ___ with labels \(0\) and \(1\), what is the kernel (Gram) matrix if the RBF (radial basis function) Gaussian kernel with \(\sigma\) = ___ is used? Use the formula \(K_{i i'} = e^{- \dfrac{1}{2 \sigma^{2}} \left(x_{i} - x_{i'}\right)^\top \left(x_{i} - x_{i'}\right)}\).
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
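📗 A sketch of computing the Gram matrix with NumPy, using made-up placeholder points and \(\sigma\) (the actual values are generated per ID). Note the labels do not enter the kernel matrix, and the diagonal entries are always 1:

```python
import numpy as np

# Placeholder values standing in for the generated blanks (all assumed):
x = np.array([[1.0, 0.0],
              [0.0, 1.0]])       # two training points, one per row
sigma = 1.0                       # kernel width

# K[i, i'] = exp(-(x_i - x_i')^T (x_i - x_i') / (2 sigma^2))
diff = x[:, None, :] - x[None, :, :]    # pairwise differences, shape (2, 2, d)
sq_dist = (diff ** 2).sum(axis=-1)      # pairwise squared Euclidean distances
K = np.exp(-sq_dist / (2 * sigma ** 2))
print(K)                                # 2x2 Gram matrix
```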
📗 You could save the text in the text box above to a file using the button, or copy and paste it into a file yourself.
📗 You could load your answers from the text (or a txt file) in the text box below using the button. The first two lines should be "##x: 4" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.