📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click the button (or hit the enter key).
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 [2 points] Suppose the scaled dot-product attention function is used. Given two vectors \(q\) = and \(k\) = , calculate the attention score of \(q\) to \(k\).
📗 Answer: .
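For reference, the scaled dot-product score is the dot product of \(q\) and \(k\) divided by the square root of their dimension (before any softmax over keys). A minimal Python sketch with made-up vectors (the actual \(q\) and \(k\) are generated from your ID):

```python
import numpy as np

# Made-up example vectors; the actual q and k are generated from your ID.
q = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([0.0, 1.0, 0.0, 1.0])

# Scaled dot-product attention score: (q . k) / sqrt(d), where d is the vector dimension.
d = len(q)
score = np.dot(q, k) / np.sqrt(d)
print(score)  # (0 + 2 + 0 + 4) / sqrt(4) = 3.0
```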
📗 [3 points] Assume the tokenization rule uses whitespace between words as the separator, and one sentence \(s_{1}\) is input into the decoder stack during training. Write down the attention mask of the self-attention block in the decoder, where \(1\) = attended, \(0\) = masked.
Sentence: \(s_{1}\) = "". (Note: "< s >" is one token, not three).
📗 Answer (matrix with multiple lines, each line is a comma separated vector):
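The decoder self-attention mask is causal: token \(i\) may attend only to tokens \(1, ..., i\). A minimal sketch with a made-up sentence (the actual \(s_{1}\) is generated from your ID):

```python
import numpy as np

# Made-up sentence; the actual s1 is generated from your ID.
# Tokens are split on whitespace, and "<s>" counts as a single token.
sentence = "<s> the cat sat"
tokens = sentence.split()                 # ['<s>', 'the', 'cat', 'sat']
n = len(tokens)

# Causal (lower-triangular) mask: mask[i][j] = 1 if j <= i else 0.
mask = np.tril(np.ones((n, n), dtype=int))
for row in mask:
    print(",".join(str(x) for x in row))  # one comma separated vector per line
```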
📗 [3 points] Given the variance matrix of a data set \(V\) = and a principal component \(u\) = , what is the projected variance of the data set in the direction of \(u\)?
📗 Answer: .
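The projected variance in the direction of a unit vector \(u\) is the quadratic form \(u^\top V u\). A minimal sketch with made-up values (the actual \(V\) and \(u\) are generated from your ID):

```python
import numpy as np

# Made-up values; the actual V and u are generated from your ID.
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # variance (covariance) matrix of the data set
u = np.array([0.6, 0.8])     # principal component direction (unit length)

# Projected variance in the direction of u: u^T V u.
# If u were not unit length, divide by u^T u first.
projected_variance = u @ V @ u
print(projected_variance)    # 1.84 for these values
```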
📗 [3 points] Suppose the UCB1 (Upper Confidence Bound) Algorithm is used to select arms in a multi-armed bandit problem. In round \(t\) = , the arm pulls and empirical means \(\hat{\mu}\) for the arms are summarized in the following table. In period \(t + 1\), an arm is pulled according to the UCB1 Algorithm and the reward is . Compute the updated empirical means of the arms after period \(t + 1\), i.e. the updated \(\hat{\mu}_{1}, \hat{\mu}_{2}, ...\). Use \(c\) = .
Arms | arm pulls (\(n_{k}\)) | empirical means \(\hat{\mu}_{k}\) | upper confidence bounds \(\hat{\mu}_{k} + c \sqrt{2 \dfrac{\log t}{n_{k}}}\)
\(k = 1\) |  |  | 
\(k = 2\) |  |  | 
\(k = 3\) |  |  | 
📗 Answer (comma separated vector): .
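UCB1 pulls the arm with the largest upper confidence bound \(\hat{\mu}_{k} + c \sqrt{2 \dfrac{\log t}{n_{k}}}\), then updates that arm's empirical mean as a running average; the other arms are unchanged. A minimal sketch with made-up counts, means, and reward (the actual numbers are generated from your ID):

```python
import numpy as np

# Made-up values; the actual t, c, pulls, means, and reward come from your ID.
t = 10
c = 1.0
n = np.array([4.0, 3.0, 3.0])     # arm pulls n_k
mu = np.array([0.5, 0.6, 0.4])    # empirical means mu_hat_k
reward = 1.0                      # reward observed in period t + 1

# Pull the arm with the largest upper confidence bound.
ucb = mu + c * np.sqrt(2.0 * np.log(t) / n)
k = int(np.argmax(ucb))

# Running-average update of the pulled arm's empirical mean.
mu[k] = (mu[k] * n[k] + reward) / (n[k] + 1.0)
n[k] += 1.0
print(mu)                         # updated empirical means after period t + 1
```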
📗 [3 points] In an infinite horizon MDP (Markov Decision Process), there are \(n\) = states: the initial state \(s_{0}\) and the absorbing states \(s_{1}, s_{2}, ..., s_{n-1}\). In state \(s_{0}\), the agent can stay or move to any other state, but in all of the absorbing states the agent can only choose to stay. The rewards from staying in these states are summarized in the following table. Compute the Q value (under the optimal policy, not from Q learning) \(Q\left(s_{0}, \text{stay}\right)\). Use the discount factor \(\gamma\) = .
State | \(s_{0}\) | \(s_{1}\) | \(s_{2}\) | \(s_{3}\) | \(s_{4}\)
Reward from stay |  |  |  |  | 
Reward from move |  | - | - | - | -
📗 Answer: .
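Under the optimal policy, each absorbing state is worth \(\dfrac{r_{\text{stay}}}{1 - \gamma}\), and \(Q\left(s_{0}, \text{stay}\right) = r_{\text{stay}}\left(s_{0}\right) + \gamma V^\star\left(s_{0}\right)\), where \(V^\star\left(s_{0}\right)\) follows from the Bellman optimality equation. A minimal value-iteration sketch with made-up rewards and \(\gamma\) (the actual numbers are generated from your ID):

```python
import numpy as np

# Made-up values; the actual rewards, number of states, and gamma come from your ID.
gamma = 0.8
r_stay = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # reward from staying in s_0 .. s_4
r_move = 0.5                                   # reward from moving out of s_0

# Absorbing states can only stay, so V(s_i) = r_stay[i] / (1 - gamma) for i >= 1.
V = r_stay / (1.0 - gamma)

# Bellman optimality update for s_0, which can stay or move to any other state.
for _ in range(1000):
    stay = r_stay[0] + gamma * V[0]
    move = max(r_move + gamma * V[i] for i in range(1, len(V)))
    V[0] = max(stay, move)

Q_s0_stay = r_stay[0] + gamma * V[0]           # Q value under the optimal policy
print(Q_s0_stay)
```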
📗 [4 points] You will receive 4 points for this question and you can choose to donate x points (a number between 0 and 4). Your final grade for this question is the points you keep plus twice the average donation (sum of the donations from everyone in your section divided by the number of people in your section, combining both versions). Enter the points you want to donate (an integer between 0 and 4).
📗 Answer: (The grade for this question will be updated later).
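For clarity, the grade works out as the points you keep plus twice the section-wide average donation. A made-up illustration (the real section size and donations are not known in advance):

```python
# Made-up numbers; the real section size and donations are not known in advance.
my_donation = 2                      # integer between 0 and 4
donations = [2, 0, 4, 1, 3]          # donations from everyone in the section
average_donation = sum(donations) / len(donations)

# Grade for this question = points kept + twice the average donation.
grade = (4 - my_donation) + 2 * average_donation
print(grade)                         # (4 - 2) + 2 * 2.0 = 6.0
```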
📗 You could save the text in the above text box to a file using the button, or copy and paste it into a file yourself.
📗 You could load your answers from the text (or a txt file) in the text box below using the button. The first two lines should be "##x: 8" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.