📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click the button (or hit the Enter key).
📗 You can also load from your saved file and click .
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page: , solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 [4 points] Deadpool and Wolverine decide how to divide up the \(y\) = (million) dollars they got. Deadpool proposes a division \(\left(x_{0}, y - x_{0}\right)\), where \(x_{0}\) is an integer between \(0\) and \(y\) representing the amount of money (in millions) for Deadpool himself, and Wolverine decides whether to accept or reject the proposal. If the proposal is accepted, the game ends. If the proposal is rejected, \(z\) = (millions) will be stolen while they argue, and after that, Wolverine will propose another division \(\left(x_{1}, y - z - x_{1}\right)\), where \(x_{1}\) is an integer between \(0\) and \(y - z\) representing the amount of money (in millions) for Deadpool, and Deadpool decides whether to accept or reject the proposal. The process of alternating proposals continues until either a proposal \(\left(x_{t}, y - t z - x_{t}\right)\) is accepted or all \(y\) millions are stolen. Suppose Deadpool and Wolverine both want to maximize the money they get and, in case of indifference, both players will accept the offer. What is \(x_{0}\) in the solution of this game?
📗 Answer:
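The backward-induction reasoning for this alternating-offer game can be sketched as follows. This is a minimal sketch under stated assumptions: `first_offer` is a hypothetical helper name, `y` and `z` are positive integers, and the values in the usage line are made up for illustration rather than taken from the question. In each round the proposer offers the responder exactly the responder's continuation value, which the responder accepts because of the indifference rule.

```python
def first_offer(y, z):
    """Return x_0, Deadpool's share in the backward-induction solution.

    Assumes y and z are positive integers (illustrative inputs only).
    """
    # Count the rounds t = 0, 1, ... in which some money is still left.
    rounds = 0
    while y - rounds * z > 0:
        rounds += 1

    # Continuation values once everything has been stolen.
    deadpool, wolverine = 0, 0

    # Work backwards from the last round with money left.
    for t in reversed(range(rounds)):
        pie = y - t * z
        if t % 2 == 0:
            # Deadpool proposes: Wolverine gets his continuation value.
            deadpool = pie - wolverine
        else:
            # Wolverine proposes: Deadpool gets his continuation value.
            wolverine = pie - deadpool
    return deadpool


print(first_offer(10, 3))  # hypothetical y = 10, z = 3
```

The loop relies on the pie shrinking every round, so the proposer always prefers an accepted offer to a rejection.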
📗 [3 points] Consider the following zero-sum game. In a Nash equilibrium, the row player uses actions \(U, M, D\) with probabilities , and the column player uses actions \(L, C, R\) with probabilities \(q_{1}, q_{2}, q_{3}\). Write down \(q\) as a vector (probabilities that sum up to 1).
Actions | L | C | R
U | | |
M | | |
D | | |
📗 Answer (comma separated vector): .
📗 [3 points] There are actions, and a reinforcement learning agent uses the UCB1 (Upper Confidence Bound) algorithm to select the action in the next round. The current upper confidence bounds are , lower confidence bounds are . There is an adversary who chooses rewards for each action and whose goal is to minimize the reward for the learner. Suppose the rewards for all actions must be between and (inclusive) and must sum up to . What rewards would the adversary choose for the next round? Enter one number for each action, comma separated.
📗 Answer (comma separated vector):
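One way to reason about the adversary is sketched below. UCB1 selects the action with the largest upper confidence bound, so the lower bounds play no role in the selection. The adversary then gives that action the smallest reward the constraints allow. The function name `adversary_rewards`, the tie-breaking rule (lowest index) and the numbers in the usage line are assumptions for illustration. The rewards of the other actions are not unique, so this sketch fills them in greedily.

```python
def adversary_rewards(ucb, low, high, total):
    """Return one reward vector minimizing the reward of the UCB1 choice.

    Assumes a feasible vector exists: k * low <= total <= k * high.
    """
    k = len(ucb)
    if not (k * low <= total <= k * high):
        raise ValueError("no feasible reward vector")

    # UCB1 picks the largest upper bound (ties: lowest index, an assumption).
    chosen = max(range(k), key=lambda i: ucb[i])

    rewards = [low] * k
    # The chosen action gets as little as the sum constraint allows.
    rewards[chosen] = max(low, total - (k - 1) * high)

    # Spread whatever is left over the other actions, up to the upper limit.
    remaining = total - sum(rewards)
    for i in range(k):
        if i == chosen:
            continue
        bump = min(high - low, remaining)
        rewards[i] += bump
        remaining -= bump
    return rewards


print(adversary_rewards([0.5, 0.9, 0.7], low=0, high=1, total=2))
```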
📗 [3 points] Given the following table summarizing the number of arm pulls for each of the arms and their expected rewards, what is the total (expected) regret of this experiment? Assume no discounting is applied.
Arm | Number of Arm Pulls | Expected Reward
1 | |
2 | |
3 | |
4 | |
📗 Answer:
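Total expected regret compares every pull against the best arm: each pull of arm \(i\) costs the gap between the largest expected reward and the expected reward of arm \(i\). A minimal sketch, with a hypothetical function name and made-up numbers in the usage line:

```python
def total_regret(pulls, expected_rewards):
    """Sum over arms of (number of pulls) * (best mean - arm mean)."""
    best = max(expected_rewards)
    return sum(n * (best - mu) for n, mu in zip(pulls, expected_rewards))


# Hypothetical table: 4 arms with 10, 20, 30, 40 pulls and means 1, 2, 3, 4.
print(total_regret([10, 20, 30, 40], [1, 2, 3, 4]))
```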
📗 [3 points] Given the following Q table, what is an optimal policy? Enter only one action per state (break ties by using the action with the lowest numerical value).
State \ Action | 1 | 2 | 3 | 4
1 | | | |
2 | | | |
3 | | | |
4 | | | |
5 | | | |
📗 Answer (comma separated vector):
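Reading a greedy policy off a Q table amounts to taking, for each state (row), the action (column) with the largest Q value. The sketch below assumes actions are numbered from 1 and uses a made-up table; `greedy_policy` is a hypothetical name. `list.index` returns the first match, which implements the lowest-action tie-break.

```python
def greedy_policy(q_table):
    """Return the 1-based greedy action for every state (row)."""
    policy = []
    for row in q_table:
        best = max(row)
        # index() finds the first maximum, i.e. the lowest-numbered action.
        policy.append(row.index(best) + 1)
    return policy


# Hypothetical Q table with 2 states and 4 actions.
print(greedy_policy([[1, 5, 5, 0], [3, 2, 1, 0]]))
```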
📗 [4 points] For a reinforcement learning problem, there are states and actions. Each state can be represented by features. Deep Q Networks are used to represent the Q function: there is only one hidden layer with \(n\) hidden units, the input units represent the features of the states, and output units represent the Q value for each action. What is the smallest value of \(n\) so that the number of weights plus biases of the network is strictly larger than the number of Q values in the original Q table?
📗 Answer:
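The parameter count can be checked with a short calculation. The sketch assumes a fully connected network in which every hidden unit and every output unit has its own bias, so the network has \(n \cdot (\text{features} + \text{actions} + 1) + \text{actions}\) parameters, while the Q table has \(\text{states} \times \text{actions}\) entries. The function name and the numbers in the usage line are hypothetical.

```python
def smallest_hidden_units(states, actions, features):
    """Smallest n with n * (features + actions + 1) + actions > states * actions."""
    table_size = states * actions
    per_unit = features + actions + 1  # weights in, weights out, one bias
    # Integer division finds the largest n that is NOT enough, then add one.
    return (table_size - actions) // per_unit + 1


# Hypothetical problem: 100 states, 4 actions, 5 features per state.
print(smallest_hidden_units(100, 4, 5))
```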
📗 [3 points] Three players play a variant of the split or steal game. There is a total of (million) dollars: if all three players choose steal, they will all get \(0\); if two players choose steal, they will each get \(0\) and the other player choosing split will get all the money; if one player chooses steal, that player will get all the money and the other two players will get \(0\); if all three players choose split, they will evenly split the money. The players only care about maximizing the money they get. How many different pure strategy Nash equilibria are there? Note: (split, split, steal) and (steal, split, split) are considered two different outcomes.
📗 Answer:
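With only eight pure strategy profiles, the equilibria can be found by checking every unilateral deviation. The sketch below encodes the payoff rules as stated in the question; the function names and the amount of money in the usage line are assumptions. A profile counts as an equilibrium when no player can strictly gain by switching.

```python
from itertools import product


def payoffs(profile, money):
    """Payoffs for a 3-player profile of 'split' / 'steal' choices."""
    stealers = [i for i, a in enumerate(profile) if a == "steal"]
    n = len(profile)
    if len(stealers) == 0:
        return [money / n] * n          # everyone splits evenly
    if len(stealers) == 1:
        return [money if i in stealers else 0 for i in range(n)]
    if len(stealers) == 2:
        return [0 if i in stealers else money for i in range(n)]
    return [0] * n                      # all three steal


def pure_nash_equilibria(money):
    actions = ("split", "steal")
    equilibria = []
    for profile in product(actions, repeat=3):
        base = payoffs(profile, money)
        stable = True
        for i in range(3):
            for alt in actions:
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                if payoffs(deviated, money)[i] > base[i]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


# Hypothetical total of 6 (million) dollars.
for eq in pure_nash_equilibria(6):
    print(eq)
```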
📗 [3 points] Given the scores in the following table, if hill-climbing (valley-finding) is used, how many states will lead to the global minimum? Note: the neighbors of state \(i\) are states \(i - 1\) and \(i + 1\) (if they exist).
State | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Score | | | | | | | |
📗 Answer: .
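The counting can be simulated by running valley-finding from every start state. This is a sketch under assumptions: scores are distinct, there are at least two states, the search moves to the better neighbor only when it is strictly better than the current state, and the score list in the usage line is invented. `states_reaching_global_min` is a hypothetical name.

```python
def states_reaching_global_min(scores):
    """Count start states from which valley-finding ends at the global minimum."""
    target = min(range(len(scores)), key=lambda i: scores[i])
    count = 0
    for start in range(len(scores)):
        i = start
        while True:
            neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]
            best = min(neighbors, key=lambda j: scores[j])
            if scores[best] >= scores[i]:
                break  # local minimum reached
            i = best
        if i == target:
            count += 1
    return count


# Hypothetical scores for states 0..7.
print(states_reaching_global_min([5, 3, 4, 6, 2, 1, 7, 8]))
```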
📗 [2 points] In simulated annealing we move from \(s\) to an inferior neighbor \(t\) with probability \(\exp\left(\dfrac{- \left| f\left(s\right) - f\left(t\right) \right|}{T}\right)\), where \(T\) is the temperature parameter. Suppose \(f\left(s\right)\) = and \(f\left(t\right)\) = and \(T\) = . What is the probability we stay at \(s\) instead of moving to \(t\)?
📗 Note: we are minimizing the score.
📗 Answer: .
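The probability of staying is the complement of the probability of moving. A small sketch, assuming we are minimizing and that a neighbor that is not worse is always accepted; the function name and the numbers in the usage line are hypothetical.

```python
import math


def stay_probability(f_s, f_t, temperature):
    """Probability of staying at s when t is proposed (minimization)."""
    if f_t <= f_s:
        # t is not inferior, so the move is always made (assumption).
        return 0.0
    move = math.exp(-abs(f_s - f_t) / temperature)
    return 1.0 - move


# Hypothetical values: f(s) = 1, f(t) = 3, T = 2.
print(stay_probability(1, 3, 2))
```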
📗 [4 points] When using the Genetic Algorithm, suppose the states are \(\begin{bmatrix} x_{1} & x_{2} & ... & x_{T} \end{bmatrix}\) = , , , . Let \(T\) = , and let the fitness function (not the cost) be \(\min \left\{t \in \left\{1, ..., T + 1\right\} : x_{t} = 1\right\}\) with \(x_{T + 1} = 1\) (i.e. the index of the first feature that is 1). What is the reproduction probability of the first state: ?
📗 Answer: .
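A sketch of the computation, assuming the usual fitness-proportional rule in which the reproduction probability of a state is its fitness divided by the sum of all fitness values. The function names and the four states in the usage line are made up for illustration.

```python
def fitness(state):
    """1-based index of the first feature equal to 1, or T + 1 if there is none."""
    for t, x in enumerate(state, start=1):
        if x == 1:
            return t
    return len(state) + 1


def reproduction_probabilities(states):
    scores = [fitness(s) for s in states]
    total = sum(scores)
    return [score / total for score in scores]


# Hypothetical population with T = 3.
population = [[0, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1]]
print(reproduction_probabilities(population)[0])
```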
📗 [4 points] Consider a zero-sum sequential move game with Chance. Min player moves first, then Chance, then Max. The values of the terminal states are shown in the diagram (they are the values for the Max player). What is the (expected) value of the game (for the Max player)?
📗 Note: in case the diagram is not clear, the probabilities from left to right are: , and the rewards are .
📗 Answer: .
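The value of such a game is computed bottom-up: Max picks the largest terminal value, Chance takes the probability-weighted average, and Min picks the smallest of those averages. The sketch below assumes a hypothetical tree encoding (one list per Min action, holding `(probability, terminal_values)` pairs) and made-up numbers, since the diagram is not reproduced here.

```python
def game_value(tree):
    """Min over Min's actions of the expected value of Max's best replies."""
    return min(
        sum(p * max(leaves) for p, leaves in chance_node)
        for chance_node in tree
    )


# Hypothetical tree: Min has two actions, each Chance node has two branches.
tree = [
    [(0.5, [1, 4]), (0.5, [2, 0])],
    [(0.25, [8, 0]), (0.75, [0, 2])],
]
print(game_value(tree))
```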
📗 [4 points] Enter the largest integer value of \(A\) such that \(B\) will be alpha-beta pruned. The Min player moves first. In the case alpha = beta, prune the node. Enter 100 if you think the answer is infinity.
📗 Answer: .
📗 [4 points] The Nash equilibrium of the following simultaneous move zero-sum game is (U, L): the entry marked by \(x\). What are the smallest and largest possible integer values of \(x\)? Enter two numbers. (U, L) can be one of possibly many Nash equilibria.
📗 Note: if there is only one possible value, enter the same value twice; and if no values are possible, enter \(0, 0\).
MAX \ MIN | L | C | R
U | \(x\) | |
M | | |
R | | |
📗 Answer (comma separated vector): .
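For (U, L) to be a pure Nash equilibrium, MAX must not gain by switching rows, so \(x\) is at least every other entry in column L, and MIN must not gain by switching columns, so \(x\) is at most every other entry in row U. The sketch below turns that saddle-point condition into a range. The function name and the payoff numbers are hypothetical, and the `(0, 0)` return follows the note above for the case with no possible values.

```python
import math


def x_range(rest_of_row_u, rest_of_column_l):
    """Smallest and largest integer x making (U, L) a saddle point."""
    lowest = math.ceil(max(rest_of_column_l))   # x >= other entries in column L
    highest = math.floor(min(rest_of_row_u))    # x <= other entries in row U
    if lowest > highest:
        return (0, 0)  # no value of x works
    return (lowest, highest)


# Hypothetical payoffs: row U has 5 and 7 besides x, column L has 2 and 3.
print(x_range([5, 7], [2, 3]))
```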
📗 [4 points] You will receive 4 points for this question and you can choose to donate x points (a number between 0 and 4). Your final grade for this question is the points you keep plus twice the average donation (sum of the donations from everyone in your section divided by the number of people in your section, combining both versions). Enter the points you want to donate (an integer between 0 and 4).
📗 Answer: (The grade for this question will be updated later).
📗 [1 point] Please enter any comments including possible mistakes and bugs with the questions or your answers. If you have no comments, please enter "None": do not leave it blank.
📗 Please do not modify the content in the above text field: use the "Grade" button to update.
📗 Please wait for the message "Successful submission." to appear after the "Submit" button. Please also save the text in the above text box to a file using the button or copy and paste it into a file yourself and submit it to Canvas Assignment X1. You could submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##x: 4" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.