📗 The quizzes must be completed during the lectures and submitted on TopHat: Link. No Canvas submission is required. The grades will be updated on Canvas by the end of the week.
📗 Please submit a regrade request if (i) you missed a few questions because you arrived late or had to leave during the lecture; or (ii) you selected obviously incorrect answers by mistake (one or two of these shouldn't affect your grade): Link
The following questions may appear as quiz questions during the lecture. If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 [4 points] Consider the following Markov Decision Process. It has two states \(s\): A and B. It has two actions \(a\): "move" and "stay". The state transition is deterministic: "move" moves to the other state, while "stay" stays at the current state. The reward \(r\) is for "move" (from A and B), and for "stay" (in A and B). Suppose the discount rate is \(\beta\) = .
Find the Q table \(Q_{i}\) after \(i\) = updates of every entry using Q value iteration (\(i = 0\) initializes all values to \(0\)) in the format described by the following table. Enter a two by two matrix.
State \ Action | stay | move
A              | ?    | ?
B              | ?    | ?
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
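The update rule for Q value iteration here is \(Q_{i+1}\left(s, a\right) = r\left(s, a\right) + \beta \displaystyle\max_{a'} Q_{i}\left(s', a'\right)\), where \(s'\) is the (deterministic) next state. The sketch below applies it to this two-state MDP; the reward values and \(\beta\) are illustrative placeholders, since the quiz generates its own numbers.

```python
# Q value iteration for the two-state MDP above.
# Reward values and beta are ILLUSTRATIVE placeholders;
# the quiz generates its own numbers.
states = ["A", "B"]
actions = ["stay", "move"]

# Immediate reward r(s, a): placeholder values only.
r = {("A", "stay"): 1, ("A", "move"): 2,
     ("B", "stay"): 3, ("B", "move"): 4}
beta = 0.5  # placeholder discount rate


def next_state(s, a):
    # "move" goes to the other state; "stay" remains in place.
    if a == "stay":
        return s
    return "B" if s == "A" else "A"


# i = 0 initializes all entries to 0.
Q = {(s, a): 0.0 for s in states for a in actions}
iterations = 2  # the quiz specifies the number of updates i

for _ in range(iterations):
    # Synchronous update: every entry uses the previous table Q_i.
    Q = {(s, a): r[(s, a)]
         + beta * max(Q[(next_state(s, a), b)] for b in actions)
         for s in states for a in actions}

# Print the two-by-two matrix, one row per state (stay, move).
for s in states:
    print(", ".join(str(Q[(s, a)]) for a in actions))
```

Each row of the printed matrix corresponds to a state (A then B), and each column to an action (stay then move), matching the answer format above.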
📗 [3 points] Consider state space \(S = \left\{s_{1}, s_{2}\right\}\) and action space \(A\) = {left, right}. In \(s_{1}\) the action "right" sends the agent to \(s_{2}\) and collects reward \(r = 1\). In \(s_{2}\) the action "left" sends the agent to \(s_{1}\) but with zero reward. All other state-action pairs stay in the same state with zero reward. With discount factor \(\gamma\) = , what is the value \(v\left(s_{2}\right)\) under the optimal policy?
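Under the optimal policy the agent alternates between the two states forever, so the Bellman equations are \(v\left(s_{1}\right) = 1 + \gamma v\left(s_{2}\right)\) and \(v\left(s_{2}\right) = \gamma v\left(s_{1}\right)\). The sketch below checks this by value iteration; the value of \(\gamma\) is an illustrative placeholder, since the quiz fills in its own.

```python
# Value iteration for the two-state chain above.
# gamma is an ILLUSTRATIVE placeholder; the quiz fills in its own value.
gamma = 0.5


def transition(s, a):
    # Returns (next_state, reward); states are labeled 1 and 2.
    if s == 1 and a == "right":
        return 2, 1.0
    if s == 2 and a == "left":
        return 1, 0.0
    return s, 0.0  # all other state-action pairs stay put, zero reward


v = {1: 0.0, 2: 0.0}
for _ in range(200):  # iterate until (numerically) converged
    v = {s: max(r + gamma * v[s_next]
                for s_next, r in (transition(s, a) for a in ("left", "right")))
         for s in (1, 2)}

print(round(v[2], 6))
```

Solving the two Bellman equations by substitution gives the closed form \(v\left(s_{2}\right) = \dfrac{\gamma}{1 - \gamma^{2}}\), which the iteration above converges to.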