📗 Regular component (out of 5) should be submitted using the "Grade" and "Submit" buttons at the bottom of the page.
➩ Submission of the text file generated by the auto-grader to Canvas Assignment A10 is optional.
➩ Due date: August 9. No submissions will be accepted after that date.
📗 Competition component (out of 5): the text file generated using the Question 9 "Generate" button should be submitted to the Canvas Assignment A10C: Link
➩ Submitting an incorrectly formatted text file or any additional files to A10C will result in a competition score of \(-\infty\).
➩ Due date: August 4. No submissions will be accepted after that date under any circumstances.
📗 Note: the Canvas A10 and A10C due dates are the recommended due dates. Competition submissions made before the recommended due date will participate in trial competitions, with the option to keep the score (but not the ranking).
📗 Hint: example submissions, discussion session schedules, and group recommendations (very different for different assignments) can be found on Piazza: Link.
📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit the "Enter" key)
📗 You can also load from your saved file and click .
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could either copy and paste or load your program outputs into the text boxes for individual questions or print all your outputs to a single text file and load it using the button at the bottom of the page.
📗 Please do not refresh the page: your answers will not be saved.
📗 You can write the code in any programming language and using any large language models. You do not have to submit your code.
📗 (Introduction) In this project, you will use multi-agent reinforcement learning algorithms to solve a Markov game to control a soccer player. You will submit a stochastic Markov policy function represented by a neural network (probabilities to move up, down, left, right given the positions of the players and the ball).
📗 (Part 1) Make sure you can reproduce the environment, and compute a deterministic single-agent optimal policy against a player that is always choosing the action L.
📗 (Part 2) Produce a stochastic policy profile that approximates the optimal policy against a player that (i) always moves towards your player if you have the ball, and (ii) always moves towards the goal if they have the ball.
📗 (Competition) Submit a stochastic policy to play against policies created by other students in the course. Be aware that other students may strategically submit a non-equilibrium policy.
Rules of the game:
➩ The ball will be given to one of the players (player 1 at \(\left(1, 1\right)\) and player 2 at \(\left(3, 2\right)\)) at the beginning.
➩ If the two players try to occupy the same square, only one of them (randomly decided) will move, and the other player will get the ball.
➩ If the two players try to swap squares, they will swap, and the other player will get the ball.
➩ If one player tries to crash into the other player, the other player will get the ball with probability \(\dfrac{1}{2}\).
➩ The game restarts after one player scores.
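The three collision rules above can be told apart by comparing each player's current square with the square they are trying to move to. The sketch below only classifies the interaction; it does not resolve the move or the ball possession. The function name, the tuple representation of squares, and the label strings are hypothetical. It also assumes that a "crash" means one player targets the square the other currently occupies while the other is not moving into the first player's square, which is an interpretation of the rules, not something the page states.

```python
def classify_interaction(pos1, pos2, target1, target2):
    """Classify a joint move under the collision rules listed above.

    pos1, pos2: current (x, y) squares of players 1 and 2 (assumed distinct).
    target1, target2: squares the players are trying to move to
    (equal to the current square if a player ends up not moving).
    """
    # Both players move into each other's current square.
    if target1 == pos2 and target2 == pos1:
        return "swap"
    # ASSUMPTION: one player moving into the other's current square counts
    # as a crash, whatever the other player is trying to do.
    if target1 == pos2 or target2 == pos1:
        return "crash"
    # Both players try to enter the same (currently free) square.
    if target1 == target2:
        return "same_square"
    return "free"


print(classify_interaction((1, 1), (3, 1), (2, 1), (2, 1)))  # same_square
```

Checking "swap" before "crash" matters because a swap also satisfies the crash condition as written here.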
You will play 10 times with each of the other players in your team and your score will be the number of wins plus \(0.5\) times the number of ties (ties only happen if no one scores within 100 steps).
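As a small illustration of the scoring rule above (wins plus \(0.5\) times ties), the following sketch computes the score against one opponent. The function name and the outcome labels "win", "tie", and "loss" are hypothetical.

```python
def competition_score(outcomes):
    """Score = number of wins + 0.5 * number of ties; losses count as 0."""
    wins = outcomes.count("win")
    ties = outcomes.count("tie")
    return wins + 0.5 * ties


# Ten games against one opponent: 6 wins, 2 ties, 2 losses.
games = ["win"] * 6 + ["tie"] * 2 + ["loss"] * 2
print(competition_score(games))  # 7.0
```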
Your project grade is based on your submission to this assignment (out of 5) plus your ranking in the class (out of 5):
Top 20% gets 5/5.
Next 20% gets 4/5.
Next 20% gets 3/5.
Next 20% gets 2/5.
Next 20% gets 1/5.
(Students who do not participate in the competition will be assigned a score of negative infinity when the rankings are computed.)
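The ranking-to-points rule above could be sketched as follows. The function name and the dictionary input are hypothetical, and the page does not say how tied scores or class sizes that are not divisible by 5 are handled. Here ties are broken by input order and the bracket boundaries come from integer division, both of which are assumptions.

```python
import math


def ranking_points(scores):
    """Map each student to 5, 4, 3, 2, or 1 points by 20% ranking bracket.

    scores: dict of student name -> competition score
    (use -math.inf for students who did not participate).
    """
    # Highest score first; sorted() is stable, so ties keep input order.
    order = sorted(scores, key=lambda name: scores[name], reverse=True)
    n = len(order)
    # rank 0 is the best; each 20% slice of the ranking loses one point.
    return {name: 5 - (5 * rank) // n for rank, name in enumerate(order)}


scores = {f"s{i}": float(10 - i) for i in range(9)}
scores["absent"] = -math.inf
points = ranking_points(scores)
print(points["s0"], points["s4"], points["absent"])  # 5 3 1
```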
Competition [To be updated before the trial competition, do not use for testing purposes]
You can use the demo to play against a simple opponent in Part 2 (i.e. not optimal and not equilibrium). Click on a neighboring square to move your player (player 1).
📗 [5 points] For the state \(\left(x_{1}, y_{1}, x_{2}, y_{2}, b\right)\) = , find the successor states after the actions UU, UD, UL, UR, DU, DD, DL, DR, LU, LD, LL, LR, RU, RD, RL, RR. In case there is a random change in the possession of the ball, assume \(b = 1\) (ball goes to player 1). (16 lines, with 5 integers on each line, comma separated).
📗 [5 points] For each of the previous successor states, compute the reward from the transition to that state. (16 numbers, comma separated, on one line).
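For the successor-state and reward questions, the answers must follow the joint-action order UU, UD, ..., RR. A minimal sketch of enumerating that order and formatting the two answers is shown below. The helper names are hypothetical, and the successor states and rewards themselves must come from your own implementation of the environment; the values in the usage lines are placeholders only, not correct answers.

```python
from itertools import product

ACTIONS = "UDLR"
# Cartesian product in this order yields UU, UD, UL, UR, DU, ..., RR.
JOINT_ACTIONS = ["".join(pair) for pair in product(ACTIONS, repeat=2)]


def format_successors(successors):
    """Format 16 states (x1, y1, x2, y2, b) as 16 comma-separated lines."""
    if len(successors) != 16 or any(len(s) != 5 for s in successors):
        raise ValueError("expected 16 states with 5 integers each")
    return "\n".join(",".join(str(int(v)) for v in s) for s in successors)


def format_rewards(rewards):
    """Format 16 rewards as one comma-separated line."""
    if len(rewards) != 16:
        raise ValueError("expected 16 rewards")
    return ",".join(str(r) for r in rewards)


print(JOINT_ACTIONS[:4])  # ['UU', 'UD', 'UL', 'UR']
# Placeholder data only: replace with the outputs of your environment.
placeholder_states = [(1, 1, 3, 2, 1)] * 16
print(format_successors(placeholder_states).splitlines()[0])  # 1,1,3,2,1
```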
📗 [5 points] For the state \(\left(x_{1}, y_{1}, x_{2}, y_{2}, b\right)\) = , find the successor states after the actions UU, UD, UL, UR, DU, DD, DL, DR, LU, LD, LL, LR, RU, RD, RL, RR. In case there is a random change in the possession of the ball, assume \(b = 1\) (ball goes to player 1). (16 lines, with 5 integers on each line, comma separated).
📗 [5 points] For each of the previous successor states, compute the reward from the transition to that state. (16 numbers, comma separated, on one line).
📗 [5 points] Enter a set of weights of your network (three matrices separated by -----, each matrix has rows separated by lines, columns separated by commas, the first matrix should be \(5\) by \(h_{1}\), second matrix should be \(h_{1}\) by \(h_{2}\), and the last matrix should be \(h_{2}\) by \(4\)).
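The weight format described above (three matrices separated by -----, rows on separate lines, columns separated by commas) can be written and read back as sketched below, together with a forward pass that turns a state into four action probabilities. The function names are hypothetical. The page does not specify the activation functions or whether bias terms exist, so the ReLU hidden layers, the softmax output, and the absence of biases are all assumptions. The output order (up, down, left, right) follows the introduction.

```python
import numpy as np


def format_weights(matrices):
    """Serialize matrices: comma-separated rows, matrices split by '-----'."""
    chunks = ["\n".join(",".join(repr(float(v)) for v in row) for row in m)
              for m in matrices]
    return "\n-----\n".join(chunks)


def parse_weights(text):
    """Inverse of format_weights: returns a list of 2-D float arrays."""
    matrices = []
    for chunk in text.strip().split("-----"):
        rows = [[float(v) for v in line.split(",")]
                for line in chunk.strip().splitlines() if line.strip()]
        matrices.append(np.array(rows, dtype=float))
    return matrices


def action_probabilities(state, matrices):
    """Probabilities of (up, down, left, right) for state (x1, y1, x2, y2, b)."""
    w1, w2, w3 = matrices                   # shapes 5 x h1, h1 x h2, h2 x 4
    x = np.asarray(state, dtype=float)
    h1 = np.maximum(x @ w1, 0.0)            # ASSUMPTION: ReLU, no bias
    h2 = np.maximum(h1 @ w2, 0.0)           # ASSUMPTION: ReLU, no bias
    logits = h2 @ w3
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 3)), rng.normal(size=(3, 2)),
           rng.normal(size=(2, 4))]
probs = action_probabilities((1, 1, 3, 2, 1), parse_weights(format_weights(weights)))
print(probs.sum())
```

Using repr for each entry keeps the round trip through text exact, so the network that is graded matches the one that was trained.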
📗 [10 points] Enter a sequence of states with length 100 based on your network from the previous question controlling player 1 and a player 2 that always chooses action L. (100 lines, 5 integers on each line).
📗 [5 points] Enter a set of weights of your network (three matrices separated by -----, each matrix has rows separated by lines, columns separated by commas, the first matrix should be \(5\) by \(h_{1}\), second matrix should be \(h_{1}\) by \(h_{2}\), and the last matrix should be \(h_{2}\) by \(4\)).
📗 [10 points] Enter a sequence of states with length 100 based on your network from the previous question controlling player 1 and a player 2 that uses the policy in Part 2 of the instruction. (100 lines, 5 integers on each line).
📗 [1 point] Please list the AI tools and references you used and the names of other students and course staff you discussed the assignment or competition with. Please also enter any comments and suggestions including possible mistakes and bugs with the questions and the auto-grading. If you completed the assignment without any help (not recommended), please enter "None" and do not leave this question blank.
📗 Please do not modify the content in the above text field: use the "Grade" button to update.
📗 You could submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 Please also save the text in the above text box to a file using the button or copy and paste it into a file yourself .
📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##a: 10" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.
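A minimal sketch of writing and reading a file in the format described above is given below. The function names are hypothetical. The page does not say how multi-line answers (for example the 16-line successor states) are stored, so this sketch assumes that lines not starting with "##" continue the previous answer.

```python
def build_answer_file(student_id, answers, assignment=10):
    """Build the text: '##a: 10', '##id: <id>', then '##<n>: <answer>' lines."""
    lines = [f"##a: {assignment}", f"##id: {student_id}"]
    for number in sorted(answers):
        lines.append(f"##{number}: {answers[number]}")
    return "\n".join(lines)


def parse_answer_file(text):
    """Return a dict mapping 'a', 'id', '1', '2', ... to their text."""
    fields, key = {}, None
    for line in text.splitlines():
        if line.startswith("##") and ":" in line:
            key, _, value = line[2:].partition(":")
            key = key.strip()
            fields[key] = value.strip()
        elif key is not None:
            # ASSUMPTION: other lines continue the current multi-line answer.
            fields[key] += "\n" + line
    return fields


text = build_answer_file("yourid", {1: "1,1,3,2,1\n1,2,3,2,1", 2: "0,0"})
print(text.splitlines()[:2])  # ['##a: 10', '##id: yourid']
```

Parsing the file back before uploading is one way to check that the answers load as intended.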
📗 Saving and loading may take around 5 to 10 seconds. Please be patient and do not click "Load" multiple times.
📗 Presentations and interviews are optional for the competitions.
📗 If your competition grade is 2, 3, or 4, you can book an interview with the TA for 15 to 30 minutes.
📗 Interviews can only be booked during discussion sessions on Zoom (either during the current discussion session or for a future date and time): Link. Please do not email/spam the TA.
📗 A maximum of 3 interviews can be booked per person. If you need 1 point for the next letter grade, we will allow a 4th one after the final exam.
📗 During the interviews, you will give a 5-to-10-minute presentation to explain anything you did on the project that is creative or technically challenging. Then you will answer three technical questions about your presentation or any materials related to the assignment.
➩ If you answer any one of the three questions incorrectly, you will get \(-1\).
➩ If you answer all questions correctly, and if your presentation ideas are correct, interesting, consistent with your submissions, and not done by many other students (we will make the decision after all interviews are done), you will get \(+1\).