📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit the "Enter" key)
📗 You can also load from your saved file and click .
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The due date (hard deadline) is August 10. Late submissions to the competitive project components will not be accepted under any circumstances. The remaining assignment can be submitted to earn a maximum of 5 points before August 10 without penalty.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You can either copy and paste or load your program outputs into the text boxes for individual questions, or print all your outputs to a single text file and load it using the button at the bottom of the page.
📗 Please do not refresh the page: your answers will not be saved.
📗 You should implement the algorithms using the mathematical formulas from the slides. You can use packages and libraries to preprocess and read the data and format the outputs. It is not recommended that you use machine learning packages or libraries, but you will not lose points for doing so.
TODO for next year: scoring rule is incorrect; the trained algorithm should replicate the heuristic policy; the rules should be changed so that there is a unique mixed strategy Nash equilibrium.
📗 (Introduction) In this project, you will use multi-agent reinforcement learning algorithms to solve a Markov game to control a soccer player. You will submit a stochastic Markov policy function represented by a neural network (probabilities to move up, down, left, right given the positions of the players and the ball).
📗 (Part 1) Make sure you can reproduce the environment, and compute a deterministic single-agent optimal policy against a player that always chooses the action L.
📗 (Part 2) Produce a stochastic policy profile that approximates the optimal policy against a player that (i) always moves towards your player if you have the ball, and (ii) always moves towards the goal if they have the ball.
📗 (Competition) Submit a stochastic policy to play against policies created by other students in the course. Be aware that other students may strategically submit a non-equilibrium policy.
Rules of the game:
➩ The ball will be given to one of the players (player 1 at \(\left(1, 1\right)\) and player 2 at \(\left(3, 2\right)\)) at the beginning.
➩ If the two players try to occupy the same square, only one of them (randomly decided) will move, and the other player will get the ball.
➩ If the two players try to swap squares, they will swap, and the other player will get the ball.
➩ If one player tries to crash into the other player, the other player will get the ball with probability \(\dfrac{1}{2}\).
➩ The game restarts after one player scores.
Your submission should contain (i) your player name (not necessarily your real name), (ii) your player icon (single emoji from this list: Link), (iii) your team (a random number between 0 and 1, rounded to four decimal places), (iv) your network weights, (v) [optional] your second network weights (used if your opponent provides a second network too), and have the following format in a .txt text file:
➩ Small example:
➩ Large example:
You will play 10 times with each of the other players in your team and your score will be the number of wins plus \(0.5\) times the number of ties (ties only happen if no one scores within 100 steps). Your project grade is based on your submission to this assignment (out of 5) plus your ranking within your team (out of 5):
Top 10% gets 5/5 in each team.
Next 10% gets 4/5 in each team.
Next 10% gets 3/5 in each team.
Next 10% gets 2/5 in each team.
Next 10% gets 1/5 in each team.
(Students who do not participate in the competition will be split evenly among the teams, each with a score of 0, when computing the rankings.)
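As a rough illustration of the scoring rule above, the sketch below computes a score from win and tie counts and maps a rank within a team to competition points. The function names are hypothetical, and two details are assumptions not stated on this page: decile boundaries are rounded up, and ranks below the top 50% receive 0.

```python
def match_score(wins: int, ties: int) -> float:
    """Score = number of wins + 0.5 * number of ties."""
    return wins + 0.5 * ties


def competition_points(rank: int, team_size: int) -> int:
    """Map a 1-based rank within a team to points out of 5.

    Assumption: decile boundaries are rounded up, so in a team of 20
    ranks 1-2 get 5, ranks 3-4 get 4, and so on.
    Assumption: ranks below the top 50% get 0.
    """
    # Integer ceiling of 10 * rank / team_size:
    # 1 for the top 10%, 2 for the next 10%, and so on.
    decile = (10 * rank + team_size - 1) // team_size
    return max(0, 6 - decile)
```

For example, 6 wins and 3 ties give `match_score(6, 3) == 7.5`.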
Competition
You can use the demo to play against a simple opponent in Part 2 (i.e. not optimal and not equilibrium). Click on a neighboring square to move your player (player 1).
📗 [5 points] For the state \(\left(x_{1}, y_{1}, x_{2}, y_{2}, b\right)\) = , find the successor states after the actions UU, UD, UL, UR, DU, DD, DL, DR, LU, LD, LL, LR, RU, RD, RL, RR. In case there is a random change in the possession of the ball, assume \(b = 1\) (ball goes to player 1). (16 lines, with 5 integers on each line, comma separated).
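The 16 joint actions listed in the question follow the order of a Cartesian product over U, D, L, R. The sketch below generates that order and formats states as comma-separated lines. It assumes the first letter is player 1's action and the second is player 2's; `format_states` is a hypothetical helper name, and the transition rules themselves are not implemented here.

```python
from itertools import product

ACTIONS = "UDLR"

# Joint actions in the order used by the question: UU, UD, UL, UR, DU, ...
# Assumption: first letter = player 1's action, second = player 2's action.
JOINT_ACTIONS = ["".join(pair) for pair in product(ACTIONS, repeat=2)]


def format_states(states):
    """Format (x1, y1, x2, y2, b) tuples as one comma-separated line each."""
    return "\n".join(",".join(str(int(v)) for v in s) for s in states)
```

Calling `format_states` on the 16 successor states, computed in `JOINT_ACTIONS` order, yields the 16 lines the question asks for.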
📗 [5 points] For each of the previous successor states, compute the reward from the transition to that state. (16 numbers, comma separated, on one line).
📗 [5 points] For the state \(\left(x_{1}, y_{1}, x_{2}, y_{2}, b\right)\) = , find the successor states after the actions UU, UD, UL, UR, DU, DD, DL, DR, LU, LD, LL, LR, RU, RD, RL, RR. In case there is a random change in the possession of the ball, assume \(b = 1\) (ball goes to player 1). (16 lines, with 5 integers on each line, comma separated).
📗 [5 points] For each of the previous successor states, compute the reward from the transition to that state. (16 numbers, comma separated, on one line).
📗 [5 points] Enter a set of weights of your network (three matrices separated by -----, each matrix has rows separated by lines, columns separated by commas, the first matrix should be \(5\) by \(h_{1}\), second matrix should be \(h_{1}\) by \(h_{2}\), and the last matrix should be \(h_{2}\) by \(4\)).
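A minimal sketch of reading and writing the weight format described above, plus a forward pass that turns a state into four action probabilities. Several details are assumptions, since this page does not specify them: ReLU hidden activations, a softmax output, no bias terms, the raw state vector as input, and the output order (up, down, left, right). All function names are hypothetical.

```python
import math

SEPARATOR = "-----"


def parse_weights(text):
    """Parse matrices separated by '-----' lines; rows on lines, columns by commas."""
    matrices, rows = [], []
    for line in text.strip().splitlines():
        line = line.strip()
        if line == SEPARATOR:
            matrices.append(rows)
            rows = []
        elif line:
            rows.append([float(v) for v in line.split(",")])
    matrices.append(rows)
    return matrices


def format_weights(matrices):
    """Inverse of parse_weights: write matrices in the submission format."""
    blocks = ["\n".join(",".join(repr(float(v)) for v in row) for row in m)
              for m in matrices]
    return ("\n" + SEPARATOR + "\n").join(blocks)


def matvec(x, w):
    """Row vector x (length n) times matrix w (n by m)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def policy(state, matrices):
    """Return probabilities for (up, down, left, right) under the assumptions above."""
    w1, w2, w3 = matrices                          # 5 x h1, h1 x h2, h2 x 4
    h1 = [max(0.0, v) for v in matvec(state, w1)]  # assumed ReLU
    h2 = [max(0.0, v) for v in matvec(h1, w2)]     # assumed ReLU
    logits = matvec(h2, w3)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]       # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]
```

With all-zero weights the policy is uniform, which is a quick check that the shapes line up before real weights are plugged in.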
📗 [10 points] Enter a sequence of states with length 100 based on your network from the previous question controlling player 1 and a player 2 that always chooses action L. (100 lines, 5 integers on each line).
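One possible shape for generating the state sequence is sketched below. It is only a skeleton: `step_fn` is a hypothetical function that must implement the game rules and is not provided here. The sketch also assumes that the sequence includes the initial state and that player 1's action is sampled from the network's probabilities; taking the most probable action instead would also be reasonable for a deterministic policy.

```python
import random

ACTIONS = ["U", "D", "L", "R"]


def rollout(initial_state, policy_fn, opponent_fn, step_fn, length=100, seed=0):
    """Collect `length` states, starting from `initial_state`.

    policy_fn(state)   -> four probabilities for U, D, L, R (player 1)
    opponent_fn(state) -> one action for player 2
    step_fn(state, a1, a2, rng) -> next state; hypothetical, must implement
                                   the game rules, which are not coded here
    """
    rng = random.Random(seed)
    state, states = tuple(initial_state), []
    for _ in range(length):
        states.append(state)
        # Assumption: sample player 1's action from the policy probabilities.
        a1 = rng.choices(ACTIONS, weights=policy_fn(state))[0]
        a2 = opponent_fn(state)
        state = tuple(step_fn(state, a1, a2, rng))
    return states


def always_left(state):
    """Part 1 opponent: always chooses L."""
    return "L"
```

The fixed seed makes the sequence reproducible, which helps when the same output has to be regenerated for a later submission.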
📗 [5 points] Enter a set of weights of your network (three matrices separated by -----, each matrix has rows separated by lines, columns separated by commas, the first matrix should be \(5\) by \(h_{1}\), second matrix should be \(h_{1}\) by \(h_{2}\), and the last matrix should be \(h_{2}\) by \(4\)).
📗 [10 points] Enter a sequence of states with length 100 based on your network from the previous question controlling player 1 and a player 2 that uses the policy in Part 2 of the instruction. (100 lines, 5 integers on each line).
📗 [1 point] If you are not planning to participate in the competition, enter "0" or "none" for this question to get the point. If you are planning to participate in the competition, attach the text file you are planning to submit to Canvas to check that your submission has the correct format.
📗 [1 point] Please enter any comments and suggestions, including possible mistakes and bugs with the questions and the auto-grading, and materials relevant to solving the question that you think are not covered well during the lectures. If you have no comments, please enter "None": do not leave it blank.
📗 Please do not modify the content in the above text field: use the "Grade" button to update.
📗 Warning: grading may take around 10 to 20 seconds. Please be patient and do not click "Grade" multiple times.
📗 You could submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 Please also save the text in the above text box to a file using the button, or copy and paste it into a file yourself. You can also include the resulting file with your code on Canvas Assignment CP4.
📗 The competition file should be submitted to the Canvas Assignment CP4 Competition in a text file named "CP4.txt" (please do not use a different file name).
📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##a: 12" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.
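The load format described above can be produced with a few lines of code. The sketch below is a hypothetical helper, not part of the course tooling. It assumes that a multi-line answer is written verbatim after its `##k:` tag; check the loaded result on the page before submitting.

```python
def format_answers(student_id, answers, assignment=12):
    """Build the load-file text: '##a', '##id', then '##k: answer' per question.

    `answers` maps question numbers to answer strings.
    Assumption: multi-line answers are written verbatim after the tag.
    """
    lines = [f"##a: {assignment}", f"##id: {student_id}"]
    for number in sorted(answers):
        lines.append(f"##{number}: {answers[number]}")
    return "\n".join(lines)
```

The returned string can be written to a `.txt` file with `open(path, "w").write(...)` and then loaded on the page.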
📗 Saving and loading may take around 10 to 20 seconds. Please be patient and do not click "Load" multiple times.
📗 No sample solutions will be posted for these assignments.
📗 You are allowed to use code from other people (with their permission) and from the Internet, but you must give attribution at the beginning of your code. You are allowed to use large language models such as GPT4 to write parts of the code for you, but you have to include the prompts you used in the code submission. For example, you can put the following comments at the beginning of your code:
% Code attribution: (TA's name)'s A12 example solution.
% Code attribution: (student name)'s A12 solution.
% Code attribution: (student name)'s answer on Piazza: (link to Piazza post).
% Code attribution: (person or account name)'s answer on Stack Overflow: (link to page).
% Code attribution: (large language model name e.g. GPT4): (include the prompts you used).
📗 You can get help on understanding the algorithm from any of the office hours; to get help with debugging, please go to the TA's office hours. For times and locations see the Home page. You are encouraged to work with other students, but if you use their code, you must give attribution at the beginning of your code.