📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit the "Enter" key).
📗 You can also load your answers from your saved file and click the button.
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The due dates (hard deadlines) are July 27 and August 3; late submissions to the competitive project components will not be accepted under any circumstances. The remaining assignment questions can be submitted before August 10 without penalty to earn a maximum of 5 points.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could either copy and paste or load your program outputs into the text boxes for individual questions or print all your outputs to a single text file and load it using the button at the bottom of the page.
📗 Please do not refresh the page: your answers will not be saved.
📗 You should implement the algorithms using the mathematical formulas from the slides. You can use packages and libraries to preprocess and read the data and format the outputs. It is not recommended that you use machine learning packages or libraries, but you will not lose points for doing so.
TODO for next year: do not ask for the stochastic policy since it's not used in the competition, ask for deterministic policy and specify the tie-breaking rule; need to make crashing rest time longer so that always accelerating is not near-optimal in competition 1; next year, make this project two separate projects, one in 3D one in 2D; the sensors are from right to left, but should be changed to from left to right next year.
📗 (Introduction) In this project, you will use either supervised learning or reinforcement learning techniques to train an autonomous vehicle to move in a continuous state space, similar to this project: Link, Link or Link. You will submit a policy network (the neural network that produces the actions based on the state of the environment) that outputs a deterministic Markov policy ("turn left", "turn right", "speed up" or "no action" given the state of the vehicle, including position, velocity, and distances to walls or other vehicles observed by the sensors). The neural network should be fully connected with two hidden layers (ReLU activation, a maximum of 100 units in each layer), an input layer with \(4 + k\) units, and an output layer with 4 units (softmax activation), where \(k\) is the number of sensors in front of the car, an odd integer between \(1\) and \(9\) (that is, \(k\) must be 1, 3, 5, 7, or 9).
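📗 For concreteness, here is a minimal sketch in Python with NumPy (the language choice and names such as forward_policy are illustrative, not required by the assignment) of how such a network could map a state vector of length \(4 + k\) to the four action probabilities and a deterministic action, assuming the biases are stored in the last row of each weight matrix as described in the questions below.

import numpy as np

def forward_policy(state, W1, W2, W3):
    # state: array of length 4 + k (x, y, vx, vy, sensor distances).
    # Each weight matrix stores the bias in its last row, so a constant 1
    # is appended to the input of every layer.
    h1 = np.maximum(0.0, np.append(state, 1.0) @ W1)   # first hidden layer, ReLU
    h2 = np.maximum(0.0, np.append(h1, 1.0) @ W2)      # second hidden layer, ReLU
    logits = np.append(h2, 1.0) @ W3                   # output layer, 4 units
    probs = np.exp(logits - logits.max())              # softmax (shifted for stability)
    return probs / probs.sum()                         # [turn left, turn right, speed up, no action]

def deterministic_action(state, W1, W2, W3):
    # Deterministic Markov policy: pick the most probable action (0, 1, 2 or 3).
    return int(np.argmax(forward_policy(state, W1, W2, W3)))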
📗 (Part 1) Given a neural network with random weights, make sure you can produce the correct Markov policy to control the vehicle.
📗 (Part 2) Train your network to replicate a simple behavior policy for a simple environment without other vehicles. The vehicle has 5 sensors: (i) if one of the two sensors on the left has the highest value, turn left; (ii) if one of the two sensors on the right has the highest value, turn right; (iii) if the sensor in the middle has the highest value, speed up. Tie-break rule for the highest value: speed up > left > right.
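📗 One possible implementation sketch of this behavior policy (in Python; the function name is mine, and I assume the five sensor values are listed from right to left, matching the demo data format described below):

def behavior_action(sensors):
    # sensors: five readings ordered from the rightmost sensor (index 0)
    # to the leftmost sensor (index 4).
    # Returns 0 = turn left, 1 = turn right, 2 = speed up.
    best = max(sensors)
    if sensors[2] == best:            # middle sensor wins or ties -> speed up
        return 2
    if max(sensors[3:5]) == best:     # one of the two left sensors wins -> turn left
        return 0
    return 1                          # otherwise a right sensor wins -> turn right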
📗 (Competition) Submit a policy network to compete in a random environment with other students. Game-theoretic considerations can be made to modify your policy (through retraining with different training sets). The competition will be held twice, once around the midterm (no other players) and once around the final exam (with other players). You can optionally submit two policy networks, and the maximum score of the two will be used to determine your ranking (the second vehicle can be designed to crash into other players and slow them down too).
Your submission should contain (i) your player name (not necessarily your real name), (ii) your player icon (a single emoji from this list: Link), (iii) your team (a random number between 0 and 1, rounded to four decimal places), (iv) your network weights, and (v) [optional] your second network weights, and should have the following format in a .txt text file:
➩ Small example:
➩ Large example:
Your score will be the total distance traveled within a fixed number of frames, that is,
➩ Your score will be higher if you speed up more often.
➩ Your score will be higher if you crash fewer times.
Your project grade is based on your submission to this assignment (out of 5) plus your ranking within your team (out of 5):
Top 10% gets 5/5 in each team.
Next 10% gets 4/5 in each team.
Next 10% gets 3/5 in each team.
Next 10% gets 2/5 in each team.
Next 10% gets 1/5 in each team.
(Students who do not participate in the competition will be split evenly among the teams with scores of 0 when computing the rankings.)
Competition
📗 Details on computing the total distances:
➩ The cars are only allowed to move inside the [0, 1] x [0, 1] region.
➩ There is no penalty for crashing into the walls or other vehicles, except that the speed decreases to 12.5% of its original value.
➩ The action "speed up" will increase the speed by 2.5% per frame.
➩ The actions "turn left" and "turn right" will decrease the speed by 1.25% per frame, and change the angle (in radians) by 0.01.
➩ The speed is clamped between 0.001 and 0.1.
➩ The total distance is computed by summing up the lengths of the line segments connecting the positions in every two consecutive frames (it should be approximately proportional to the sum of the lengths of the velocity vectors too).
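📗 For intuition, a single-frame update implied by the rules above could look roughly like the sketch below (Python); the demo's actual update order, the sign convention for turning, whether the speed is the per-frame displacement, and exactly when the crash penalty is applied are assumptions here, not guaranteed to match the simulator.

import math

def step(x, y, speed, angle, action, crashed=False):
    # action: 0 = turn left, 1 = turn right, 2 = speed up, 3 = no action.
    if action == 2:
        speed *= 1.025                     # speed up: +2.5% per frame
    elif action in (0, 1):
        speed *= 0.9875                    # turning: -1.25% per frame
        angle += 0.01 if action == 0 else -0.01   # assumed sign convention for left/right
    if crashed:
        speed *= 0.125                     # crash: speed drops to 12.5% of its value
    speed = min(max(speed, 0.001), 0.1)    # clamp speed to [0.001, 0.1]
    new_x = min(max(x + speed * math.cos(angle), 0.0), 1.0)   # stay inside [0, 1] x [0, 1]
    new_y = min(max(y + speed * math.sin(angle), 0.0), 1.0)
    distance = math.hypot(new_x - x, new_y - y)   # segment length added to the total distance
    return new_x, new_y, speed, angle, distance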
You can use the following demo to test policies or generate training sets (each line is ["action", "x", "y", "vx", "vy", "s1", "s2", ..., "sk"], where "si" is the distance measured by sensor i, from right (i = 1) to left (i = k)); a sketch for parsing these lines is given after the demos below. The dark red lines represent how the sensors compute the distances for the vehicle you control. To collect data points, enter your network, the number of data points, and the number of frames to skip, then check the "Collect" checkbox to start.
Neural network to control the car: , Player icon:
Data set: Collect (max 1000) every frames.
Number of other cars (or networks to control them, separated by "====="):
Race track:
Game track: Create tracks
➩ You can create a track that is a -sided regular polygon with width using this button: and click the "Restart" button.
➩ You can also try one of the F1 tracks: Link. You can create the track based on the following track centers with width using this button: and click the "Restart" button:
You can also manually control one vehicle (arrow keys or "wasd" to use actions "speed up", "turn left", "no action", "turn right"). Not pressing any of the keys will perform "no action" too. Click anywhere inside the square (or check the "Collect" box) before you start.
Number of sensors: , Player icon:
Data set: Collect (max 1000) every frames.
Number of other cars (or networks to control them, separated by "====="):
Race track:
Game track: Create tracks
➩ You can create a track that is a -sided regular polygon with width using this button: and click the "Restart" button.
➩ You can also try one of the F1 tracks: Link. You can create the track based on the following track centers with width using this button: and click the "Restart" button:
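📗 As mentioned above, here is a small sketch (Python) for parsing one collected data line into an action label and a feature vector; the exact delimiters and quoting used by the demo's "Data set" box are assumptions, so adjust the splitting to match what you actually see.

def parse_demo_line(line, k):
    # Assumes the values appear in the order action, x, y, vx, vy, s1, ..., sk
    # and are comma separated, possibly wrapped in brackets or quotes.
    parts = [p.strip().strip('"[] ') for p in line.strip().split(",")]
    action = int(float(parts[0]))                   # 0, 1, 2 or 3
    features = [float(p) for p in parts[1:5 + k]]   # x, y, vx, vy, s1..sk (4 + k values)
    return action, features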
📗 [1 point] Enter a random set of weights (biases in the last row) of your network in the correct format (three matrices separated by -----, each matrix has rows separated by lines, columns separated by commas, the first matrix should be \(4 + k + 1\) by \(h_{1}\), the second matrix should be \(h_{1} + 1\) by \(h_{2}\), and the last matrix should be \(h_{2} + 1\) by \(4\), all numbers rounded to 4 decimal places).
Hint
📗 TBA.
You can use the demo below to test if your network predictions are correct.
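📗 One way to generate and print a random set of weights in the format described above is sketched below (Python with NumPy); the hidden sizes, the random distribution, and the assumption that the "-----" separator sits on its own line are mine.

import numpy as np

def format_weights(matrices):
    # Rows on separate lines, columns comma separated, values rounded to 4 decimals,
    # matrices separated by a line containing "-----".
    blocks = ["\n".join(",".join(f"{v:.4f}" for v in row) for row in W) for W in matrices]
    return "\n-----\n".join(blocks)

k, h1, h2 = 5, 10, 10                      # example sizes: k sensors, at most 100 hidden units
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4 + k + 1, h1))      # biases in the last row
W2 = rng.normal(size=(h1 + 1, h2))
W3 = rng.normal(size=(h2 + 1, 4))
print(format_weights([W1, W2, W3]))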
📗 [10 points] Use the network from the previous question on the feature matrix you provided, and output the stochastic policy (300 lines, 4 numbers in each line, [probability of 0 (turn left), probability of 1 (turn right), probability of 2 (speed up), probability of 3 (no action)], rounded to 4 decimal places, comma separated).
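📗 Building on the forward_policy sketch from the introduction, the 300 output lines could be produced as below (illustrative only; X stands for your 300 by \(4 + k\) feature matrix).

def stochastic_policy_lines(X, W1, W2, W3):
    # One line per training item: probabilities of actions 0, 1, 2, 3,
    # rounded to 4 decimal places and comma separated.
    return "\n".join(
        ",".join(f"{p:.4f}" for p in forward_policy(row, W1, W2, W3))
        for row in X
    )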
📗 [5 points] Enter the feature matrix of 300 training items from a single training episode (300 lines, \(4 + k\) numbers in each line, comma separated). You can use the same one from Part 1 to get the points for this question, but you should train your policy network first, then use the demo to produce the training items instead.
📗 [10 points] Enter the actions based on the behavior policy provided in the instructions (not your network) for the 300 training items from the previous question (300 numbers in one line: 0 (turn left), 1 (turn right), 2 (speed up), 3 (no action), comma separated).
📗 [5 points] Enter the set of weights (biases in the last row) of your network after training on your training set to clone the behavior policy (three matrices separated by -----, each matrix has rows separated by lines, columns separated by commas, the first matrix should be \(4 + k + 1\) by \(h_{1}\), the second matrix should be \(h_{1} + 1\) by \(h_{2}\), and the last matrix should be \(h_{2} + 1\) by \(4\), all numbers rounded to 4 decimal places).
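📗 One possible way to train the network, sketched below in Python with NumPy, is full-batch gradient descent on the softmax cross-entropy loss, implemented from the formulas as suggested in the instructions; the hidden sizes, learning rate and number of epochs are placeholders, and any other optimizer (or a library) is equally acceptable.

import numpy as np

def train_clone(X, actions, h1=10, h2=10, lr=0.01, epochs=2000, seed=0):
    # X: n by (4 + k) feature matrix; actions: length-n integer array of labels 0..3.
    # Returns the three weight matrices with the biases in the last row.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])              # append the bias column
    Y = np.eye(4)[actions]                            # one-hot action labels
    W1 = rng.normal(scale=0.1, size=(d + 1, h1))
    W2 = rng.normal(scale=0.1, size=(h1 + 1, h2))
    W3 = rng.normal(scale=0.1, size=(h2 + 1, 4))
    for _ in range(epochs):
        # forward pass
        A1 = np.maximum(0.0, Xb @ W1)
        A1b = np.hstack([A1, np.ones((n, 1))])
        A2 = np.maximum(0.0, A1b @ W2)
        A2b = np.hstack([A2, np.ones((n, 1))])
        logits = A2b @ W3
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        # backward pass: gradient of the average cross-entropy loss
        dlogits = (P - Y) / n
        dW3 = A2b.T @ dlogits
        dA2 = (dlogits @ W3[:-1].T) * (A2 > 0)        # drop the bias row, apply ReLU derivative
        dW2 = A1b.T @ dA2
        dA1 = (dA2 @ W2[:-1].T) * (A1 > 0)
        dW1 = Xb.T @ dA1
        W1 -= lr * dW1
        W2 -= lr * dW2
        W3 -= lr * dW3
    return W1, W2, W3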
📗 [10 points] Use the network from the previous question on the feature matrix you provided, and output the stochastic policy (300 lines, 4 numbers in each line, [probability of 0 (turn left), probability of 1 (turn right), probability of 2 (speed up), probability of 3 (no action)], rounded to 4 decimal places, comma separated).
📗 [15 points] Find the actions with the highest probability based on the stochastic policy in the previous questions. The actions should be the same as the ones from your behavior policy. This question is graded based on the consistency with your stochastic policy from the previous question and the consistency with your behavior policy.
📗 [1 point] If you are not planning to participate in the competition, enter "0" or "none" for this question to get the point. If you are planning to participate in the competition, attach the text file you are planning to submit to Canvas to check that your submission has the correct format.
📗 [1 point] Please enter any comments and suggestions, including possible mistakes and bugs with the questions and the auto-grading, and materials relevant to solving the questions that you think are not covered well during the lectures. If you have no comments, please enter "None": do not leave it blank.
📗 Please do not modify the content in the above text field: use the "Grade" button to update.
📗 Warning: grading may take around 10 to 20 seconds. Please be patient and do not click "Grade" multiple times.
📗 You could submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 Please also save the text in the above text box to a file using the button, or copy and paste it into a file yourself. You can also include the resulting file with your code on Canvas Assignment CP2.
📗 The competition file should be submitted to the Canvas Assignment CP2 Competition in a text file named "CP2.txt" (please do not use a different file name).
📗 You could load your answers from the text (or txt file) in the text box below using the button. The first two lines should be "##a: 10" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.
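📗 For example, a small helper like the following (hypothetical names; how multi-line answers are represented between the "##" markers is an assumption) could write your answers in that format.

def write_answer_file(user_id, answers, path="answers.txt"):
    # answers: dictionary mapping question number to the answer text.
    with open(path, "w") as f:
        f.write("##a: 10\n")
        f.write(f"##id: {user_id}\n")
        for q in sorted(answers):
            f.write(f"##{q}: {answers[q]}\n")

write_answer_file("your_id", {1: "...", 2: "..."})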
📗 Saving and loading may take around 10 to 20 seconds. Please be patient and do not click "Load" multiple times.
📗 No sample solutions will be posted for these assignments.
📗 You are allowed to use code from other people (with their permission) and from the Internet, but you must give attribution at the beginning of your code. You are allowed to use large language models such as GPT4 to write parts of the code for you, but you have to include the prompts you used in the code submission. For example, you can put the following comments at the beginning of your code:
% Code attribution: (TA's name)'s A10 example solution.
% Code attribution: (student name)'s A10 solution.
% Code attribution: (student name)'s answer on Piazza: (link to Piazza post).
% Code attribution: (person or account name)'s answer on Stack Overflow: (link to page).
% Code attribution: (large language model name e.g. GPT4): (include the prompts you used).
📗 You can get help on understanding the algorithm from any of the office hours; to get help with debugging, please go to the TA's office hours. For times and locations see the Home page. You are encouraged to work with other students, but if you use their code, you must give attribution at the beginning of your code.