📗 Regular component (out of 5) should be submitted using the "Grade" and "Submit" buttons at the bottom of the page.
➩ Submission of the text file generated by the auto-grader to Canvas Assignment A2 is optional.
➩ Due date: August 9, no submission after that will be accepted.
📗 Competition component (out of 5): the text file generated using the Question 9 "Generate" button should be submitted to the Canvas Assignment A2C: Link
➩ Submission of an incorrectly formatted text file and any additional files to A2C will result in a competition score of \(-\infty\).
➩ Due date: July 7, no submission after that will be accepted under any circumstances.
📗 Note: the Canvas A2 and A2C due dates are the recommended due dates. Competition entries submitted before the recommended due date will participate in trial competitions, with the option to keep the score (not the ranking).
📗 Hint: example submissions, discussion session schedules, and group recommendations (very different for different assignments) can be found on Piazza: Link.
📗 Enter your ID (the wisc email ID without @wisc.edu) here and click the button (or hit the "Enter" key).
📗 You can also load your answers from your saved file.
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could either copy and paste or load your program outputs into the text boxes for individual questions or print all your outputs to a single text file and load it using the button at the bottom of the page.
📗 Please do not refresh the page: your answers will not be saved.
📗 You can write the code in any programming language and using any large language models. You do not have to submit your code.
📗 (Introduction) In this project, you will use supervised learning (or imitation learning) techniques to train an autonomous vehicle to move on a continuous state space, similar to this project: Link, Link or Link. You will submit a policy network (the neural network that produces the actions based on the state of the environment) that outputs a deterministic Markov policy ("turn left", "turn right", "speed up" or "no action" given the state of the vehicle, including position, velocity and distances to walls or other vehicles observed by the sensors). The neural network should be fully connected with two hidden layers (ReLU activation, a maximum of 100 units in each layer) and input layer with \(k\) units, and output layer with 4 units (softmax activation), where \(k\) is the number of sensors in front of the car, an odd integer between \(1\) and \(35\) (that is \(k\) must be \(1, 3, 5, 7, ..., 35\)).
📗 (Part 1) Given a neural network with random weights, make sure you can produce the correct policy to control the vehicle.
📗 (Part 2) Train your network to replicate a simple behavior policy for a simple environment without other vehicles. The vehicle has 5 sensors: (i) if one of the two sensors on the left has the highest value, turn left; (ii) if one of the two sensors on the right has the highest value, turn right; (iii) if the sensor in the middle has the highest value, speed up. Tie break rule for the highest value: speed up > left > right.
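The behavior policy in Part 2 can be sketched as a small function. This is only an illustration: it assumes the five readings are ordered left to right (`s1`, `s2` on the left, `s3` in the middle, `s4`, `s5` on the right), which the instructions do not state explicitly, and it uses the action codes from the questions (0 = turn left, 1 = turn right, 2 = speed up). The function name `behavior_action` is hypothetical.

```python
def behavior_action(sensors):
    """Return 0 (turn left), 1 (turn right) or 2 (speed up) for 5 sensor readings.

    Assumption: sensors are ordered left to right, so indices 0-1 are the
    left sensors, index 2 is the middle sensor, and indices 3-4 are the right.
    """
    if len(sensors) != 5:
        raise ValueError("the simple behavior policy is defined for 5 sensors")
    highest = max(sensors)
    # Tie-break rule: speed up > left > right, so test in that order.
    if sensors[2] == highest:
        return 2
    if sensors[0] == highest or sensors[1] == highest:
        return 0
    return 1
```

Checking the middle sensor first, then the left pair, implements the tie-break order directly, so no separate tie handling is needed.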
The test set for Part 1 and Part 2:
Each row is ["action", "x", "y", "vx", "vy", "s1", "s2", ..., "sk"], and "action" should be consistent with the simple behavior policy in Part 2, so you can either use it for training or use an actual simulator (below) to generate your training set (click on the camera icon or the checkbox in front of "Collect").
The second car is using the simple behavior policy described in part 2. In this demo, you can enter your network to control your car, or use the arrows at the top, or use arrow keys (or wasd) to control your car and collect data.
You can also simulate the environment yourself (not recommended) using the following parameters:
➩ The cars are only allowed to move inside [0, 1] x [0, 1] region.
➩ Crashing into the walls or other vehicles will cause the car to stop for 500 frames, and the speed to decrease to 12.5% of the original speed.
➩ The action "speed up" will increase the speed by 2.5% per frame.
➩ The actions "turn left" and "turn right" will decrease the speed by 1.25% per frame, and change the angle (in radians) by 0.01.
➩ The speed is clamped between 0.001 and 0.1.
➩ The total distance is computed by summing up the lengths of the line segments connecting the positions between every two frames (it should be approximately proportional to the sum of lengths of the velocities too).
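For anyone simulating the environment themselves, the speed, angle and distance rules above might look like the sketch below. Several things here are assumptions rather than part of the instructions: the percentage changes are applied multiplicatively to the current speed, "turn left" adds to the angle while "turn right" subtracts from it, and the names `update_speed_and_angle` and `total_distance` are made up for this example. Crash handling and sensor readings are left out.

```python
import math

SPEED_MIN, SPEED_MAX = 0.001, 0.1
TURN_LEFT, TURN_RIGHT, SPEED_UP, NO_ACTION = 0, 1, 2, 3

def update_speed_and_angle(speed, angle, action):
    """Apply one frame of an action and return the new (speed, angle)."""
    if action == SPEED_UP:
        speed *= 1.025            # +2.5% per frame (assumed multiplicative)
    elif action == TURN_LEFT:
        speed *= 0.9875           # -1.25% per frame
        angle += 0.01             # sign convention is an assumption
    elif action == TURN_RIGHT:
        speed *= 0.9875
        angle -= 0.01
    # The speed is clamped between 0.001 and 0.1.
    speed = min(max(speed, SPEED_MIN), SPEED_MAX)
    return speed, angle

def total_distance(positions):
    """Sum the segment lengths between consecutive (x, y) positions."""
    return sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    )
```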
📗 (Competition) Submit a policy network to compete in a random environment with other students in 3 teams. Game theoretic considerations can be made to modify your policy (through retraining with different training sets). You can optionally submit two policy networks and the maximum score from the two will be used in determining your ranking (the second vehicle can be designed to crash into other players and slow them down too).
Note on teams:
➩ Cars in the same team will not crash into each other, but cars in different teams will.
➩ Each race will have 5 players from each team. The selection will be based on your player ID, i.e. if you want to race with a specific student in your team, the two of you should use the same player ID (perhaps with different player icons so you can see who is who when we run the competition).
If you use a network with \(k\) sensors, your score will be based on the total distance \(d\) traveled within 5000 frames:
➩ Your score will be higher if you speed up more often.
➩ Your score will be higher if you crash fewer times.
➩ Your score will decrease by \(200\) for every two sensors you add (the behavior policy should lead to around \(10000\) distance without other competitors).
Your project grade is based on your submission to this assignment (out of 5) plus your ranking in the class (out of 5):
Top 20% gets 5/5.
Next 20% gets 4/5.
Next 20% gets 3/5.
Next 20% gets 2/5.
Next 20% gets 1/5.
(Students who do not participate in the competition will be given a score of negative infinity when computing the rankings.)
📗 [1 points] Enter the feature matrix of the 300 test items (300 lines, \(k\) numbers in each line, comma separated).
➩ Note: you can find the test items under Part 1 and 2 of the Instructions, each item is given as ["action", "x", "y", "vx", "vy", "s1", "s2", ..., "sk"], the features are ["s1", "s2", ..., "sk"], you can copy them for this question, and the remaining values are just for you to compute the score.
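As a rough sketch of this step, the snippet below splits each test item into its action and its sensor features, then formats the features one item per line. It assumes each row has already been parsed into a list of numbers in the order ["action", "x", "y", "vx", "vy", "s1", ..., "sk"]. The helper names `split_rows` and `format_matrix` are hypothetical.

```python
def split_rows(rows):
    """Split rows of [action, x, y, vx, vy, s1, ..., sk] into actions and features."""
    actions = [int(row[0]) for row in rows]
    features = [list(row[5:]) for row in rows]   # keep only s1, ..., sk
    return actions, features

def format_matrix(matrix):
    """One row per line, values separated by commas."""
    return "\n".join(",".join(str(v) for v in row) for row in matrix)
```

For example, `format_matrix(split_rows(rows)[1])` produces text that can be pasted into the answer box for this question.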
📗 [1 points] Enter a random set of weights (biases in the last row) of your network in the correct format (three matrices separated by -----, each matrix has rows separated by lines, columns separated by commas, the first matrix should be \(k + 1\) by \(h_{1}\), the second matrix should be \(h_{1} + 1\) by \(h_{2}\), and the last matrix should be \(h_{2} + 1\) by \(4\), all numbers rounded to 4 decimal places).
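One possible way to generate and serialize such a set of weights is sketched below. The uniform range of -1 to 1, the placement of the "-----" separator on its own line, and the helper names `random_layer` and `format_network` are all assumptions made for illustration. Only the matrix shapes and the 4-decimal rounding come from the question.

```python
import random

def random_layer(n_in, n_out, rng):
    """(n_in + 1) x n_out matrix: n_in weight rows, then one bias row."""
    return [
        [round(rng.uniform(-1.0, 1.0), 4) for _ in range(n_out)]
        for _ in range(n_in + 1)
    ]

def format_network(layers):
    """Rows on separate lines, columns separated by commas, matrices by -----."""
    blocks = [
        "\n".join(",".join(f"{w:.4f}" for w in row) for row in layer)
        for layer in layers
    ]
    return "\n-----\n".join(blocks)

# Example with k = 5 sensors and hidden sizes h1 = 8, h2 = 6 (arbitrary choices).
rng = random.Random(0)
layers = [random_layer(5, 8, rng), random_layer(8, 6, rng), random_layer(6, 4, rng)]
text = format_network(layers)
```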
📗 [10 points] Use the network from the previous question on the feature matrix you provided, and output the stochastic policy (300 lines, 4 numbers in each line, [probability of 0 (turn left), probability of 1 (turn right), probability of 2 (speed up), probability of 3 (no action)], rounded to 4 decimal places, comma separated).
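A minimal forward pass for this architecture (two ReLU hidden layers, softmax output, biases stored in the last row of each matrix) could look like the following pure-Python sketch. The function names are hypothetical, and the code accepts whatever input size the first matrix implies.

```python
import math

def dense(inputs, layer):
    """Affine map where the last row of `layer` holds the biases."""
    *weights, bias = layer
    if len(inputs) != len(weights):
        raise ValueError("input size does not match the weight matrix")
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + bias[j]
        for j in range(len(bias))
    ]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    m = max(values)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def policy(features, layers):
    """Return the 4 action probabilities for one feature vector."""
    hidden = features
    for layer in layers[:-1]:
        hidden = relu(dense(hidden, layer))
    return softmax(dense(hidden, layers[-1]))

def format_policy(rows):
    """One line per item, probabilities rounded to 4 decimal places."""
    return "\n".join(",".join(f"{p:.4f}" for p in row) for row in rows)
```

Calling `format_policy([policy(x, layers) for x in features])` would then give one line of four probabilities per item.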
📗 [10 points] Enter the actions based on behavior policy provided in the instructions (not your network) for the 300 training items from the previous question. (300 numbers in one line, 0 (turn left), 1 (turn right), 2 (speed up), 3 (no action), comma separated).
📗 [5 points] Enter the set of weights (biases in the last row) of your network after training on your training set to clone the behavior policy (three matrices separated by -----, each matrix has rows separated by lines, columns separated by commas, the first matrix should be \(4 + k + 1\) by \(h_{1}\), the second matrix should be \(h_{1} + 1\) by \(h_{2}\), and the last matrix should be \(h_{2} + 1\) by \(4\), all numbers rounded to 4 decimal places).
You can test your above network or generate more training items (each line is ["action", "x", "y", "vx", "vy", "s1", "s2", ... "sk"]). The dark red lines represent how the sensors compute the distances for the vehicle you control. To collect data points, enter your network, the number of data points and number of frames to skip, then click on the camera icon or check the "Collect" checkbox to start.
📗 [10 points] Use the network from the previous question on the feature matrix you provided, and output the stochastic policy (300 lines, 4 numbers in each line, [probability of 0 (turn left), probability of 1 (turn right), probability of 2 (speed up), probability of 3 (no action)], rounded to 4 decimal places, comma separated).
📗 [15 points] Find the actions with the highest probability based on the stochastic policy in the previous questions. The actions should be the same as the ones from your behavior policy. This question is graded based on the consistency with your stochastic policy from the previous question and the consistency with your behavior policy.
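Picking the most probable action per item is an argmax over each row of the stochastic policy. The sketch below also includes a small agreement check against the behavior-policy actions. Both function names are hypothetical, and ties are broken toward the lowest action index, which is an assumption (ties can appear after rounding to 4 decimal places).

```python
def greedy_actions(policy_rows):
    """Index of the largest probability in each row (lowest index wins ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in policy_rows]

def agreement(predicted, expected):
    """Fraction of items where the two action lists match."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)
```

An agreement below 1.0 on the test items suggests the network has not yet fully cloned the behavior policy.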
The following is the full simulator, where you can add multiple other cars:
Simulator inputs: the number of sensors (or the network controlling your car), a player icon, the data collection settings (max 1000 data points, collected every given number of frames), and the number of other cars (or the networks to control them, separated by "=====").
Create tracks:
➩ You can create a track that is a regular polygon with a chosen number of sides and a chosen width using the corresponding button, then click the "Restart" button.
➩ You can also try one of the F1 tracks: Link. You can create the track from a list of track centers with a chosen width using the corresponding button, then click the "Restart" button.
📗 [1 points] Please list the AI tools and references you used and the names of other students and course staff you discussed the assignment or competition with. Please also enter any comments and suggestions including possible mistakes and bugs with the questions and the auto-grading. If you completed the assignment without any help (not recommended), please enter "None" and do not leave this question blank.
📗 Please do not modify the content in the above text field: use the "Grade" button to update.
📗 You could submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 Please also save the text in the above text box to a file using the save button, or copy and paste it into a file yourself.
📗 You could load your answers from the text (or txt file) in the text box below using the load button. The first two lines should be "##a: 2" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.
📗 Saving and loading may take around 5 to 10 seconds. Please be patient and do not click "Load" multiple times.
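If you print all your outputs to a single text file, the loading format described above ("##a: 2", "##id: ...", then "##1: ...", "##2: ...") could be produced along these lines. This is a sketch only: the function name `build_answer_file` is hypothetical, and how multi-line answers such as matrices should be laid out after the "##n:" tag is an assumption, so check that the file loads correctly before submitting.

```python
def build_answer_file(student_id, answers, assignment=2):
    """Build the text to load, given a dict {question number: answer text}."""
    lines = [f"##a: {assignment}", f"##id: {student_id}"]
    for number in sorted(answers):
        # Assumption: the answer text follows the tag directly, even if multi-line.
        lines.append(f"##{number}: {answers[number]}")
    return "\n".join(lines)
```

For example, `build_answer_file("bbadger", {1: "0.1,0.2", 2: "x"})` yields four lines, starting with "##a: 2" and "##id: bbadger".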
📗 Presentations and interviews are optional for the competitions.
📗 If your competition grade is 2, 3, or 4, you can book an interview with the TA for 15 to 30 minutes.
📗 Interviews can only be booked during discussion sessions on Zoom (either during the current discussion session or for a future date and time): Link. Please do not email/spam the TA.
📗 A maximum of 3 interviews can be booked per person, and in the case you need 1 point for the next letter grade, we will allow a 4th one after the final exam.
📗 During the interviews, you will give a 5 to 10 minute presentation to explain anything you did on the project that is creative or technically challenging. Then you will answer three technical questions about your presentation or any materials related to the assignment.
➩ If you answer any one of the three questions incorrectly, you will get \(-1\).
➩ If you answer all questions correctly, and if your presentation ideas are correct, interesting, consistent with your submissions, and not done by many other students (we will make the decision after all interviews are done), you will get \(+1\).