📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click (or hit the "Enter" key)
📗 You can also load from your saved file and click .
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The official deadline is August 12; late submissions within one week will be accepted without penalty.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could either copy and paste or load your program outputs into the text boxes for individual questions or print all your outputs to a single text file and load it using the button at the bottom of the page.
📗 Please do not refresh the page: your answers will not be saved.
📗 You should implement the algorithms using the mathematical formulas from the slides. You can use packages and libraries to preprocess and read the data and format the outputs. It is not recommended that you use machine learning packages or libraries, but you will not lose points for doing so.
📗 (Introduction) In this programming homework, you will use a genetic algorithm to train a neural network to control a simplified version of Flappy Bird (Wikipedia), similar to this project: Link. Your neural network will have two inputs (the horizontal and vertical distances to the center of the next obstacle or pipe) and one output (whether to flap). You should use a single hidden layer, and you can decide the number of hidden units.
📗 (Part 1) Make sure you can simulate the environment correctly. The obstacles (pipes in the original game) are \(h\) = ? units apart, with a gap of \(g\) = ? units for the bird to fly through at a random position between \(0\) and \(100\). For simplicity, assume the horizontal thickness of the obstacle is \(0\) (this makes the game and the geometry significantly easier). The birds move down (due to gravity) by \(d\) = ? units, move up (when the action flap is used) by \(u\) = ? units, and move forward \(1\) unit every frame. For simplicity, you can allow the birds to fly above the top or below the bottom of the screen. The initial position of the bird should be \(\left(0, 50\right)\).
📗 (Part 1) Manually create a policy function to generate a training set, and use it to train a few neural networks using gradient descent. You can create multiple different training sets or randomly sample subsets from a large training set to get different networks. The network should have 2 input units, \(n\) hidden units (only one hidden layer), and 1 output unit. All units have logistic activation. You can choose any value for \(n\), larger than \(4\), and you can choose how to fit the network (cost function, learning rate, stopping criterion, etc).
📗 (Part 2) Start with the random networks from Part 1 and compute the total distance traveled minus the distance to the center of the obstacle. Use this value as the fitness measure for the genetic algorithm.
📗 (Part 2) Cross-over the networks by randomly swapping weights and biases of the networks. Choose the cross-over probabilities based on the fitness measures. You can allow cross-over between two copies of the same network (which means the same network will be in the next generation).
📗 (Part 2) Randomly mutate the networks with small probabilities by multiplying or dividing the weights and biases by a random number between 0 and 0.5.
📗 (Part 2) Repeat the process many times until the best network can pass through all obstacles.
You can play a simulation of the game environment here (or use it to generate sample data):
Click to restart the game (and clear data):
Distance to next obstacle: horizontal: , vertical:
Score: current distance: , fitness (after game ends):
Obstacle centers:
Features: horizontal: , vertical:
Actions:
Combined data (row 1 is feature 1, row 2 is feature 2, row 3 is action):
📗 Note: if you are interested in reinforcement learning, you can also train the neural network using policy gradient methods similar to Link.
📗 [5 points] Given the following centers for the obstacles and the sequence of actions (0 means no flap, 1 means flap), compute the input features (horizontal and vertical distances to the center of the next obstacle) for every frame: \(t\) lines, \(2\) integers each line. Assume the first center \(50\) is at x-position \(0\) so the first feature pair is always \(\left(0, 0\right)\).
➩ Centers:
➩ Actions:
Hint
📗 The bird should start at \(\left(0, 50\right)\) and if the current position is \(\left(x, y\right)\), then action 1 will move the position to \(\left(x + 1, y + u\right)\), and action 0 will move the position to \(\left(x + 1, y - d\right)\).
📗 If the next center is at \(\left(c_{x}, c_{y}\right)\), then the feature values should be \(\left(c_{x} - x, c_{y} - y\right)\).
📗 You can find the values of \(h, g, u, d\) in the instructions.
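📗 The position update and the feature rule in this hint can be sketched in Python as follows. This is only a minimal sketch, not the reference solution: the function name `compute_features` and the sample values of `h`, `u`, and `d` are hypothetical (the real values come from the instructions). It assumes that obstacle \(k\) sits at x-position \(k \cdot h\), that the "next" obstacle is the first one at or ahead of the bird, and that each feature pair is recorded before that frame's action is applied.

```python
from typing import List, Tuple


def compute_features(centers: List[int], actions: List[int],
                     h: int, u: int, d: int) -> List[Tuple[int, int]]:
    """Return one (horizontal, vertical) feature pair per action."""
    x, y = 0, 50  # initial bird position from the hint
    features = []
    for action in actions:
        # Assumption: obstacle k is at x = k * h, and the "next" obstacle is
        # the first one whose x-position is >= the bird's x, i.e. ceil(x / h).
        k = min(-(-x // h), len(centers) - 1)  # clamp if the centers run out
        features.append((k * h - x, centers[k] - y))
        # Apply the action: a flap moves up by u, otherwise the bird falls by d.
        x += 1
        y += u if action == 1 else -d
    return features


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    for fx, fy in compute_features([50, 60], [1, 0, 0], h=3, u=4, d=2):
        print(fx, fy)
```

Under these assumptions the first pair is always \(\left(0, 0\right)\), which matches the statement in Question 1.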
📗 The first few lines of your answer should be the following:
📗 [2 points] Train a neural network (using either gradient descent or a machine learning package) to fit the action sequence from Question 1. Enter the first layer weights here: \(3\) lines, \(n\) numbers each line, rounded to 4 decimal places, first line for feature 1 weights (horizontal distance), second line for feature 2 weights (vertical distance), and the last line contains the bias terms.
Hint
📗 See the MNIST assignment.
📗 You can decide the number of hidden units \(n\), but it should be at least \(4\).
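📗 One possible way to fit such a network is sketched below, using the same weight layout the answer boxes ask for: `w1` holds 3 rows of \(n\) numbers (horizontal weights, vertical weights, hidden biases) and `w2` holds \(n\) output weights followed by the output bias. The names `logistic`, `forward`, and `sgd_step` are hypothetical, and the cross-entropy cost with plain stochastic gradient descent is just one choice among those the assignment allows.

```python
import math
from typing import List, Sequence, Tuple


def logistic(z: float) -> float:
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(w1: List[List[float]], w2: List[float],
            x: Sequence[float]) -> Tuple[List[float], float]:
    """Return the hidden activations and the output activation for input x."""
    n = len(w1[2])
    hidden = [logistic(w1[0][j] * x[0] + w1[1][j] * x[1] + w1[2][j])
              for j in range(n)]
    out = logistic(sum(w2[j] * hidden[j] for j in range(n)) + w2[n])
    return hidden, out


def sgd_step(w1: List[List[float]], w2: List[float],
             x: Sequence[float], y: int, lr: float) -> None:
    """One in-place gradient step on the cross-entropy loss for one example."""
    n = len(w1[2])
    hidden, out = forward(w1, w2, x)
    delta_out = out - y  # dLoss/dz at the output for logistic + cross-entropy
    for j in range(n):
        # Back-propagate through hidden unit j before w2[j] is updated.
        delta_h = delta_out * w2[j] * hidden[j] * (1.0 - hidden[j])
        w2[j] -= lr * delta_out * hidden[j]
        w1[0][j] -= lr * delta_h * x[0]
        w1[1][j] -= lr * delta_h * x[1]
        w1[2][j] -= lr * delta_h
    w2[n] -= lr * delta_out
```

In practice the weights would be initialized randomly rather than to zero, and `sgd_step` would be called repeatedly over the (feature, action) pairs until a stopping criterion of your choice is met. Because the raw distances can be large, the hidden units may saturate; rescaling the features is one option, but then the same scaling has to be applied whenever the network is evaluated.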
📗 [2 points] (Continue from Question 2) Enter the second layer weights here: \(n + 1\) numbers in one line, rounded to 4 decimal places, the last number is the bias for the output unit.
📗 [10 points] Evaluate your network from Question 2 and Question 3 based on the obstacle centers from Question 1. If you trained your network correctly, your answer to this question should be the same as the action sequence in Question 1 (minor differences are okay). \(t\) integers (0 or 1) in one line, \(t\) is the length of the actions vector in Question 1 (i.e. compute the actions even after the bird hits an obstacle).
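📗 Evaluating a network differs from Question 1 in that each action now depends on the features the bird sees at that frame. A minimal sketch of this closed loop is shown below. The name `rollout` and the `policy` callable are hypothetical, and the obstacle layout (obstacle \(k\) at x-position \(k \cdot h\), next obstacle at or ahead of the bird) is the same assumption as in the Question 1 hint. The loop keeps running for all \(t\) frames, even after a collision, as the question requires.

```python
from typing import Callable, List, Tuple

# A policy maps a (horizontal, vertical) feature pair to 0 (no flap) or 1 (flap).
Policy = Callable[[Tuple[int, int]], int]


def rollout(centers: List[int], t: int, policy: Policy,
            h: int, u: int, d: int) -> List[int]:
    """Run the policy for t frames and return the chosen actions."""
    x, y = 0, 50  # initial bird position
    actions = []
    for _ in range(t):
        # Assumption: obstacle k is at x = k * h; clamp if the centers run out.
        k = min(-(-x // h), len(centers) - 1)
        feature = (k * h - x, centers[k] - y)
        action = policy(feature)
        actions.append(action)
        # Move the bird according to the chosen action.
        x += 1
        y += u if action == 1 else -d
    return actions


if __name__ == "__main__":
    # Hand-made policy for illustration: flap whenever the center is above the bird.
    flap_if_below = lambda f: 1 if f[1] > 0 else 0
    print(*rollout([50, 60], 4, flap_if_below, h=3, u=4, d=2))
```

A trained network can be wrapped as a `policy` by thresholding its output, for example at 0.5; the threshold itself is an assumption, since the page does not fix one.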
📗 [5 points] Compute the fitness value of the above action sequence. Enter a single integer.
Hint
📗 The fitness is the x-distance traveled before hitting a pipe minus the absolute y-distance to the center of the pipe. If the current position of the bird is \(\left(x, y\right)\) and the center of the pipe is \(\left(c_{x}, c_{y}\right)\) where \(c_{x} = x\), then the fitness is \(x - \left| y - c_{y} \right|\).
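📗 The fitness rule in this hint can be sketched as follows. The function name `fitness` and the sample parameter values are hypothetical, and several details are assumptions because the page does not spell them out: obstacle \(k\) is at x-position \(k \cdot h\), the gap extends \(g / 2\) above and below the center, touching the edge of the gap does not count as a hit, and a bird that never hits a pipe simply scores the distance it traveled.

```python
from typing import List


def fitness(centers: List[int], actions: List[int],
            h: int, g: int, u: int, d: int) -> int:
    """x-distance at the first collision minus |y - c_y| at that pipe."""
    x, y = 0, 50  # the start position sits exactly on the first center
    for action in actions:
        x += 1
        y += u if action == 1 else -d
        if x % h == 0:  # the bird is level with an obstacle
            k = x // h
            if k >= len(centers):
                break  # no more known obstacles
            miss = abs(y - centers[k])
            # Assumption: the bird passes when |y - c_y| <= g / 2.
            if 2 * miss > g:
                return x - miss
    # Assumption: with no collision, the fitness is the distance traveled.
    return x


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    print(fitness([50, 60], [1, 0, 0], h=3, g=4, u=4, d=2))
```

Note that the value can be negative when the bird crashes early and far from the center, which matters when fitness values are later turned into probabilities.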
Answer:
You can plot the path of your action sequence using .
📗 [2 points] Use a genetic algorithm to train a network and find the best network in the last iteration. Enter the first layer weights here: \(3\) lines, \(n\) numbers each line, rounded to 4 decimal places, first line for feature 1 weights (horizontal distance), second line for feature 2 weights (vertical distance), and the last line contains the bias terms.
Hint
📗 Start with \(N\) neural networks with random weights, or the random perturbation of the neural networks from Questions 2 and 3.
📗 Compute the fitness of the neural networks \(f_{i}\), and the reproduction probability: \(\dfrac{f_{i}}{f_{1} + f_{2} + ... + f_{N}}\).
📗 Randomly select two networks based on the reproduction probabilities, and swap the weights and biases of the networks (there are many ways to cross-over, one example is to flatten all weights and biases to a long vector, choose a random position, and swap the weights and biases of the two networks after that position).
📗 Randomly mutate each of the resulting networks (mutation probabilities should be small, and there are many ways to mutate, multiplying or dividing by a random number between 0 and 0.5 is one example, but you can also try adding or subtracting a random number).
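📗 The steps in this hint can be sketched as one generation of a genetic algorithm over flattened weight vectors (for example the \(3n\) first-layer numbers followed by the \(n + 1\) second-layer numbers). All function names here are hypothetical. Two details are assumptions rather than part of the hint: fitness values are shifted to be positive when any of them is zero or negative, so that the reproduction probabilities stay valid, and the mutation factor is drawn from \((0, 0.5]\) so that dividing by it never divides by zero.

```python
import random
from typing import List, Tuple


def reproduction_probabilities(fitness: List[float]) -> List[float]:
    """f_i / (f_1 + ... + f_N), after shifting non-positive fitness values."""
    low = min(fitness)
    # Assumption: shift so the smallest value becomes 1 when any value is <= 0.
    shifted = [f - low + 1 for f in fitness] if low <= 0 else list(fitness)
    total = sum(shifted)
    return [f / total for f in shifted]


def crossover(a: List[float], b: List[float],
              rng: random.Random) -> Tuple[List[float], List[float]]:
    """One-point cross-over: swap everything after a random position."""
    point = rng.randrange(1, len(a))  # needs vectors of length >= 2
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(genes: List[float], rate: float, rng: random.Random) -> List[float]:
    """With probability `rate`, multiply or divide each gene by r in (0, 0.5]."""
    out = []
    for w in genes:
        if rng.random() < rate:
            r = 0.5 * (1.0 - rng.random())  # in (0, 0.5], never zero
            w = w * r if rng.random() < 0.5 else w / r
        out.append(w)
    return out


def next_generation(population: List[List[float]], fitness: List[float],
                    rate: float, rng: random.Random) -> List[List[float]]:
    """Select parents by fitness, cross them over, and mutate the children."""
    probs = reproduction_probabilities(fitness)
    children: List[List[float]] = []
    while len(children) < len(population):
        # The same network may be drawn twice, which copies it unchanged.
        a, b = rng.choices(population, weights=probs, k=2)
        for child in crossover(a, b, rng):
            children.append(mutate(child, rate, rng))
    return children[:len(population)]
```

Repeating `next_generation` with freshly computed fitness values gives the overall loop. Keeping a copy of the best network from each generation is a common safeguard, since cross-over and mutation can otherwise lose it.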
📗 [2 points] (Continue from Question 6) Enter the second layer weights here: \(n + 1\) numbers in one line, rounded to 4 decimal places, the last number is the bias for the output unit.
Hint
📗 See Question 6.
You can simulate the game using your network here:
Click to restart the game (and clear data):
Distance to next obstacle: horizontal: , vertical:
Score: current distance: , fitness (after game ends):
Obstacle centers:
Features: horizontal: , vertical:
Actions:
Combined data (row 1 is feature 1, row 2 is feature 2, row 3 is action):
📗 [10 points] Evaluate your network from Question 6 and Question 7 based on the obstacle centers from Question 1. \(t\) integers (0 or 1) in one line, \(t\) is the length of the actions vector in Question 1 (i.e. compute the actions even after the bird hits an obstacle).
📗 [20 points] Compute the fitness value of the above action sequence. Enter a single integer. This question is worth 20 points because it is graded based on (1) consistency with the previous 3 questions and (2) the performance of your network: the higher the fitness value, the higher your grade.
Hint
📗 Same as Question 5.
Answer:
You can plot the path of your action sequence using .
📗 [1 point] Please confirm that you are going to submit the code on Canvas under Assignment A7, and make sure you give attribution for all blocks of code you did not write yourself (see bottom of the page for details and examples).
📗 [1 point] Please enter any comments and suggestions, including possible mistakes and bugs with the questions and the auto-grading, and materials relevant to solving the questions that you think are not covered well during the lectures. If you have no comments, please enter "None": do not leave it blank.
📗 Please do not modify the content in the above text field: use the "Grade" button to update.
📗 Warning: grading may take around 10 to 20 seconds. Please be patient and do not click "Grade" multiple times.
📗 You could submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 Please also save the text in the above text box to a file using the button or copy and paste it into a file yourself. You can also include the resulting file with your code on Canvas Assignment A7.
📗 You could load your answers from the text (or txt file) in the text box below using the button . The first two lines should be "##a: 7" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.
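📗 If you print all your outputs from your program, a small helper like the one below can produce text in this layout. It is only a sketch: `format_answers` is a hypothetical name, and each answer string is written verbatim after its `##k: ` prefix, since the page does not specify how multi-line answers or number separators should look.

```python
from typing import Dict


def format_answers(student_id: str, answers: Dict[int, str],
                   assignment: int = 7) -> str:
    """Build the '##a', '##id', '##1', '##2', ... lines in question order."""
    lines = [f"##a: {assignment}", f"##id: {student_id}"]
    for question in sorted(answers):
        lines.append(f"##{question}: {answers[question]}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical ID and answers, for illustration only.
    print(format_answers("your id", {1: "your answer to question 1",
                                     2: "your answer to question 2"}))
```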
📗 Saving and loading may take around 10 to 20 seconds. Please be patient and do not click "Load" multiple times.
📗 The sample solution in Java and Python will be posted on Piazza around the deadline. You are allowed to copy and use parts of the solution with attribution. You are allowed to use code from other people (with their permission) and from the Internet, but you must give attribution at the beginning of your code. You are allowed to use large language models such as GPT4 to write parts of the code for you, but you have to include the prompts you used in the code submission. For example, you can put the following comments at the beginning of your code:
% Code attribution: (TA's name)'s A7 example solution.
% Code attribution: (student name)'s A7 solution.
% Code attribution: (student name)'s answer on Piazza: (link to Piazza post).
% Code attribution: (person or account name)'s answer on Stack Overflow: (link to page).
% Code attribution: (large language model name e.g. GPT4): (include the prompts you used).
📗 You can get help on understanding the algorithm from any of the office hours; to get help with debugging, please go to the TA's office hours. For times and locations see the Home page. You are encouraged to work with other students, but if you use their code, you must give attribution at the beginning of your code.