Connect 4 - A Learning Experience
Jason Fletchall, Mario Giombi, Brian Schuette
Introduction
We set out to create a program that plays Connect 4. The program would learn from the games it had played and get smarter over time, avoiding moves that led to losses and choosing moves that led to wins.
We decided on this project for a few reasons. The main one is that it is not hard to create a program that plays and wins Connect 4; such programs exist, and we discuss some of them later on. We decided to take a different approach. Instead of creating a program that would always win, we created one that may not win all the time, or even very often, but that learns. As it inevitably loses games, it adds them to its memory of past games. Eventually it learns how to turn a losing combination of moves into a winning one by changing a few of its choices. Unlike a program built to always win, our goal was to see whether a program can learn and improve, even on a game as simple as Connect 4.
Similar Work
When we planned this program, we looked around to see if something similar had been done.
Human vs Human Connect 4: http://javascript.internet.com/games/connect-4.html
The first implementation we found was a simple human vs human Connect 4 game. It is fun to play but has no bearing on our program since it does not have a computer player.
Human vs Computer Connect 4: http://www.mathsisfun.com/games/connect4.html
This is a very slick human vs computer game of Connect 4. The interface is very pretty and flashy. However, the computer player is not very smart; it sometimes misses even simple blocks or wins. We hoped that our player would be smarter than this after some training.
Minimax Connect 4: http://www.geocities.com/ResearchTriangle/System/3517/C4/C4Conv.html
Finally, we found a game where a human plays against a computer player that uses the Minimax algorithm. There is even an option to choose the depth Minimax searches to, and at higher depths it is very smart. This approach is not very difficult, and is actually quite common, which is why we did not plan our program this way. Our hope was that, after playing many games against humans, our program would learn from them and win a greater percentage of games over time.
Approach and Early Pitfalls
Once we decided what we wanted to do, we had to come up with a way to do it. We knew from the start that we would use some sort of decision tree to store the data, although we agonized over how we would be able to store all of it. Our first idea was that each node of the tree would hold the full game state; however, we soon realized that if we implemented it that way, the amount of space used for the file would be enormous.
Another issue was how to search through the tree to find the best path. We didn't want the computer taking too long for its moves, and any search algorithm would take more and more time as the tree grew. The third problem we had early on was that none of us knew how to implement a GUI, so we had to learn before we could start on the program.
Final Implementation
As described above, we had some early difficulties in figuring out how exactly we were going to program this thing. We eventually settled on three classes - a GUI class, a Player class, and a Controller class. The GUI class would control the user interface and take in the human player's input (from a mouse click). The Player class would control the computer player; specifically, it would store and load information and choose the computer player's next move. The Controller class would keep track of all moves and enforce the rules of the game. The specific implementations are as follows:
GUI: As mentioned, none of us knew how to create a GUI, but we definitely wanted our program to look sharp and be easy to use, so we taught ourselves the javax.swing package. Everything in the user interface is therefore a Swing object. First off, the entire window is contained in a JFrame. During a human player's turn, the frame is enabled, allowing the human to make a move; during a computer player's turn, the frame is disabled, so a human cannot attempt to move when it's not his turn. Next, a JMenuBar contains one JMenu, the File menu, allowing the user to start a new game (reset everything) or quit the game. The actual game board consists of a 6x7 array of JButtons, with each button representing a slot in the board. Each button has a set of Icons, loaded from GIF files, that appear when the button is enabled, disabled, or rolled over with the mouse. A button is enabled only when the slot it represents is the lowest open slot in its column; otherwise, it is disabled and cannot be clicked. Once all of this has been set up, the GUI waits for a click from either a human (on a human's turn) or a computer (on a computer's turn); note that a computer player's move generates a "click" just like a mouse click. When a click happens, the GUI changes the clicked button's icons, disables the button, then notifies the game controller of the move and asks the controller whether anybody has won or the game is a tie. If so, the GUI displays an appropriate message (a JOptionPane message dialog) and marks the winning checkers with new icons.
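To give a flavor of this setup, here is a minimal sketch of the board as a grid of JButtons with the lowest-open-slot rule; the class and method names (BoardSketch, handleClick) are illustrative, not our actual source:

import javax.swing.*;
import java.awt.GridLayout;

// A minimal sketch of the 6x7 board of JButtons (names illustrative).
public class BoardSketch {
    static final int ROWS = 6, COLS = 7;
    JButton[][] slots = new JButton[ROWS][COLS];

    BoardSketch() {
        JFrame frame = new JFrame("Connect 4");
        JPanel board = new JPanel(new GridLayout(ROWS, COLS));
        for (int r = 0; r < ROWS; r++) {
            for (int c = 0; c < COLS; c++) {
                JButton b = new JButton();
                // Only the lowest open slot in a column is clickable,
                // so at the start just the bottom row is enabled.
                b.setEnabled(r == ROWS - 1);
                final int col = c;
                b.addActionListener(e -> handleClick(col));
                slots[r][c] = b;
                board.add(b);
            }
        }
        frame.add(board);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setSize(490, 420);
        frame.setVisible(true);
    }

    void handleClick(int col) {
        // The real program sets the checker icon, disables this button,
        // enables the slot above it, and notifies the game controller.
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(BoardSketch::new);
    }
}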
Controller: The Connect 4 game controller was fairly straightforward to write and does exactly what would be expected of a game controller. The controller sets up the game board and keeps track of moves. It asks the GUI for human player moves and calls the computer player's move routines. The computer player interacts exclusively with the game controller, which provides public getBoard and isLegalMove functions to assist in its move-making. As the game progresses, the controller checks for winners and sets public boolean variables to notify the GUI and the computer player of a win. In our setup, the GUI is actually the program that runs first. It instantiates the game controller and calls controller.human_move, passing in the row and column that the human player moved into. When it is the computer player's turn, the GUI calls controller.computer_move, and the controller asks the computer player for its move. When a player wins or a tie occurs, the controller sets boolean variables that notify the GUI, which then displays the end-of-game message and provides the option to start a new game or quit. When the GUI exits, the game controller notifies our computer player, so that it can save what it has learned throughout this sequence of games.
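The core of the controller's win detection amounts to scanning for four in a row in every direction. A minimal sketch of such a check (class and method names here are illustrative):

// Sketch of a four-in-a-row check over a 6x7 board.
// board[row][col] holds 0 (empty), +1 (black), or -1 (red).
class WinCheckSketch {
    static boolean hasWon(int[][] board, int player) {
        int rows = board.length, cols = board[0].length;
        // Four directions to scan: right, down, down-right, down-left.
        int[][] dirs = {{0, 1}, {1, 0}, {1, 1}, {1, -1}};
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                for (int[] d : dirs) {
                    int count = 0;
                    for (int k = 0; k < 4; k++) {
                        int rr = r + k * d[0], cc = c + k * d[1];
                        if (rr < 0 || rr >= rows || cc < 0 || cc >= cols
                                || board[rr][cc] != player) break;
                        count++;
                    }
                    if (count == 4) return true;
                }
        return false;
    }
}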
Player: The player class uses a tree to store information. Each node represents a move, so each node has 7 children, corresponding to the seven columns that can be moved into from that board state. For example, child 0 of a node means that the next move from that node was into column number 0. Each move in the game is thus another level of the tree. In this way, we avoided saving the board state in each node, which saves a lot of space. Each node also has a data element, an integer, whose value reflects the prospects of winning from this board state. The player who goes first (black) counts as +1 and the second player (red) as -1. So if a node has value 4, that means that among all the recorded games passing through this node, black has won a net of 4 more games than red, so the black player would want to go down this node. If the value of a node is 0, neither player improves their chances of winning by taking it. We avoided the problem of having to search through the tree by having a value propagate upwards once the game is over. So if the red player wins a game, the terminal node gets value -1, and that -1 propagates upwards; a non-leaf node's value is always the sum of its children's values. After play is concluded for good, the data is stored in a file, using indentation to encode the parent-child relation.
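A minimal sketch of such a node and the upward propagation, with illustrative names (our actual Player class is in Appendix A):

// Sketch of the move-tree node described above (names illustrative).
class MoveNode {
    MoveNode parent;
    MoveNode[] children = new MoveNode[7]; // one child per column
    int value = 0; // net recorded outcomes below this node

    // Fetch (or create) the child for a move into the given column.
    MoveNode child(int column) {
        if (children[column] == null) {
            children[column] = new MoveNode();
            children[column].parent = this;
        }
        return children[column];
    }

    // Called on the terminal node when a game ends; outcome is +1 for a
    // black win, -1 for a red win, 0 for a tie. Adding the outcome to every
    // ancestor keeps each node's value equal to the sum of its children's.
    void recordOutcome(int outcome) {
        for (MoveNode n = this; n != null; n = n.parent)
            n.value += outcome;
    }
}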
Result
[Screenshot: the interface for our program. You click on an empty square to make your move there.]
[Screenshot: a finished game.]
So as you can see, everything looks and works fine. The next step was to play against it. And indeed, as you play against it and win in certain ways, you can see that it is definitely learning. For example, if you use a combination of moves that beats the computer, the next time you try that combination it will try a different move. If it loses again, it will try something different, and so on, until it finds a way to win, at which point it will stick to that winning combination and win as long as you keep using the same moves. This is not quantifiable in any way, but it is still interesting to see.
So we needed a quantifiable result - a way to show that the program is actually learning and getting smarter over time.
A little aside on how our player works. It first tries to take a move from past experience. If it can't find anything, it checks whether it can win on this move and, if so, takes the win. If not, it checks whether it needs to block and, if so, blocks. Otherwise, it moves randomly.
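In sketch form, that policy looks like this (every helper named here is hypothetical; the real logic lives in our Player class):

// Sketch of the computer player's move policy described above.
abstract class PolicySketch {
    abstract Integer bestMoveFromTree(int[][] board); // consult past games
    abstract boolean isLegalMove(int[][] board, int col);
    abstract boolean winsGame(int[][] board, int col, int player);
    abstract int randomLegalMove(int[][] board);

    int chooseMove(int[][] board, int me) {
        Integer remembered = bestMoveFromTree(board);   // 1. past experience
        if (remembered != null) return remembered;
        for (int col = 0; col < 7; col++)               // 2. take an immediate win
            if (isLegalMove(board, col) && winsGame(board, col, me)) return col;
        for (int col = 0; col < 7; col++)               // 3. block the opponent
            if (isLegalMove(board, col) && winsGame(board, col, -me)) return col;
        return randomLegalMove(board);                  // 4. otherwise move randomly
    }
}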
We also created a DumbPlayer class that was in essence like our player but did not learn: if it could win it would take the win, else if it could block it would block, and otherwise it would take a random move.
Control Experiment 1
The first thing we needed was a control experiment. So we created a class that only makes random moves and played it against itself to test whether our game was truly random. We did make an interesting observation along the way: assuming equal skill, the player who goes first wins about 55% of the time. So we had to go back and account for that by switching the first and second players after each game. The result was:
10000 games played
Player 1 wins: 5007
Player 2 wins: 4966
So everything seems to be working fine here.
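The alternation itself is simple to express. A sketch of such a test harness (playOneGame is a hypothetical driver, not our actual code):

// Sketch of the experiment harness, alternating who moves first each game
// to cancel the ~55% first-player advantage we observed.
abstract class HarnessSketch {
    // Plays one game and returns 1, 2, or 0 for a tie.
    abstract int playOneGame(boolean p1GoesFirst);

    void runTrials(int games) {
        int p1Wins = 0, p2Wins = 0;
        for (int g = 0; g < games; g++) {
            int winner = playOneGame(g % 2 == 0); // swap the first mover
            if (winner == 1) p1Wins++;
            else if (winner == 2) p2Wins++;
        }
        System.out.println(games + " games played");
        System.out.println("Player 1 wins: " + p1Wins);
        System.out.println("Player 2 wins: " + p2Wins);
    }
}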
Control Experiment 2
Next, for another control experiment, we tested our smart learning player (player 2) against the DumbPlayer (player 1) discussed above. We turned learning off, so our player did not learn anything from these games. After 1000 games, the result was:
1000 games played
Player 1 wins: 445
Player 2 wins: 510
We then trained our player against the DumbPlayer for 5000 games.
Then we ran another 1000-game test, again with learning turned off (though the player could still read from its previously stored results). The results were:
1000 games played
Player 1 wins: 466
Player 2 wins: 481
So our player didn't learn anything from this training; the small shift in wins is well within the margin of error.
Learning Experiment
So to really show learning, we had to play against a relatively smart program. We therefore created a Minimax Connect 4 player.
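For reference, the heart of such a player can be sketched as a fixed-depth negamax search. This is illustrative, not necessarily how our Minimax class is structured, and it reuses the hasWon check sketched earlier:

// A minimal fixed-depth negamax sketch of a Minimax Connect 4 player.
// board[row][col] is 0 (empty), +1, or -1; `player` is the side to move.
class MinimaxSketch {
    static int lowestOpenRow(int[][] board, int col) {
        for (int r = board.length - 1; r >= 0; r--)
            if (board[r][col] == 0) return r;
        return -1; // column is full
    }

    // Returns the value of the position for `player`.
    static int negamax(int[][] board, int player, int depth) {
        if (WinCheckSketch.hasWon(board, -player))
            return -1000 - depth;            // the previous move won
        if (depth == 0) return 0;            // search horizon: call it even
        int best = Integer.MIN_VALUE;
        for (int col = 0; col < board[0].length; col++) {
            int row = lowestOpenRow(board, col);
            if (row < 0) continue;           // skip full columns
            board[row][col] = player;        // try the move
            int score = -negamax(board, -player, depth - 1);
            board[row][col] = 0;             // undo it
            if (score > best) best = score;
        }
        return best == Integer.MIN_VALUE ? 0 : best; // no moves: a tie
    }
}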
We then played our smart learning player (player 1), with learning turned off, against the Minimax player (player 2). The results were what you would expect - the Minimax program annihilated ours:
100 games played:
Player 1 wins: 32
Player 2 wins: 65
Next we trained our player. We did not want to train it against Minimax, because of the risk of overfitting, so we trained it against the DumbPlayer described above. Recall that the DumbPlayer takes a win if it sees one, else a block if it sees one, else a random move. We trained against the DumbPlayer for 5000 games.
Then we played against Minimax again, with learning turned off, to see whether our player had actually learned anything. The results were unexpected:
100 games played:
Player 1 wins: 98
Player 2 wins: 2
Our program went from winning a third of the games, to winning practically all of the games!
Therefore we think it is safe to say that, against an intelligent program, our player does demonstrate learning, and in that respect our project is a success.
Appendix A: Project Code
A link to the zip file containing all our code. Instructions below in Appendix B.
Appendix B: User Manual
Connect4.java is the file that starts the game. Compile all of the .java files, and then run the Connect4 class as follows:
java Connect4 <time_limit> <game_type>
time_limit is the maximum time allowed, in milliseconds, for the computer player to make each move. The computer is pretty quick, so even a time of 10 will work out (and is recommended); however, lower than that and you might start getting errors, cutting off the computer before it has time to do anything.
game_type is how you want to play: "hh" for human vs. human, "hc" for human vs. computer (our learning player), and "hm" for human vs. the Minimax player.
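For example, to play against the learning computer player with the recommended 10-millisecond time limit:
java Connect4 10 hc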
When you first start it up, it will create a file named connect4data.txt. This is the data file where the computer player stores all its moves. So if you want to "reset" its memory, delete the file.