October 31, 1995 -- Game search For a single agent in a relatively non-hostile world, the search for the path from some start state to some goal state is not especially difficult, and in fact, it's sort of dull. But the world isn't always peaceful---sometimes, there are other agents out there trying to keep you from reaching your goal, and at the same time those other agents are trying to achieve some goal of their own. We see this kind of stuff everywhere: in the executive boardroom, on the athletic field, sometimes even in the classroom. And when we try to model this kind of competitive behavior on a computer, we have to keep in mind that while our computerized "good guy" is going to try to move toward the goal in an optimal a fashion as possible, the computerized "bad guy" is going to everything it can to keep us from getting there. Thus, the question in a competitive or adversarial situation is no longer "what's my optimal path to the goal?", but is instead "what's my path to the goal when someone else is trying stop me?" The fundamental change in the nature of that question results in a fundamental change to the way we do state-space search in adversarial situations, thus giving rise to something called "adversarial search". And since we frequently use this kind of search when we build intelligent game-playing programs, this kind of search is frequently called "game search". The principle of game search is to first generate the state space (or "game tree") some number of levels deep, where each level corresponds to one player's move (or, more accurately, the set of all moves that the player could possibly make at that point). After generating the state space for that number of levels, the nodes at the bottom level are evaluated for goodness. (In the context of game playing, those nodes are often called "boards", each one representing one possible legal arrangement of game pieces on the game board.) The estimate of the goodness of a board is a little bit different than before, but not much. Since we have to worry about the opponent, we set up our estimation function so that it returns a spectrum of values, just like before, but now the two extremes are boards that are great for us (i.e., we win) and boards that are great for our opponent (i.e., we lose). We apply our estimation function to those lowest level boards, and propagate the numerical values upward to help us determine which is the best move to make. The joy of hex In order to explore this wild and wacky world of adversarial or game search, we're going to have to introduce a game. It's a simple game for two players, and it's called hexapawn, for reasons which will become obvious. The rules of hexapawn (at least in it's original form), are as follows: The game is played on a 3x3 board. Each player begins with three pawns lined up on opposite sides of the board. There are three white pawns and three black pawns, which gives us a grand total of six pawns, hence the name hexapawn. White always moves first, just like in chess. The players take turns moving their pawns. A pawn can move one square forward to an empty square, or it can move one square diagonally ahead (either to the left or right) to a square occupied by an opponent's pawn, in which case the opponent's pawn is removed from the board. One player wins when one of these three conditions is true: 1) one of that player's pawns has reached the opposite end of the board, 2) the opponent's pawns have all been removed from the board, or 3) it's the opponent's turn to move but the opponent can't move any pawns. It's not a very exciting game for human players, but it's reasonably stimulating for humans who are required to write programs to get computers to play this game, such as yourselves. (It's not entirely clear how the computers feel about it.) Hexapawn also serves as a very nice mechanism for demonstrating the principles of game search with heuristics. Hexapawn: catch the fever Let's look at the beginning of a sample game of hexapawn and see how we might get a computer to play the game. We'll give our opponent take the side of the white pawns, and we'll play the black pawns. The initial board configuration will be represented like this: W W W - - - B B B As we said, white always gets to move first. This gives white three possible initial moves, which are represented in this way: W W W - - - B B B / | \ / | \ / | \ / | \ / | \ / | \ - W W W - W W W - W - - - W - - - W B B B B B B B B B But of course, white doesn't get to make all three moves. White has to choose one and go with it. So let's say that white opts for that move on the left. Our resulting state space then looks like this: W W W - - - B B B / / / / / / - W W W - - B B B Everything was fine up until now. Now it's our turn. What will we do? Well, what we'd like to do is look at all of our possible next moves and make the best one, right? Sure. So let's see what our options are on this move: W W W - - - B B B / / / / / / - W W W - - B B B / | \ / | \ / | \ / | \ / | \ / | \ / | \ - W W - W W - W W B - - W B - W - B B - B B - B B B - Can we tell from just this which possible next move is the best one? Maybe. That one on the left looks sort of nice, since it leaves us with one more pawn than our opponent. But we really can't tell just by looking at these different boards which move is likely to lead to a win for us. Maybe we could get a better idea of which of our three possible moves is the best one by looking even further ahead to see what white might do on the next turn: W W W - - - B B B / / / / / / - W W W - - B B B / | \ / | \ / | \ / | \ / | \ / | \ / | \ - W W - W W - W W B - - W B - W - B B - B B - B B B - / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ - - W - - W - W - - W - - W - - W W - - W - - W W - - B W - B - W W B W W W - - - B W W B W - W B - B B - B B - B B - B B - B B W - B B - B B - ^ ^ | | white white wins wins Well now, maybe we know a little more than before. In fact, we can see two boards there that indicate victory for our opponent, and we could probably make some reasonable attempts at estimating how close the other boards might be to either a win or a loss for us. However, we could also look ahead yet another move, and then another, and so on until we had mapped out all the possibilities. The problem with doing this is that it's going to cost us lots of computational resources. This may not be a big deal when we're playing hexapawn with only three pawns on a side, but it will be a big deal if we extended the game to eight pawns on a side, for example. Or maybe instead of hexapawn variations, we're playing something like chess. Now the computational expense will be far too prohibitive, so we're going to have to settle on some arbitrary cutoff for looking ahead in this or any game. Since I'm running out of room to display all the possible boards at the same level, let's make life easy for me and set our arbitrary cutoff for looking ahead at two levels or two moves ahead. The static board evaluation function Above it was noted that two of those bottom-level boards were wins for white. But what about those other boards? What do they indicate for us? Will they lead to wins or losses for us? How can we estimate that? How can we get a computer to estimate that? Providing that estimate is the job of something called the "static board evaluation function". A static board evaluation function takes as input the current state of a game (i.e., the board) and returns a value corresponding to the "goodness" of that current state or board. By "goodness" we mean how close that board is to a victory for us---the closer, the better. A simple static board evaluation function might return, say, a positive number if the board is good for us, a negative number if the board is not good for us (but is consequently good for our opponent), and maybe a zero if the board is neither bad nor good for either player. How might we design such a function? Here's a weak first attempt at one. Since we're playing on the black side of the board, we'll have the function return a +10 if the board is such that black wins. And we'll have it return a -10 if white wins. (If we were playing on the white side of the board, we'd want it to return a +10 if white won, and a -10 if black won.) Since we win if we can get one of our pawns across the board to the other side, we should have the function take that into account too. So if neither side has won, let's have our function return the number of black pawns with clear paths in front of them minus the number of white pawns in front of them. Oh, and since we win if our opponent's pawns are all removed from the board, let's have the function incorporate that. We'll have the function count the number of black pawns on the board, subtract the number of white pawns, add that number to the previous number, and return that result. There, that wasn't so bad, was it? The minimax procedure Now that we have a reasonably well-defined static board evaluation function, how do we use it? Remember that the idea behind creating this thing was to estimate the "goodness" of a board---in this case, the function returns a positive number when the board is good for us, and a negative number when the board is good for our opponent. Let's apply the function we defined above to all the boards at the bottom level of the hexapawn state space we generated way back up there: W W W - - - B B B / / / / / / - W W W - - B B B / | \ / | \ / | \ / | \ / | \ / | \ / | \ - W W - W W - W W B - - W B - W - B B - B B - B B B - / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ - - W - - W - W - - W - - W - - W W - - W - - W W - - B W - B - W W B W W W - - - B W W B W - W B - B B - B B - B B - B B - B B W - B B - B B - 0 1 1 -10 -1 -10 0 -1 The two boards that have been assigned a value of -10 are, of course, the boards that represent victories for white. In the case of the leftmost of those two boards, it's a victory for white because it's now our turn (i.e., black's turn) and we can't move any of our pawns. The rightmost of these two boards is a victory for white because white has moved one of its pawns all the way across the board. But let's take a look at another board. The one at the very left, for example, has been given a value of zero. Yet it's easy for us to see that, since it's our turn, and we only have one black pawn that we can move, if we just move that one black pawn forward one space, we'll have blocked any possible move by white and we win the game. So why isn't that board given a value of +10? Because in order to figure that out, our static board evaluation function would have to look ahead one more move. But a static board evaluation function is exactly that---static. It doesn't look ahead. If we set a limit on the number of moves we want to look ahead in order to play the game in a reasonable amount of time, but then we have our board evaluation function look even further ahead, we're going to eat up additional computing resources that we were trying to save, and we're also going to end up writing the same code twice. So there's absolutely no advantage to having the board evaluation function look ahead an additional move or two or three--- instead, we should just readjust our original depth cutoff so that it allows us to look more moves into the future. Now let's go back and look at those bottom two levels in our hexapawn state space: - W W - W W - W W B - - W B - W - B B - B B - B B B - / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ - - W - - W - W - - W - - W - - W W - - W - - W W - - B W - B - W W B W W W - - - B W W B W - W B - B B - B B - B B - B B - B B W - B B - B B - 0 1 1 -10 -1 -10 0 -1 What can we do with those numbers that have been assigned to the boards? Those boards all represent possible results of a move by white. Those numbers can be used to tell us which of those moves white is more likely to make. For example, in the leftmost subtree, we might guess that white is more likely to make the move that results in a board with value 0 than the moves that result in boards with value 1, because a board with value 0 is better for white than a board with value 1, which favors us. That assumes, of course, that we trust our evaluation function. Similarly, in the middle and rightmost subtrees, white is going to prefer the moves that result in a board with a value of -10 (a victory for white), right? We can indicate those preferences by taking the minimum values among those board values in each subtree and propagating them up one level: - W W W - - B B B / | \ / | \ / | \ / | \ / | \ / | \ / | \ - W W - W W - W W 0 B - - -10 W B - -10 W - B B - B B - B B B - / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ - - W - - W - W - - W - - W - - W W - - W - - W W - - B W - B - W W B W W W - - - B W W B W - W B - B B - B B - B B - B B - B B W - B B - B B - 0 1 1 -10 -1 -10 0 -1 OK, now how can we use that information? We use it in almost exactly the same way as we did before. We can figure out which move we should make of the three that are available to us by finding the maximum of the values that we just propagated upward. One of those values was a 0 and the other two were -10. Of course, the 0 is better for us, so we'd choose to make that move. Let's go back and see what we've done here. First we started with some arrangement of pawns on the board and the knowledge that it was our turn. We generated all the moves we might make, and then we generated all the moves that our opponent could make after we made our move. We arbitrarily chose to look only two moves ahead, but we could have looked further if we wanted to give up the computational resources to do so. We then applied our static board evaluation function to the bottom-most boards (i.e., the terminal leaves on the tree that is our state space) and assigned a numeric value corresponding to "goodness" to each of those boards. Those bottom-most boards are each the result of a possible move by white. We assumed that white would always make the best move it possibly could, so we propagated the minimum values up from the leaves to the immediate parents. And then we assumed that we would want to make the best possible move that we could, and we chose that move by selecting the maximum of the values that had just been propagated upwards. Because we chose to look ahead only two moves, the first propagation was of minimum values from the very bottom level, followed by a propagation of maximum values upward from that level. If we had chosen to look ahead three moves, we'd first propagate maximum values from the bottom, then minimums, then maximums. If we were looking ahead four moves, we'd start with minimums, then maximums, then minimums, then maximums. And so on, and so on. The procedure that we just described has a name, "minimax", and it's the heart of game-playing computer programs. The minimax procedure relies on two assumptions. First, there must be an adequate and reasonably accurate board evaluation technique. It doesn't have to be perfect, but it does have to work more often than not. The second assumption is that the relative merit of a move becomes more obvious as you search deeper and deeper into the state space. Why? Because if that weren't so, there wouldn't be any value in doing the search in the first place. But keep in mind that for any given game, or at least any given implementation, one or the other (or both) of these assumptions may not be true. Once more, with feeling Let's continue the example for a bit. We have a hot hexapawn game going, in which white made a move: W W W start - - - B B B / / / / / / - W W white moves W - - B B B and then we (playing black) applied the minimax procedure to find our best move from the board that our opponent had left for us: W W W start - - - B B B / / / / / / - W W white moves W - - B B B / | \ / | \ / | \ our / | \ best / | \ move / | \ / | \ - W W - W W - W W 0 B - - -10 W B - -10 W - B B - B B - B B B - / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ - - W - - W - W - - W - - W - - W W - - W - - W W - - B W - B - W W B W W W - - - B W W B W - W B - B B - B B - B B - B B - B B W - B B - B B - 0 1 1 -10 -1 -10 0 -1 So we make our move, and then white counters with a move: W W W start - - - B B B / / / / / / - W W white moves W - - B B B / / / / / / / we - W W move B - - B - B | | | | white - - W moves B W B - B Note that white didn't make the move that we predicted would be the best move for white. That happens a lot, but we don't care. Regardless of the move that white made, we'd have to go through the whole minimax thing again to decide our next best move. So let's go through it one more time just to make sure we follow how this all works. We start with the board that was left after white's last move: - - W B W - B - B Then we generate the state space that results from all the moves we could make followed by all the moves that our opponent could make in response. (Remember that we've arbitrarily set our search limit at two moves ahead...we could have set that limit higher if we wanted to expend the resources.) - - W B W - B - B / ^ \ / / \ \ / / \ \ / / \ \ / / \ \ / / \ \ B - W - - W - - W - - W - W - B B - B B - B W B B - B - - B B - - B - - / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ - - - - - - - - - - - - - - W - - W B W - B B W B W - B B W B - B B - B - - B - - B B - - B - - W - - B W - Then we use our static board evaluation function to determine the goodness of the "terminal boards": - - W B W - B - B / ^ \ / / \ \ / / \ \ / / \ \ / / \ \ / / \ \ B - W - - W - - W - - W - W - B B - B B - B W B B - B - - B B - - B - - / \ / \ / \ 10 / \ / \ / \ / \ / \ / \ / \ / \ / \ - - - - - - - - - - - - - - W - - W B W - B B W B W - B B W B - B B - B - - B - - B B - - B - - W - - B W - 2 4 1 3 -10 -10 Then we propagate the minimums up from the result of white's move (this is the "minimizing level"): - - W B W - B - B / ^ \ / / \ \ / / \ \ / / \ \ / / \ \ / / \ \ B - W - - W - - W - - W - W - 2 B B - 1 B B - -10 B W B B - B - - B B - - B - - / \ / \ / \ 10 / \ / \ / \ / \ / \ / \ / \ / \ / \ - - - - - - - - - - - - - - W - - W B W - B B W B W - B B W B - B B - B - - B - - B B - - B - - W - - B W - 2 4 1 3 -10 -10 And then we would propagate the maximum value up and select the best move to make. In this case, that move would be the one on the left, with the board value of 10, which indicates a win for us. Yippee!! Alpha-beta pruning If you think about it for a minute, we didn't really have to go through all that board generation and evaluation and propagation of values to figure out what to do on that last move. As soon as we generated that move on the left side of the state space above and realized that it was a winning move (which is why we didn't generate any moves beyond that), there was no reason to generate the rest of the state space to the right. The first move that we looked at was so good that it didn't matter what the other possibilities were. We didn't recognize that because we were following our minimax procedure blindly. But if we make our minimax procedure a little bit smarter, we could reduce the cost of doing this search by "pruning" our state space tree and getting rid of some of the board generation, evaluation, and propagation, all of which eat up computational resources. Let's take a look at a simple abstract example of how this might work. Let's say we start with some board: start board And from that starting board, I have two possible moves. But instead of generating all my moves in a sort of breadth-first fashion, I'm going to fall back on my old depth-first search technique and generate just one of my moves, and explore all of my opponent's moves in response to my move before I go and look at my other move: start board / / / / / my move Now, again following my depth-first approach, and remembering that I'm still cutting off my search at two moves ahead, I look at one of my opponent's moves and apply the static board evaluation function: start board / / / / / my move / / / / / opp's move 2 Let's say that my opponent has two possible moves after either of my moves. We've just looked at one of the opponent's possible moves, now well explore the other: start board / / / / / my move / \ / \ / \ / \ / \ opp's opp's move move 2 7 I then propagate the minimum value up from that level, and begin to explore the possible outcomes of my other move: start board / \ / \ / \ / \ / \ 2 my my move move / \ / / \ / / \ / / \ / opp's opp's opp's move move move 2 7 1 The question now is "do I get any useful information from exploring my opponent's remaining possible move?" And the answer is "no". Why? Let's look at what could possibly happen here. If I generate that last remaining board and apply the board evaluation function to it, the value of that board is either going to be greater than or equal to 1, or it's going to be less than 1. In the former case, the value that will be propagated up from this level is 1, a value that I already knew. In the latter case, the value less than 1 would be propagated up, and I didn't know about that value already. But, and this is the important but, either of those values will be less than 2, which is the minimum value that was propagated up from the other side of the tree. So based on what I know from only exploring three of my opponent's four possible moves, I can determine that the fourth possible move will have no bearing on my decision about what move I should make. I know I'm going to choose the move to the left---the one where the worst my opponent can do to me is leave me with a board with a value of 2. I know that I'm not going to choose the move to the right, because my incomplete exploration of the state space has already convinced me that the best I can do if I go that direction is end up with a board that has a value of 1. Oh, sure, maybe there's a possibility that my opponent would do something stupid if I took that move to the right and leave me with a +10 board and I'd win, but I can't count on that. I have to assume that my opponent is playing smart and playing to win. If I didn't assume that, I wouldn't have to go through all this stuff in the first place. This is the informal description of what's called "minimax with alpha-beta pruning". It's called alpha-beta because traditionally, procedures which use this technique have a paramater called alpha which holds the biggest of the maximum value found and a parameter called beta which holds the smallest of the minimum values found. The usefulness of alpha-beta pruning is dependent upon the order in which you generate and search the possible moves. In some worst cases, there are orderings of the branches of the tree for which alpha-beta provides no help. (What if the two subtrees in the above example were explored in the reverse order?) In more common cases, alpha-beta pruning temporarily reduces the impact of exponentially increasing amounts of search, but it does not prevent that exponential increase. As the depth of the state space grows, the amount of work required still increases exponentially, but at a reduced rate. Let's take one more look at our real hexapawn game in this context: - W W W - - B B B / | \ / | \ / | \ / | \ / | \ / | \ / | \ - W W - W W - W W 0 B - - -10 W B - -10 W - B B - B B - B B B - / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ / | \ / \ / | \ - - W - - W - W - - W - - W - - W W - - W - - W W - - B W - B - W W B W W W - - - B W W B W - W B - B B - B B - B B - B B - B B W - B B - B B - 0 1 1 -10 -1 -10 0 -1 Could alpha-beta pruning have saved us some work in deciding which move to make here? Sure. There are three moves we didn't have to look at. They are marked with an X below: X X X - - W - - W - W - - W - - W - - W W - - W - - W W - - B W - B - W W B W W W - - - B W W B W - W B - B B - B B - B B - B B - B B W - B B - B B - 0 1 1 -10 -1 -10 0 -1 If you figured out that these were the moves that alpha-beta pruning would have discarded without looking at them, and you can explain why, then you know everything you need to know about alpha-beta pruning. Copyright 1995 by Kurt Eiselt. All rights reserved. -- Kurt Eiselt Assistant Dean, Student Services College of Computing Georgia Institute of Technology Atlanta, GA 30332-0280 http://www.cc.gatech.edu/aimosaic/faculty/eiselt.html