The minimax procedure
Now that we have a reasonably well-defined static board
evaluation function, how do we use it? Remember that the
idea behind creating this thing was to estimate the
"goodness" of a board---in this case, the function returns a
positive number when the board is good for us, and a negative
number when the board is good for our opponent. Let's apply
the function we defined above to all the boards at the bottom
level of the hexapawn state space we generated way back up
there:
                              W W W
                              - - -
                              B B B
                              /
                             /
                        - W W
                        W - -
                        B B B
                      /   |   \
                  /       |       \
              /           |           \
       - W W            - W W             - W W
       B - -            W B -             W - B
       B - B            B - B             B B -
      /  |  \            /  \            /  |  \
    /    |    \         /    \         /    |    \
- - W  - - W  - W -  - W -  - W -  - W W  - - W  - - W
W - -  B W -  B - W  W B W  W W -  - - B  W W B  W - W
B - B  B - B  B - B  B - B  B - B  B W -  B B -  B B -
  0      1      1     -10    -1     -10     0     -1
The two boards that have been assigned a value of -10 are, of
course, the boards that represent victories for white. In
the case of the leftmost of those two boards, it's a victory
for white because it's now our turn (i.e., black's turn) and
we can't move any of our pawns. The rightmost of these two
boards is a victory for white because white has moved one of
its pawns all the way across the board.
But let's take a look at another board. The one at the very
left, for example, has been given a value of zero. Yet it's
easy for us to see that, since it's our turn, and we only
have one black pawn that we can move, if we just move that
one black pawn forward one space, we'll have blocked any
possible move by white and we win the game. So why isn't
that board given a value of +10? Because in order to figure
that out, our static board evaluation function would have to
look ahead one more move. But a static board evaluation
function is exactly that---static. It doesn't look ahead.
If we set a limit on the number of moves we want to look
ahead in order to play the game in a reasonable amount of
time, but then we have our board evaluation function look
even further ahead, we're going to eat up additional
computing resources that we were trying to save, and we're
also going to end up writing the same code twice. So there's
absolutely no advantage to having the board evaluation
function look ahead an additional move or two or three---
instead, we should just readjust our original depth cutoff so
that it allows us to look more moves into the future.
Now let's go back and look at those bottom two levels in our
hexapawn state space:
       - W W            - W W             - W W
       B - -            W B -             W - B
       B - B            B - B             B B -
      /  |  \            /  \            /  |  \
    /    |    \         /    \         /    |    \
- - W  - - W  - W -  - W -  - W -  - W W  - - W  - - W
W - -  B W -  B - W  W B W  W W -  - - B  W W B  W - W
B - B  B - B  B - B  B - B  B - B  B W -  B B -  B B -
  0      1      1     -10    -1     -10     0     -1
What can we do with those numbers that have been assigned to
the boards? Those boards all represent possible results of a
move by white. Those numbers can be used to tell us which of
those moves white is more likely to make. For example, in
the leftmost subtree, we might guess that white is more
likely to make the move that results in a board with value 0
than the moves that result in boards with value 1, because a
board with value 0 is better for white than a board with
value 1, which favors us. That assumes, of course, that we
trust our evaluation function. Similarly, in the middle and
rightmost subtrees, white is going to prefer the moves that
result in a board with a value of -10 (a victory for white),
right? We can indicate those preferences by taking the
minimum values among those board values in each subtree and
propagating them up one level:
                        - W W
                        W - -
                        B B B
                      /   |   \
                  /       |       \
              /           |           \
       - W W            - W W             - W W
     0 B - -        -10 W B -         -10 W - B
       B - B            B - B             B B -
      /  |  \            /  \            /  |  \
    /    |    \         /    \         /    |    \
- - W  - - W  - W -  - W -  - W -  - W W  - - W  - - W
W - -  B W -  B - W  W B W  W W -  - - B  W W B  W - W
B - B  B - B  B - B  B - B  B - B  B W -  B B -  B B -
  0      1      1     -10    -1     -10     0     -1
OK, now how can we use that information? We use it in almost
exactly the same way as we did before. We can figure out
which move we should make of the three that are available to
us by finding the maximum of the values that we just
propagated upward. One of those values was a 0 and the other
two were -10. Of course, the 0 is better for us, so we'd
choose to make that move.
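Here is a minimal sketch of those two steps in Python, using nothing but the board values from the tree above. The variable names are made up for this illustration, and the three groups are listed in the same left-to-right order as the diagram.

```python
# Bottom-level board values, grouped by which of our three moves leads to them.
leaf_values = [
    [0, 1, 1],      # white's replies to our left move
    [-10, -1],      # white's replies to our middle move
    [-10, 0, -1],   # white's replies to our right move
]

# White picks the reply that is worst for us: take the minimum in each group.
propagated = [min(group) for group in leaf_values]

# We pick the move whose propagated value is largest.
best_value = max(propagated)
best_move = propagated.index(best_value)

print(propagated)             # [0, -10, -10]
print(best_move, best_value)  # 0 0
```

Move 0 is the leftmost move in the diagram, the one whose propagated value is 0.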
Let's go back and see what we've done here. First we started
with some arrangement of pawns on the board and the knowledge
that it was our turn. We generated all the moves we might
make, and then we generated all the moves that our opponent
could make after we made our move. We arbitrarily chose to
look only two moves ahead, but we could have looked further
if we wanted to give up the computational resources to do so.
We then applied our static board evaluation function to the
bottom-most boards (i.e., the terminal leaves on the tree
that is our state space) and assigned a numeric value
corresponding to "goodness" to each of those boards. Those
bottom-most boards are each the result of a possible move by
white. We assumed that white would always make the best move
it possibly could, so we propagated the minimum values up
from the leaves to the immediate parents. And then we
assumed that we would want to make the best possible move
that we could, and we chose that move by selecting the
maximum of the values that had just been propagated upwards.
Because we chose to look ahead only two moves, the first
propagation was of minimum values from the very bottom level,
followed by a propagation of maximum values upward from that
level. If we had chosen to look ahead three moves, we'd
first propagate maximum values from the bottom, then
minimums, then maximums. If we were looking ahead four
moves, we'd start with minimums, then maximums, then
minimums, then maximums. And so on, and so on. The
procedure that we just described has a name, "minimax", and
it's the heart of game-playing computer programs.
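To make the alternation concrete, here is one way the procedure might be sketched in Python. It assumes the state space has already been generated and written down as nested lists, with a plain number standing for a bottom-level board and its static evaluation. The function name and that representation are choices made for this sketch, not something defined in these notes, and the three-move tree uses made-up values.

```python
def minimax(node, maximizing):
    """Return the minimax value of a tree written as nested lists.

    A number is a bottom-level board that already carries its static
    evaluation; a list holds the boards reachable in one move.
    """
    if not isinstance(node, list):
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


# Two moves ahead: the hexapawn tree from above, with us (max) to move.
two_ply = [[0, 1, 1], [-10, -1], [-10, 0, -1]]
print(minimax(two_ply, maximizing=True))    # 0

# Three moves ahead (made-up values): the bottom level is now the result
# of our own move, so the first propagation from the bottom is a maximum.
three_ply = [[[3, 5], [2, 9]], [[0, 1], [7, 4]]]
print(minimax(three_ply, maximizing=True))  # 5
```

The level that gets maximized or minimized first is never chosen explicitly; it falls out of flipping the flag once per level on the way down from the root.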
The minimax procedure relies on two assumptions. First,
there must be an adequate and reasonably accurate board
evaluation technique. It doesn't have to be perfect, but it
does have to work more often than not. The second assumption
is that the relative merit of a move becomes more obvious as
you search deeper and deeper into the state space. Why?
Because if that weren't so, there wouldn't be any value in
doing the search in the first place. But keep in mind that
for any given game, or at least any given implementation, one
or the other (or both) of these assumptions may not be true.
Once more, with feeling
Let's continue the example for a bit. We have a hot hexapawn
game going, in which white made a move:
                              W W W
start                         - - -
                              B B B
                              /
                             /
                        - W W
white moves             W - -
                        B B B
and then we (playing black) applied the minimax procedure to
find our best move from the board that our opponent had left
for us:
                              W W W
start                         - - -
                              B B B
                              /
                             /
                        - W W
white moves             W - -
                        B B B
                 our  /   |   \
            best  /       |       \
        move  /           |           \
       - W W            - W W             - W W
     0 B - -        -10 W B -         -10 W - B
       B - B            B - B             B B -
      /  |  \            /  \            /  |  \
    /    |    \         /    \         /    |    \
- - W  - - W  - W -  - W -  - W -  - W W  - - W  - - W
W - -  B W -  B - W  W B W  W W -  - - B  W W B  W - W
B - B  B - B  B - B  B - B  B - B  B W -  B B -  B B -
  0      1      1     -10    -1     -10     0     -1
So we make our move, and then white counters with a move:
                              W W W
start                         - - -
                              B B B
                              /
                             /
                        - W W
white moves             W - -
                        B B B
                        /
                       /
                  - W W
we move           B - -
                  B - B
                    |
                    |
                  - - W
white moves       B W -
                  B - B
Note that white didn't make the move that we predicted would
be the best move for white. That happens a lot, but we don't
care. Regardless of the move that white made, we'd have to
go through the whole minimax thing again to decide our next
best move. So let's go through it one more time just to make
sure we follow how this all works. We start with the board
that was left after white's last move:
- - W
B W -
B - B
Then we generate the state space that results from all the
moves we could make followed by all the moves that our
opponent could make in response. (Remember that we've
arbitrarily set our search limit at two moves ahead...we
could have set that limit higher if we wanted to expend the
resources.)
                      - - W
                      B W -
                      B - B
                    /   ^   \
              /       /   \        \
        /          /         \            \
B - W        - - W           - - W           - - W
- W -        B B -           B B -           B W B
B - B        - - B           B - -           B - -
              /  \            /  \            /  \
             /    \          /    \          /    \
          - - -  - - -    - - -  - - -    - - W  - - W
          B W -  B B W    B W -  B B W    B - B  B - B
          - - B  - - B    B - -  B - -    W - -  B W -
Then we use our static board evaluation function to determine
the goodness of the "terminal boards":
                      - - W
                      B W -
                      B - B
                    /   ^   \
              /       /   \        \
        /          /         \            \
B - W        - - W           - - W           - - W
- W -        B B -           B B -           B W B
B - B        - - B           B - -           B - -
 10           /  \            /  \            /  \
             /    \          /    \          /    \
          - - -  - - -    - - -  - - -    - - W  - - W
          B W -  B B W    B W -  B B W    B - B  B - B
          - - B  - - B    B - -  B - -    W - -  B W -
            2      4        1      3       -10    -10
Then we propagate the minimums up from the result of white's
move (this is the "minimizing level"):
                      - - W
                      B W -
                      B - B
                    /   ^   \
              /       /   \        \
        /          /         \            \
B - W        - - W           - - W           - - W
- W -      2 B B -         1 B B -       -10 B W B
B - B        - - B           B - -           B - -
 10           /  \            /  \            /  \
             /    \          /    \          /    \
          - - -  - - -    - - -  - - -    - - W  - - W
          B W -  B B W    B W -  B B W    B - B  B - B
          - - B  - - B    B - -  B - -    W - -  B W -
            2      4        1      3       -10    -10
And then we would propagate the maximum value up and select
the best move to make. In this case, that move would be the
one on the left, with the board value of 10, which indicates
a win for us. Yippee!!
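The same round of search can be sketched with an explicit depth cutoff. In this sketch the boards are replaced by made-up labels, and the successors and evaluate arguments are hypothetical stand-ins for a real move generator and for the static board evaluation function; only the numbers come from the tree above.

```python
def minimax(state, depth, maximizing, successors, evaluate):
    """Depth-limited minimax: returns (value, best_successor)."""
    if depth == 0:
        # Search limit reached: fall back on the static evaluation.
        return evaluate(state), None
    children = successors(state)
    if not children:
        # No moves from here (the game is over): score the board as it stands.
        return evaluate(state), None
    best_value, best_child = None, None
    for child in children:
        value, _ = minimax(child, depth - 1, not maximizing,
                           successors, evaluate)
        if (best_value is None
                or (maximizing and value > best_value)
                or (not maximizing and value < best_value)):
            best_value, best_child = value, child
    return best_value, best_child


# The position from the walkthrough, with boards replaced by labels.
tree = {
    "root": ["a", "b", "c", "d"],
    "a": [],                 # our pawn reaches the far row: game over
    "b": ["b1", "b2"],
    "c": ["c1", "c2"],
    "d": ["d1", "d2"],
}
scores = {"a": 10, "b1": 2, "b2": 4, "c1": 1, "c2": 3, "d1": -10, "d2": -10}

value, move = minimax("root", 2, True,
                      successors=lambda s: tree.get(s, []),
                      evaluate=lambda s: scores[s])
print(value, move)  # 10 a
```

Board "a" is scored directly even though the search limit hasn't been reached, because there is nothing left to generate below a finished game.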
Alpha-beta pruning
If you think about it for a minute, we didn't really have to
go through all that board generation and evaluation and
propagation of values to figure out what to do on that last
move. As soon as we generated that move on the left side of
the state space above and realized that it was a winning move
(which is why we didn't generate any moves beyond that),
there was no reason to generate the rest of the state space
to the right. The first move that we looked at was so good
that it didn't matter what the other possibilities were. We
didn't recognize that because we were following our minimax
procedure blindly. But if we make our minimax procedure a
little bit smarter, we could reduce the cost of doing this
search by "pruning" our state space tree and getting rid of
some of the board generation, evaluation, and propagation,
all of which eat up computational resources.
Let's take a look at a simple abstract example of how this
might work. Let's say we start with some board:
start
board
And from that starting board, I have two possible moves. But
instead of generating all my moves in the sort of breadth-first
fashion that's implied by the examples above and in last week's
notes, I'm going to fall back on my old depth-first search
technique and generate just one of my moves, and explore all
of my opponent's moves in response to my move before I go and
look at my other move:
               start
               board
              /
            /
          /
       my
      move
Now, again following my depth-first approach, and remembering
that I'm still cutting off my search at two moves ahead, I
look at one of my opponent's moves and apply the static board
evaluation function:
               start
               board
              /
            /
          /
       my
      move
     /
    /
 opp's
 move
   2
Let's say that my opponent has two possible moves after
either of my moves. We've just looked at one of the
opponent's possible moves; now we'll explore the other:
               start
               board
              /
            /
          /
       my
      move
     /    \
    /      \
 opp's    opp's
 move     move
   2        7
I then propagate the minimum value up from that level, and
begin to explore the possible outcomes of my other move:
               start
               board
              /     \
            /         \
          /             \
     2 my                my
      move              move
     /    \            /
    /      \          /
 opp's    opp's    opp's
 move     move     move
   2        7        1
The question now is "do I get any useful information from
exploring my opponent's remaining possible move?" And the
answer is "no". Why? Let's look at what could possibly
happen here. If I generate that last remaining board and
apply the board evaluation function to it, the value of that
board is either going to be greater than or equal to 1, or
it's going to be less than 1. In the former case, the value
that will be propagated up from this level is 1, a value that
I already knew. In the latter case, the value less than 1
would be propagated up, and I didn't know about that value
already. But, and this is the important but, either of those
values will be less than 2, which is the minimum value that
was propagated up from the other side of the tree. So based
on what I know from only exploring three of my opponent's
four possible moves, I can determine that the fourth possible
move will have no bearing on my decision about what move I
should make. I know I'm going to choose the move to the
left---the one where the worst my opponent can do to me is
leave me with a board with a value of 2. I know that I'm not
going to choose the move to the right, because my incomplete
exploration of the state space has already convinced me that
the best I can do if I go that direction is end up with a
board that has a value of 1. Oh, sure, maybe there's a
possibility that my opponent would do something stupid if I
took that move to the right and leave me with a +10 board and
I'd win, but I can't count on that. I have to assume that my
opponent is playing smart and playing to win. If I didn't
assume that, I wouldn't have to go through all this stuff in
the first place.
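If that argument seems too slick, it's easy to check by brute force. The sketch below (with a made-up function name) recomputes my decision for every value the unexplored board could take from -10 to +10, which is assumed here to be the full range of the evaluation function.

```python
def decision(fourth_leaf):
    """Return (propagated values, index of the chosen move) for the example."""
    left = min(2, 7)             # opponent's replies to my left move
    right = min(1, fourth_leaf)  # opponent's replies to my right move
    values = [left, right]
    return values, values.index(max(values))


# Try every value the board we never generated might have had.
choices = {decision(v)[1] for v in range(-10, 11)}
print(choices)  # {0}
```

The chosen move is always move 0, the one to the left, no matter what the fourth board is worth.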
This is the informal description of what's called "minimax
with alpha-beta pruning". It's called alpha-beta because
traditionally, procedures which use this technique have a
parameter called alpha which holds the biggest of the maximum
values found so far and a parameter called beta which holds
the smallest of the minimum values found so far.
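A common way to write this down is sketched below. It is one textbook-style formulation, not necessarily the one any particular program uses; the nested-list tree, the list called seen, and the value 99 for the unexplored board are all assumptions made for the illustration.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, seen=None):
    """Minimax with alpha-beta pruning over nested lists of board values.

    alpha: best value the maximizing player is already sure of.
    beta:  best (lowest) value the minimizing player is already sure of.
    seen:  optional list that collects every board actually evaluated.
    """
    if not isinstance(node, list):
        if seen is not None:
            seen.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, seen))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer above will never let play get here
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, seen))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer above already has something as good
    return value


# The abstract example: my left move leads to 2 and 7, my right move to 1
# and a fourth board whose value (99, chosen arbitrarily) never matters.
seen = []
print(alphabeta([[2, 7], [1, 99]], True, seen=seen))  # 2
print(seen)                                           # [2, 7, 1]
```

Because the loop stops as soon as alpha is at least as large as beta, the fourth board is never evaluated; only 2, 7, and 1 show up in the list.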
The usefulness of alpha-beta pruning is dependent upon the
order in which you generate and search the possible moves.
In the worst case, the branches of the tree are ordered in
such a way that alpha-beta provides no help at all. (What if
the two subtrees in the above example were explored in the
reverse order?) In more common cases, alpha-beta pruning
lessens the impact of exponentially increasing amounts of
search, but it does not prevent that exponential increase.
As the depth of the state space grows, the amount of work
required may still increase exponentially, but at a reduced
rate.
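To put rough numbers on "exponentially, but at a reduced rate": suppose every board has b possible moves and we look d moves ahead. Plain minimax then evaluates b^d bottom-level boards. A standard result (due to Knuth and Moore, not taken from these notes) is that alpha-beta with the best possible move ordering evaluates b^ceil(d/2) + b^floor(d/2) - 1 of them. The sketch below just tabulates both counts; the function names are made up, and the branching factor of 3 is an arbitrary choice.

```python
def minimax_leaves(b, d):
    """Boards evaluated by plain minimax on a uniform tree."""
    return b ** d


def best_case_alphabeta_leaves(b, d):
    """Boards evaluated by alpha-beta when moves are ordered perfectly."""
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1


for depth in range(2, 9, 2):
    print(depth, minimax_leaves(3, depth), best_case_alphabeta_leaves(3, depth))
# 2 9 5
# 4 81 17
# 6 729 53
# 8 6561 161
```

With b = 2 and d = 2 the best-case formula gives 3, which matches the abstract example above, where three of the four bottom-level boards were evaluated. With the worst possible ordering, alpha-beta evaluates all b^d boards, the same as plain minimax.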
Let's take one more look at our real hexapawn game in this
context:
                        - W W
                        W - -
                        B B B
                      /   |   \
                  /       |       \
              /           |           \
       - W W            - W W             - W W
     0 B - -        -10 W B -         -10 W - B
       B - B            B - B             B B -
      /  |  \            /  \            /  |  \
    /    |    \         /    \         /    |    \
- - W  - - W  - W -  - W -  - W -  - W W  - - W  - - W
W - -  B W -  B - W  W B W  W W -  - - B  W W B  W - W
B - B  B - B  B - B  B - B  B - B  B W -  B B -  B B -
  0      1      1     -10    -1     -10     0     -1
Could alpha-beta pruning have saved us some work in deciding
which move to make here? Sure. There are three moves we
didn't have to look at. They are marked with an X below:
                              X             X      X
- - W  - - W  - W -  - W -  - W -  - W W  - - W  - - W
W - -  B W -  B - W  W B W  W W -  - - B  W W B  W - W
B - B  B - B  B - B  B - B  B - B  B W -  B B -  B B -
  0      1      1     -10    -1     -10     0     -1
If you figured out that these were the moves that alpha-beta
pruning would have discarded without looking at them, and you
can explain why, then you know everything you need to know
about alpha-beta pruning.
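If you want to check your answer, the sketch below runs alpha-beta over the bottom two levels of the tree, visiting the moves from left to right as in the diagram. The board numbers 1 through 8 are labels invented for this sketch, and the function is an illustrative formulation rather than code from these notes.

```python
import math


def alphabeta(node, maximizing, alpha, beta, evaluated):
    """Alpha-beta over nested lists; bottom boards are (label, value) pairs."""
    if isinstance(node, tuple):
        label, value = node
        evaluated.append(label)
        return value
    best = -math.inf if maximizing else math.inf
    for child in node:
        score = alphabeta(child, not maximizing, alpha, beta, evaluated)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:
            break  # the remaining siblings cannot change the decision
    return best


# Bottom-level boards numbered 1 to 8 from left to right, with their values.
tree = [
    [(1, 0), (2, 1), (3, 1)],
    [(4, -10), (5, -1)],
    [(6, -10), (7, 0), (8, -1)],
]
evaluated = []
value = alphabeta(tree, True, -math.inf, math.inf, evaluated)
skipped = [n for n in range(1, 9) if n not in evaluated]
print(value)    # 0
print(skipped)  # [5, 7, 8]
```

Once the left subtree has guaranteed us a 0, a single -10 in each of the other two subtrees is enough to rule that move out, so the rest of white's replies there never need to be generated.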
Copyright 1998 by Kurt Eiselt. All rights reserved.
Last revised: May 13, 1998