I. Examples of state-space search in the real world
The state-space search we talked about two weeks ago is used in a lot
of ways by lots of folks. For example, a compiler has a component
called a parser which decomposes a high-level instruction into its
component parts. But these instructions can be ambiguous, so the
parser must make decisions about how various symbols (known as
"tokens" in compiler world) are being used. How that decision is
made depends on what the parser has already seen; in other words, the
next possible state of the parsing process depends on the history of
the previous states.
The parser reads the input from left to right, making "guesses" as it
goes. If the sequence of guesses leads to a structure for an
instruction that's not legal, the parser will backtrack and
systematically try new guesses, just like a depth-first search
algorithm. If no combination of guesses works for the parser, you'll
get a "syntax error" message. These things are sometimes called
"recursive descent parsers", and you could find out more about these
things in CS 2130 if you decide to take more CS courses.
The same sorts of ideas are used to get computers to understand English
and other natural languages. In fact, an entire company was founded on
this idea. A guy named Gary Hendrix at the University of Texas wrote a
PhD thesis on parsing English back in the late 60's or early 70's. He
later took some of those same ideas and built an interface to a simple
database system -- an interface that could accept database queries in
English (or at least a subset of English). He called the whole thing
"Q&A", it ran on PC compatibles, and it sold off the shelf at computer
stores for about $300 a copy. This product was one of the first, if
not the first, offered by the company Hendrix co-founded, which is
called "Symantec" -- a company which many of you Mac or PC owners know
about, since it has swallowed up all sorts of other software vendors.
Hendrix is now a zillionaire, and the moral to this story is that
state-space search can make you rich.
As another example of state space search applied to the real world,
evolutionary biologists think of all of us (and I mean *all* of us) as
the bottom layers of nodes on a very big state space. Those of us who
don't have any children are the leaves on a very very big tree (well,
it's not exactly a tree, but you get the idea). Some of us will
generate new states (our kids) and others of us won't. Each state
presumably brings humanity slightly closer to some lifeform that is
perfectly adapted to the environment. (If only we could get the
environment to stop changing so we can catch up....)
Finally, as we demonstrated via our Calvin and Hobbes example,
state-space search is a nice little metaphor for how we lead our lives:
every decision we make is based on the chain of decisions leading up
to that point. As Calvin and Hobbes illustrated, however, in life,
unlike in your computer, there's often no backtracking possible when
you make a bad decision.
II. A state-space search algorithm (depth-first)
Here's a very sketchy, high-level depth-first state-space search
algorithm that looks just like search algorithms that you've seen
already, except that it generates what is to be searched as it goes, as
opposed to searching some pre-existing data structure:
state-space (unexplored-states, goal-state, operators)
  1. if there aren't any more unexplored-states, then return failure;
     otherwise look at the first (leftmost) unexplored-state
  2. if that state is the goal-state, then return success
  3. if that state isn't the goal-state, then generate all
     possible new states from that state by applying the
     set of operators to that state
  4. call state-space with this new list of states passed
     as the unexplored-states argument, and if that
     succeeds then return success else...
  5. call state-space with the old list of unexplored
     states that remained after you stripped off the first
     unexplored-state in step 1, and if that succeeds then
     return success else...
  6. return failure
In step 3, you'd like to check all the new states to see if you've explored
them before. You do that by keeping track of the sequence of states that
was generated in going from the very first state to where you are now, and
then comparing that list to the set of new states you just generated. If
there are any duplicates, be sure to eliminate them from the set of new
states.
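As a rough sketch of the six steps above, here is one way the algorithm might look in Scheme. Everything in it is an assumption made for illustration: the names state-space, keep-small, and toy-next-states are hypothetical, the next-states argument stands in for "apply the set of operators", and the toy number puzzle exists only to give the search something small to chew on. The sketch returns the path it found rather than just "success", and it does the duplicate check when a state is picked up rather than when it is generated, which has the same effect.

```scheme
;; A sketch only: all names here are hypothetical.
;; unexplored  - list of states still to be examined
;; goal        - the state we're looking for
;; next-states - a function that applies all the operators to one state
;; path        - states visited on the way to this point (newest first)
(define (state-space unexplored goal next-states path)
  (cond ((null? unexplored) #f)                           ; step 1
        ((equal? (car unexplored) goal)                   ; step 2
         (reverse (cons goal path)))
        ((member (car unexplored) path)                   ; seen before: skip it
         (state-space (cdr unexplored) goal next-states path))
        (else
         (or (state-space (next-states (car unexplored))  ; steps 3 and 4
                          goal
                          next-states
                          (cons (car unexplored) path))
             (state-space (cdr unexplored)                ; step 5
                          goal
                          next-states
                          path)))))                       ; a #f here is step 6

;; Toy problem: states are numbers, the operators are "add 3" and
;; "double", and anything over 20 is thrown away so the space is finite.
(define (keep-small states)
  (cond ((null? states) '())
        ((> (car states) 20) (keep-small (cdr states)))
        (else (cons (car states) (keep-small (cdr states))))))

(define (toy-next-states n)
  (keep-small (list (+ n 3) (* n 2))))

;; > (state-space '(1) 11 toy-next-states '())
;; (1 4 8 11)
```

Notice that the search wanders down through 4, 7, 10, and so on before it backs up and tries 8. That's depth-first search for you.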
III. A problem-solving example
Here's some Scheme code that performs a state-space search
to solve a very simple peg puzzle:
;; A version of something known as the peg puzzle consists of four pegs
;; (two blue and two red) and six holes. Each of the holes may hold one
;; peg. Initially the two blue pegs are in the left-most holes, and the
;; two red pegs are in the right-most holes.
;;
;; The object of this puzzle (and remember that it's a puzzle with one person
;; playing, not a game with two opponents), is to put the blue pegs in the
;; right-most holes and the red pegs in the left-most holes by moving one
;; peg at a time according to the following rules:
;;
;; A peg may be moved into an adjacent empty hole.
;; A peg may jump over a single peg of another color into an empty hole.
;; The red pegs may only move or jump to the left.
;; The blue pegs may only move or jump to the right.
;;
;; For this problem, we'll represent the current state of the puzzle as
;; a list, and the pegs by letters representing the color of the peg:
;; B = blue, R = red, and _ = an empty hole
;;
;; So the initial state described above is (B B _ _ R R)
(define (peg-puzzle start goal)
  ;; peg-puzzle-2 returns #f when there's no solution, so only
  ;; reverse the path when one was actually found
  (let ((path (peg-puzzle-2 (list start) goal '())))
    (if path (reverse path) #f)))
(define (peg-puzzle-2 unexplored-states goal path)
  (cond ((no-more-states? unexplored-states) #f)
        ((found-goal? (get-first-unexplored-state unexplored-states) goal)
         (cons goal path))
        ((found-cycle? (get-first-unexplored-state unexplored-states) path)
         (peg-puzzle-2 (get-rest-of-unexplored-states unexplored-states)
                       goal
                       path))
        (else (or (peg-puzzle-2 (generate-new-states
                                 (get-first-unexplored-state
                                  unexplored-states))
                                goal
                                (cons
                                 (get-first-unexplored-state
                                  unexplored-states)
                                 path))
                  (peg-puzzle-2 (get-rest-of-unexplored-states
                                 unexplored-states)
                                goal
                                path)))))
;; abstraction barrier: algorithm above, details below
(define (no-more-states? states)
  (null? states))
(define (found-goal? state goal)
  (equal? state goal))
(define (found-cycle? state path)
  (member state path))
(define (get-first-unexplored-state states)
  (car states))
(define (get-rest-of-unexplored-states states)
  (cdr states))
;; Here's where this program departs from what you've already seen
;; (for example, the "hoppy" program you've seen previously).
;; Instead of just doing an assoc on an association list to find all the
;; nodes in the graph that are directly connected to the node
;; currently being explored (that's what "hoppy" did, remember?),
;; we have to generate all the next states that can be reached
;; from the current state in one move. This is called move-generation.
;; And move-generation is really the only thing that makes
;; the peg-puzzle solution drastically different from the galactic
;; hopping solution.
;;
;; So what moves do we want to generate? How will that work?
;; First, let's take a look at the starting state. It looks like
;; this: (B B _ _ R R)
;; From any given state, there are at most four different types of
;; moves that could be made:
;; a blue piece could slide one spot to the right into an adjacent empty spot
;; a blue piece could jump to the right over a red piece into an empty spot
;; just past the red piece
;; a red piece could slide one spot to the left into an adjacent empty spot
;; a red piece could jump to the left over a blue piece into an empty spot
;; just past the blue piece
;;
;; So in order to generate all the moves possible from a given state,
;; we want to apply four different move-generators to that state, one
;; for each type of move. And then we'd like to combine the results of
;; those four move-generators into a single list to be used by the
;; top-level peg-puzzle function. Going back to our start state,
;; if we apply the first move-generator described, we should see
;; this state generated: (B _ B _ R R)
;; And if we apply the second move-generator, we shouldn't see any
;; states generated, because there are no jumps that a blue piece
;; can make.
;; Applying the third move-generator would give this state: (B B _ R _ R),
;; and applying the fourth move-generator would generate no new states,
;; because there are no jumps for a red piece either.
;;
;; It will be useful to keep in mind as we go that, at least for this
;; particular puzzle, the movement of red pieces is exactly the same
;; as the movement of blue pieces, except the red pieces travel in the
;; opposite direction. This suggests that we only need to worry about
;; solving the move-generation problem for the blue pieces. Then when
;; it comes time to deal with the red pieces, we can just reverse
;; the list that represents the current state, apply the move-generators
;; for the blue pieces, and then reverse the results.
;;
;; So let's take a look at move-generation for blue pieces. We'll start
;; with sliding blue pieces one space to the right to an empty space.
;; To do this, we'll want to look at the current state starting at the
;; left end and proceeding to the right, and every time we see a
;; portion of that list that looks like "B _", we'll want to replace
;; that pair of elements with "_ B", and return the resulting list as
;; a new possible state. At the heart of this, we'll want a function
;; that takes a list as an argument, a position number (starting with 0
;; for the first element in the list), and a segment of pieces and
;; spaces to replace what's already in the list. For example,
;; we might call this function "replace-segment", and it would work
;; like this:
;; > (replace-segment '(B B _ _ R R) 1 '(_ B))
;; (B _ B _ R R)
;;
;; Here's what that function looks like:
(define (replace-segment old-list pos segment)
  (cond ((= pos 0)
         (append segment (all-but-first-n old-list (length segment))))
        (else (cons (car old-list)
                    (replace-segment (cdr old-list) (- pos 1) segment)))))
(define (all-but-first-n old-list count)
  (cond ((= count 0) old-list)
        (else (all-but-first-n (cdr old-list) (- count 1)))))
;; Now we'd like to be able to use what we just wrote as we scoot along
;; a list representing a state, testing to see if we find "B _" at
;; each spot and replacing that segment with "_ B" if we do. And
;; we want to make sure that we don't inadvertently try to go past
;; the end of the list, right? So this new function might be called
;; like this and return a list of new states:
;; > (generate-new-blue-slides '(B _ B _ R R))
;; ((_ B B _ R R) (B _ _ B R R))
;;
;; Here's what that function looks like:
(define (generate-new-blue-slides curr-state)
  (generate-new curr-state 0 '(B _) '(_ B)))
(define (generate-new curr-state pos old-segment new-segment)
  ;; stop as soon as the segment would run off the end of the list
  (cond ((> (+ pos (length old-segment)) (length curr-state)) '())
        ((segment-equal? curr-state pos old-segment)
         (cons (replace-segment curr-state pos new-segment)
               (generate-new curr-state (+ pos 1)
                             old-segment new-segment)))
        (else (generate-new curr-state (+ pos 1)
                            old-segment new-segment))))
;; To make the functions above work, we need to be able to compare
;; a specified segment of one list to another list, element by
;; element, to see if they're the same, before we do the replacement.
;; Some Lisp dialects already have a function that would help with
;; this (Common Lisp calls it subseq), but we haven't talked about it
;; in class so I wrote my own, more or less....
(define (segment-equal? curr-state pos old-segment)
  (cond ((= pos 0) (segment-equal-2? curr-state old-segment))
        (else (segment-equal? (cdr curr-state) (- pos 1) old-segment))))
(define (segment-equal-2? curr-state old-segment)
  (cond ((null? old-segment) #t)
        ((equal? (car curr-state) (car old-segment))
         (segment-equal-2? (cdr curr-state) (cdr old-segment)))
        (else #f)))
;; We've done pretty much all the hard work now. To generate
;; the blue jumps that are possible, we just re-use most of
;; what we've already created but pass different arguments:
(define (generate-new-blue-jumps curr-state)
  (generate-new curr-state 0 '(B R _) '(_ R B)))
;; Now to make the red slides and the red jumps, we just have
;; to reverse the list that is the current-state as we pass it
;; into our move-generators, and then we also have to reverse
;; the individual lists that are generated:
(define (generate-new-red-slides curr-state)
  (reverse-each (generate-new (reverse curr-state) 0 '(R _) '(_ R))))
(define (generate-new-red-jumps curr-state)
  (reverse-each (generate-new (reverse curr-state) 0 '(R B _) '(_ B R))))
(define (reverse-each list-of-lists)
  (cond ((null? list-of-lists) '())
        (else (cons (reverse (car list-of-lists))
                    (reverse-each (cdr list-of-lists))))))
;; And finally, we need to be able to tie all these move generators together
;; to create one big list of next possible moves:
(define (generate-new-states curr-state)
  (append (generate-new-blue-slides curr-state)
          (generate-new-blue-jumps curr-state)
          (generate-new-red-slides curr-state)
          (generate-new-red-jumps curr-state)))
;; That wasn't so hard, was it? Here's the whole thing in action:
;;
;;> (peg-puzzle '(B B _ _ R R) '(R R _ _ B B))
;;((b b _ _ r r)
;; (b _ b _ r r)
;; (_ b b _ r r)
;; (_ b _ b r r)
;; (_ b r b _ r)
;; (_ b r _ b r)
;; (_ b r r b _)
;; (_ b r r _ b)
;; (r b _ r _ b)
;; (r _ b r _ b)
;; (r _ _ r b b)
;; (r _ r _ b b)
;; (r r _ _ b b))
;;>
;;
;; This isn't necessarily the most elegant program that we could come
;; up with as a solution to this problem. There are some things
;; that are hard-coded that shouldn't be. Peg colors and allowable
;; direction of travel might be nice things to pass as parameters.
;; And it would be nice if the operators themselves were passed as
;; parameters too. You could do that with the use of some of the
;; things that Bryan talked about in the previous lecture.
;; And you could probably make this more concise with the use of map.
;; But I'll leave the optimization up to you. (Unless I decide to stay
;; up late some night and work on this some more...)
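As one hedged illustration of those last suggestions, here is how map might shorten two pieces of the program. Both names below, reverse-each-with-map and apply-generators, are hypothetical and are not part of the program above. The second one takes the move generators as a list, which is one possible way of passing the operators as parameters.

```scheme
;; Sketch only; both names are made up for illustration.

;; Same job as reverse-each, but map does the walking for us.
(define (reverse-each-with-map list-of-lists)
  (map reverse list-of-lists))

;; Apply every move generator in the list to the state, then glue
;; the resulting lists of states together into one list.
(define (apply-generators generators state)
  (apply append
         (map (lambda (generator) (generator state))
              generators)))

;; > (reverse-each-with-map '((B _ R) (R _ B)))
;; ((R _ B) (B _ R))
```

With the four generators from the program above, generate-new-states could then be written as a single call to apply-generators, passing it a list of those four functions and the current state.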
IV. Making your search smarter
Searches like what we've seen so far are, in a word, dumb. They don't
know which next state might be any better than any other next state.
These searches can be methodical (e.g., look at the first on the list)
or random (e.g., Calvin's decision: "Arbitrarily, I choose left.").
These searches settle for finding the goal state, but they don't care
about how many steps it takes to get from the initial state to the goal
state.
Usually, however, we don't have time to burn. We'd like to strive to
find the goal state in as few steps as we can. That is, we'd like to
try to find the "optimal path" from the initial state to the goal
state, and we can help ourselves out here if we can put a little more
"intelligence" into our search.
One time-honored way of doing this is to find a method to measure the
"goodness" of a state -- that is, to determine how close a given state
is to the goal state. If we could make that evaluation consistently
and correctly, then when we look at a list of states in trying to
decide which to use next to generate new states, we could pick the
state closest to the goal, instead of just picking the first one we see
or picking one at random.
Most of the time, though, such measurements of a state's goodness are
just estimates. If the estimate is wrong, you could spend a lot of
time and effort searching paths that will never get you to the goal, or
at least that will give you less optimal solutions. The better the
ability to estimate goodness, the better is the chance for optimality.
But unless the estimate is always right, there's no guarantee of
success. These measures of goodness are one example of something
called "heuristics": techniques that aid in the discovery of solutions
to problems most of the time, but don't guarantee that they'll lead to
solutions all of the time.
V. Heuristics and the 8-tile puzzle
Let's look again at the 8-tile puzzle we saw earlier. There we
described a dumb, exhaustive, brute-force, depth-first search for
finding the goal state. Could you do better? Probably yes. If you
could come up with a way to estimate how close any given arrangement of
tiles was to the goal, you could always choose to explore the state
that was nearest the goal. To do this, you'd have to figure out a way
to codify the metrics for this evaluation in such a way that a computer
could use them. One heuristic might be to just count the number of
tiles that are in the place they belong. So if your goal state looks
like this:
        1 2 3
        8   4
        7 6 5
and your start state followed by the next possible states looks like this:
                  2 8 3
                  1 6 4
                  7   5
                   /|\
                  / | \
                 /  |  \
                /   |   \
               /    |    \
              /     |     \
             /      |      \
            /       |       \
        2 8 3     2 8 3     2 8 3
        1 6 4     1   4     1 6 4
          7 5     7 6 5     7 5
        score     score     score
          3         6         3
which of these next states is closer to the goal using our heuristic?
The middle state has six tiles in the right place (this assumes that
we're going to count the empty space as a tile), while the other two
states have only three tiles in the right place. So for our next step
in the search, we'd choose to generate all the states possible from
that middle state. Then we'd apply our evaluation heuristic again, and
so on. Of course, we could get more sophisticated with our heuristic
measures. For example, we could try to estimate how many moves it
would take to get all the tiles in their appropriate places instead of
just counting how many were already in the right place. That might
give us a better measure of goodness, or it might just cause us to
spend extra time computing the goodness without any real return on the
investment, or it might just completely mislead the search. We'd have
to play with it for a while to see if it would help us.
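The tile-counting heuristic is easy to sketch in Scheme. The representation here is an assumption made for illustration: each state is a flat list of nine items read left to right, top to bottom, with _ standing for the empty space. The names tiles-in-place and best-state are hypothetical.

```scheme
;; Sketch only; names and representation are assumptions.

;; Count the positions where state and goal agree (the empty space,
;; written _, counts as a tile).
(define (tiles-in-place state goal)
  (cond ((null? state) 0)
        ((equal? (car state) (car goal))
         (+ 1 (tiles-in-place (cdr state) (cdr goal))))
        (else (tiles-in-place (cdr state) (cdr goal)))))

;; Pick the state with the highest score from a non-empty list.
;; Ties go to whichever state comes first.
(define (best-state states goal)
  (let loop ((best (car states)) (rest (cdr states)))
    (cond ((null? rest) best)
          ((> (tiles-in-place (car rest) goal)
              (tiles-in-place best goal))
           (loop (car rest) (cdr rest)))
          (else (loop best (cdr rest))))))

;; > (tiles-in-place '(2 8 3 1 _ 4 7 6 5) '(1 2 3 8 _ 4 7 6 5))
;; 6
```

Applied to the three next states in the diagram, tiles-in-place gives 3, 6, and 3, so best-state picks the middle one.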
VI. Game Search
For a single agent in a relatively non-hostile world, the search for
the path from some start state to some goal state is not especially
difficult, and in fact, it's sort of dull. But the world isn't always
peaceful---sometimes, there are other agents out there trying to keep
you from reaching your goal, and at the same time those other agents
are trying to achieve some goal of their own. We see this kind of
stuff everywhere: in the executive boardroom, on the athletic field,
sometimes even in the classroom. And when we try to model this kind of
competitive behavior on a computer, we have to keep in mind that while
our computerized "good guy" is going to try to move toward the goal in
as optimal a fashion as possible, the computerized "bad guy" is going
to do everything it can to keep us from getting there. Thus, the
question in a competitive or adversarial situation is no longer "what's
my optimal path to the goal?", but is instead "what's my path to the
goal when someone else is trying to stop me?"
The fundamental change in the nature of that question results in a
fundamental change to the way we do state-space search in adversarial
situations, thus giving rise to something called "adversarial search".
And since we frequently use this kind of search when we build
intelligent game-playing programs, this kind of search is frequently
called "game search".
The principle of game search is to first generate the state space (or
"game tree") some number of levels deep, where each level corresponds
to one player's move (or, more accurately, the set of all moves that
the player could possibly make at that point). After generating the
state space for that number of levels, the nodes at the bottom level
are evaluated for goodness. (In the context of game playing, those
nodes are often called "boards", each one representing one possible
legal arrangement of game pieces on the game board.)
The estimate of the goodness of a board is a little bit different than
before, but not much. Since we have to worry about the opponent, we
set up our estimation function so that it returns a spectrum of values,
just like before, but now the two extremes are boards that are great
for us (i.e., we win) and boards that are great for our opponent (i.e.,
we lose). We apply our estimation function to those lowest level
boards, and propagate the numerical values upward to help us determine
which is the best move to make.
VII. The joy of hex
In order to explore the wild and wacky world of adversarial or game
search, we're going to have to introduce a game. It's a simple game
for two players, and it's called hexapawn, for reasons which will
become obvious.
The rules of hexapawn (at least in its original form) are as
follows:
The game is played on a 3x3 board. Each player begins with three pawns
lined up on opposite sides of the board. There are three white pawns
and three black pawns, which gives us a grand total of six pawns, hence
the name hexapawn. White always moves first, just like in chess. The
players take turns moving their pawns. A pawn can move one square
forward to an empty square, or it can move one square diagonally ahead
(either to the left or right) to a square occupied by an opponent's
pawn, in which case the opponent's pawn is removed from the board. One
player wins when one of these three conditions is true: 1) one of that
player's pawns has reached the opposite end of the board, 2) the
opponent's pawns have all been removed from the board, or 3) it's the
opponent's turn to move but the opponent can't move any pawns.
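The movement rules above translate fairly directly into a move generator. What follows is only a sketch under some assumptions made for illustration: the board is a flat list of nine symbols read left to right, top to bottom, using W, B, and - for white, black, and empty; white starts on the top row and moves down. The names row-offset, opponent, move-pawn, moves-for-pawn, and generate-moves are all hypothetical.

```scheme
;; Sketch only; names and board representation are assumptions.

;; Moving one row forward changes a pawn's list index by 3:
;; white moves down the list, black moves up.
(define (row-offset color)
  (if (eq? color 'W) 3 -3))

(define (opponent color)
  (if (eq? color 'W) 'B 'W))

;; Return a copy of board with the pawn at index from moved to index to.
(define (move-pawn board from to)
  (let ((pawn (list-ref board from)))
    (let loop ((i 0) (rest board))
      (cond ((null? rest) '())
            ((= i from) (cons '- (loop (+ i 1) (cdr rest))))
            ((= i to) (cons pawn (loop (+ i 1) (cdr rest))))
            (else (cons (car rest) (loop (+ i 1) (cdr rest))))))))

;; All boards reachable by moving the pawn at index i: a capture to
;; the left, a step straight ahead, a capture to the right.
(define (moves-for-pawn board i color)
  (let ((ahead (+ i (row-offset color)))
        (col (remainder i 3)))
    (if (or (< ahead 0) (> ahead 8))
        '()
        (append
         (if (and (> col 0)
                  (eq? (list-ref board (- ahead 1)) (opponent color)))
             (list (move-pawn board i (- ahead 1)))
             '())
         (if (eq? (list-ref board ahead) '-)
             (list (move-pawn board i ahead))
             '())
         (if (and (< col 2)
                  (eq? (list-ref board (+ ahead 1)) (opponent color)))
             (list (move-pawn board i (+ ahead 1)))
             '())))))

;; All boards reachable in one move by the given color.
(define (generate-moves board color)
  (let loop ((i 0))
    (cond ((= i 9) '())
          ((eq? (list-ref board i) color)
           (append (moves-for-pawn board i color) (loop (+ i 1))))
          (else (loop (+ i 1))))))

;; > (generate-moves '(W W W - - - B B B) 'W)
;; ((- W W W - - B B B) (W - W - W - B B B) (W W - - - W B B B))
```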
It's not a very exciting game for human players, but it's reasonably
stimulating for humans who are required to write programs to get
computers to play this game, such as yourselves. (It's not entirely
clear how the computers feel about it.) Hexapawn also serves as a very
nice mechanism for demonstrating the principles of game search with
heuristics.
VIII. Hexapawn: catch the fever
Let's look at the beginning of a sample game of hexapawn and see how we
might get a computer to play the game. We'll let our opponent take the
side of the white pawns, and we'll play the black pawns. The initial
board configuration will be represented like this:
W W W
- - -
B B B
As we said, white always gets to move first. This gives white three
possible initial moves, which are represented in this way:
                            W W W
                            - - -
                            B B B
                            / | \
                         /    |    \
                      /       |       \
                   /          |          \
                /             |             \
             /                |                \
        - W W               W - W               W W -
        W - -               - W -               - - W
        B B B               B B B               B B B
But of course, white doesn't get to make all three moves. White has to
choose one and go with it. So let's say that white opts for that move on
the left. Our resulting state space then looks like this:
                                       W W W
                                       - - -
                                       B B B
                                      /
                                     /
                                    /
                                   /
                                  /
                                 /
                            - W W
                            W - -
                            B B B
Everything was fine up until now. Now it's our turn. What will we do?
Well, what we'd like to do is look at all of our possible next moves and
make the best one, right? Sure. So let's see what our options are on
this move:
                                       W W W
                                       - - -
                                       B B B
                                      /
                                     /
                                    /
                                   /
                                  /
                                 /
                            - W W
                            W - -
                            B B B
                            / | \
                         /    |    \
                      /       |       \
                   /          |          \
                /             |             \
             /                |                \
          /                   |                   \
        - W W               - W W               - W W
        B - -               W B -               W - B
        B - B               B - B               B B -
Can we tell from just this which possible next move is the best one?
Maybe. That one on the left looks sort of nice, since it leaves us with
one more pawn than our opponent. But we really can't tell just by looking
at these different boards which move is likely to lead to a win for us.
Maybe we could get a better idea of which of our three possible moves
is the best one by looking even further ahead to see what white might
do on the next turn:
                                       W W W
                                       - - -
                                       B B B
                                      /
                                     /
                                    /
                                   /
                                  /
                                 /
                            - W W
                            W - -
                            B B B
                            / | \
                         /    |    \
                      /       |       \
                   /          |          \
                /             |             \
             /                |                \
          /                   |                   \
        - W W               - W W               - W W
        B - -               W B -               W - B
        B - B               B - B               B B -
         /|\                 / \                 /|\
       /  |  \              /   \              /  |  \
     /    |    \           /     \           /    |    \
   /      |      \        /       \        /      |      \
- - W   - - W   - W -   - W -   - W -   - W W   - - W   - - W
W - -   B W -   B - W   W B W   W W -   - - B   W W B   W - W
B - B   B - B   B - B   B - B   B - B   B W -   B B -   B B -
                          ^               ^
                          |               |
                        white           white
                        wins            wins
Well now, maybe we know a little more than before. In fact, we can see
two boards there that indicate victory for our opponent, and we could
probably make some reasonable attempts at estimating how close the other
boards might be to either a win or a loss for us. However, we could also
look ahead yet another move, and then another, and so on until we had
mapped out all the possibilities. The problem with doing this is that
it's going to cost us lots of computational resources. This may not be
a big deal when we're playing hexapawn with only three pawns on a side,
but it would be a big deal if we extended the game to eight pawns on a side,
for example. Or maybe instead of hexapawn variations, we're playing
something like chess. Now the computational expense will be
prohibitive, so we're going to have to settle on some arbitrary cutoff
for looking ahead in this or any game. Since I'm running out of room
to display all the possible boards at the same level, let's make life
easy for me and set our arbitrary cutoff for looking ahead at two
levels or two moves ahead.
IX. The static board evaluation function
Previously, we noted that two of those bottom-level boards were wins
for white. But what about those other boards? What do they indicate
for us? Will they lead to wins or losses for us? How can we estimate
that? How can we get a computer to estimate that?
Providing that estimate is the job of something called the "static
board evaluation function". A static board evaluation function takes
as input the current state of a game (i.e., the board) and returns a
value corresponding to the "goodness" of that current state or board.
By "goodness" we mean how close that board is to a victory for us---the
closer, the better. A simple static board evaluation function might
return, say, a positive number if the board is good for us, a negative
number if the board is not good for us (but is consequently good for
our opponent), and maybe a zero if the board is neither bad nor good
for either player.
How might we design such a function? Here's a weak first attempt at
one. Since we're playing on the black side of the board, we'll have
the function return a +10 if the board is such that black wins. And
we'll have it return a -10 if white wins. (If we were playing on the
white side of the board, we'd want it to return a +10 if white won, and
a -10 if black won.) Since we win if we can get one of our pawns
across the board to the other side, we should have the function take
that into account too. So if neither side has won, let's have our
function return the number of black pawns with clear paths in front of
them minus the number of white pawns with clear paths in front of them.
Oh, and since we win if our opponent's pawns are all removed from the
board, let's have the function incorporate that. We'll have the function
count the number of black pawns on the board, subtract the number of
white pawns, add that number to the previous number, and return that
result. There, that wasn't so bad, was it?
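To make that description concrete, here is a minimal sketch of such a function in Scheme. Several things in it are assumptions made for illustration and don't come from the description above: the board is a flat list of nine symbols (W, B, or -) with the top row first and white on top; every function name is hypothetical; a "clear path" is taken to mean that every square ahead of the pawn in its column is empty; and the function takes a second argument saying whose turn it is, because the third winning condition depends on that.

```scheme
;; Sketch only. Names, board layout, and the reading of "clear path"
;; are all assumptions.

(define (piece board row col)
  (list-ref board (+ (* row 3) col)))

(define (opponent color)
  (if (eq? color 'W) 'B 'W))

;; White moves down the board (row + 1), black moves up (row - 1).
(define (row-step color)
  (if (eq? color 'W) 1 -1))

(define (count-pawns board color)
  (cond ((null? board) 0)
        ((eq? (car board) color) (+ 1 (count-pawns (cdr board) color)))
        (else (count-pawns (cdr board) color))))

;; Is every square ahead of (row, col) in that column empty?
(define (clear-path? board row col step)
  (let ((next (+ row step)))
    (cond ((or (< next 0) (> next 2)) #t)
          ((eq? (piece board next col) '-)
           (clear-path? board next col step))
          (else #f))))

(define (count-clear-paths board color)
  (let loop ((i 0) (total 0))
    (cond ((= i 9) total)
          ((and (eq? (list-ref board i) color)
                (clear-path? board (quotient i 3) (remainder i 3)
                             (row-step color)))
           (loop (+ i 1) (+ total 1)))
          (else (loop (+ i 1) total)))))

;; Can the pawn at (row, col) step forward or capture diagonally?
(define (pawn-can-move? board row col color)
  (let ((next (+ row (row-step color))))
    (and (>= next 0)
         (<= next 2)
         (or (eq? (piece board next col) '-)
             (and (> col 0)
                  (eq? (piece board next (- col 1)) (opponent color)))
             (and (< col 2)
                  (eq? (piece board next (+ col 1)) (opponent color)))))))

(define (can-move? board color)
  (let loop ((i 0))
    (cond ((= i 9) #f)
          ((and (eq? (list-ref board i) color)
                (pawn-can-move? board (quotient i 3) (remainder i 3) color))
           #t)
          (else (loop (+ i 1))))))

(define (reached-far-row? board color)
  (let ((row (if (eq? color 'W) 2 0)))
    (or (eq? (piece board row 0) color)
        (eq? (piece board row 1) color)
        (eq? (piece board row 2) color))))

;; The three winning conditions from the rules of hexapawn.
(define (wins? board color to-move)
  (or (reached-far-row? board color)
      (= (count-pawns board (opponent color)) 0)
      (and (eq? to-move (opponent color))
           (not (can-move? board (opponent color))))))

;; Score a board from black's point of view; to-move is W or B.
(define (evaluate board to-move)
  (cond ((wins? board 'B to-move) 10)
        ((wins? board 'W to-move) -10)
        (else (+ (- (count-clear-paths board 'B)
                    (count-clear-paths board 'W))
                 (- (count-pawns board 'B)
                    (count-pawns board 'W))))))

;; > (evaluate '(- W - W W - B - B) 'B)
;; -1
```

In that last example black has one pawn with a clear path and so does white, which cancels out, and black is down one pawn, hence the -1.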
X. The minimax procedure
Now that we have a reasonably well-defined static board evaluation
function, how do we use it? Remember that the idea behind creating
this thing was to estimate the "goodness" of a board---in this case,
the function returns a positive number when the board is good for us,
and a negative number when the board is good for our opponent. Let's
apply the function we defined above to all the boards at the bottom
level of the hexapawn state space we generated way back up
there:
                                       W W W
                                       - - -
                                       B B B
                                      /
                                     /
                                    /
                                   /
                                  /
                                 /
                            - W W
                            W - -
                            B B B
                            / | \
                         /    |    \
                      /       |       \
                   /          |          \
                /             |             \
             /                |                \
          /                   |                   \
        - W W               - W W               - W W
        B - -               W B -               W - B
        B - B               B - B               B B -
         /|\                 / \                 /|\
       /  |  \              /   \              /  |  \
     /    |    \           /     \           /    |    \
   /      |      \        /       \        /      |      \
- - W   - - W   - W -   - W -   - W -   - W W   - - W   - - W
W - -   B W -   B - W   W B W   W W -   - - B   W W B   W - W
B - B   B - B   B - B   B - B   B - B   B W -   B B -   B B -
  0       1       1      -10     -1      -10      0      -1
The two boards that have been assigned a value of -10 are, of course, the
boards that represent victories for white. In the case of the leftmost of
those two boards, it's a victory for white because it's now our turn (i.e.,
black's turn) and we can't move any of our pawns. The rightmost of
these two boards is a victory for white because white has moved one of
its pawns all the way across the board.
But let's take a look at another board. The one at the very left, for
example, has been given a value of zero. Yet it's easy for us to see
that, since it's our turn, and we only have one black pawn that we can
move, if we just move that one black pawn forward one space, we'll have
blocked any possible move by white and we win the game. So why isn't
that board given a value of +10? Because in order to figure that out,
our static board evaluation function would have to look ahead one more
move. But a static board evaluation function is exactly that---static.
It doesn't look ahead. If we set a limit on the number of moves we
want to look ahead in order to play the game in a reasonable amount of
time, but then we have our board evaluation function look even further
ahead, we're going to eat up additional computing resources that we
were trying to save, and we're also going to end up writing the same
code twice. So there's absolutely no advantage to having the board
evaluation function look ahead an additional move or two or
three---instead, we should just readjust our original depth cutoff so
that it allows us to look more moves into the future.
Now let's go back and look at those bottom two levels in our hexapawn
state space:
        - W W               - W W               - W W
        B - -               W B -               W - B
        B - B               B - B               B B -
         /|\                 / \                 /|\
       /  |  \              /   \              /  |  \
     /    |    \           /     \           /    |    \
   /      |      \        /       \        /      |      \
- - W   - - W   - W -   - W -   - W -   - W W   - - W   - - W
W - -   B W -   B - W   W B W   W W -   - - B   W W B   W - W
B - B   B - B   B - B   B - B   B - B   B W -   B B -   B B -
  0       1       1      -10     -1      -10      0      -1
What can we do with those numbers that have been assigned to the boards?
Those boards all represent possible results of a move by white. Those
numbers can be used to tell us which of those moves white is more likely
to make. For example, in the leftmost subtree, we might guess that white
is more likely to make the move that results in a board with value 0 than the
moves that result in boards with value 1, because a board with value 0
is better for white than a board with value 1, which favors us. That
assumes, of course, that we trust our evaluation function. Similarly,
in the middle and rightmost subtrees, white is going to prefer the
moves that result in a board with a value of -10 (a victory for white),
right? We can indicate those preferences by taking the minimum values
among those board values in each subtree and propagating them up one
level:
                            - W W
                            W - -
                            B B B
                            / | \
                         /    |    \
                      /       |       \
                   /          |          \
                /             |             \
             /                |                \
          /                   |                   \
        - W W               - W W               - W W
     0  B - -          -10  W B -          -10  W - B
        B - B               B - B               B B -
         /|\                 / \                 /|\
       /  |  \              /   \              /  |  \
     /    |    \           /     \           /    |    \
   /      |      \        /       \        /      |      \
- - W   - - W   - W -   - W -   - W -   - W W   - - W   - - W
W - -   B W -   B - W   W B W   W W -   - - B   W W B   W - W
B - B   B - B   B - B   B - B   B - B   B W -   B B -   B B -
  0       1       1      -10     -1      -10      0      -1
OK, now how can we use that information? We use it in almost exactly
the same way as we did before. We can figure out which move we should
make of the three that are available to us by finding the maximum of the
values that we just propagated upward. One of those values was a 0 and
the other two were -10. Of course, the 0 is better for us, so we'd choose
to make that move.
Let's go back and see what we've done here. First we started with some
arrangement of pawns on the board and the knowledge that it was our
turn. We generated all the moves we might make, and then we generated
all the moves that our opponent could make after we made our move. We
arbitrarily chose to look only two moves ahead, but we could have
looked further if we wanted to give up the computational resources to
do so. We then applied our static board evaluation function to the
bottom-most boards (i.e., the terminal leaves on the tree that is our
state space) and assigned a numeric value corresponding to "goodness"
to each of those boards. Those bottom-most boards are each the result
of a possible move by white. We assumed that white would always make
the best move it possibly could, so we propagated the minimum values up
from the leaves to the immediate parents. And then we assumed that we
would want to make the best possible move that we could, and we chose
that move by selecting the maximum of the values that had just been
propagated upwards.
Because we chose to look ahead only two moves, the first propagation
was of minimum values from the very bottom level, followed by a
propagation of maximum values upward from that level. If we had chosen
to look ahead three moves, we'd first propagate maximum values from the
bottom, then minimums, then maximums. If we were looking ahead four
moves, we'd start with minimums, then maximums, then minimums, then
maximums. And so on, and so on. The procedure that we just described
has a name, "minimax", and it's the heart of game-playing computer
programs.
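The alternation of minimums and maximums can be sketched as a short recursive function. This is only an illustrative sketch under a simplifying assumption: the game tree is already built and stored as nested Python lists, where a number is a board that has been scored by the static evaluation function and a list holds the boards reachable in one move. The function name and the tree encoding are made up here, not taken from the lecture.

```python
def minimax(node, maximizing):
    """Return the minimax value of a game tree.

    A node is either a number (a board already scored by the static
    evaluation function) or a list of child nodes (the boards reachable
    in one move). `maximizing` is True when the choice at this node is ours.
    """
    if not isinstance(node, list):      # a scored board: nothing to propagate
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


# Two moves ahead: our move (max) followed by white's reply (min).
two_ply = [[0, 1, 1], [-10, -1], [-10, 0, -1]]
print(minimax(two_ply, True))     # 0

# Three moves ahead (made-up values): maximums come up from the bottom
# level first, then minimums, then the final maximum at the top.
three_ply = [[[3, 5], [2, 9]], [[0, 1], [7, 4]]]
print(minimax(three_ply, True))   # 5
```

A real game program would generate the boards as it goes and stop at a depth limit instead of walking a prebuilt tree, but the propagation of values works the same way.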
The minimax procedure relies on two assumptions. First, there must be
an adequate and reasonably accurate board evaluation technique. It
doesn't have to be perfect, but it does have to work more often than
not. The second assumption is that the relative merit of a move
becomes more obvious as you search deeper and deeper into the state
space. Why? Because if that weren't so, there wouldn't be any value
in doing the search in the first place. But keep in mind that for any
given game, or at least any given implementation, one or the other (or
both) of these assumptions may not be true.
XI. Once more, with feeling
Let's continue the example for a bit. We have a hot hexapawn game
going, in which white made a move:
W W W
start - - -
B B B
/
/
/
/
/
/
- W W
white moves W - -
B B B
and then we (playing black) applied the minimax procedure to find our
best move from the board that our opponent had left for us:
                                      W W W
                               start  - - -
                                      B B B
                                     /
                                   /
                                 /
                            - W W
                white moves W - -
                            B B B
    our                   /   |   \
    best            /         |         \
    move      /               |               \
        - W W               - W W               - W W
      0 B - -           -10 W B -           -10 W - B
        B - B               B - B               B B -
       /  |  \               / \               /  |  \
     /    |    \            /   \            /    |    \
   /      |      \         /     \         /      |      \
- - W   - - W   - W -   - W -   - W -   - W W   - - W   - - W
W - -   B W -   B - W   W B W   W W -   - - B   W W B   W - W
B - B   B - B   B - B   B - B   B - B   B W -   B B -   B B -
  0       1       1      -10     -1      -10      0      -1
So we make our move, and then white counters with a move:
W W W
start - - -
B B B
/
/
/
/
/
/
- W W
white moves W - -
B B B
/
/
/
/
/
/
/
we - W W
move B - -
B - B
|
|
|
|
white - - W
moves B W -
B - B
Note that white didn't make the move that we predicted would be the best
move for white. That happens a lot, but we don't care. Regardless of the
move that white made, we'd have to go through the whole minimax thing again
to decide our next best move. So let's go through it one more time
just to make sure we follow how this all works. We start with the
board that was left after white's last move:
- - W
B W -
B - B
Then we generate the state space that results from all the moves we could
make followed by all the moves that our opponent could make in response.
(Remember that we've arbitrarily set our search limit at two moves ahead...
we could have set that limit higher if we wanted to expend the resources.)
                         - - W
                         B W -
                         B - B
                        /  ^  \
                  /      /   \      \
            /         /        \          \
      /            /             \              \
B - W         - - W             - - W             - - W
- W -         B B -             B B -             B W B
B - B         - - B             B - -             B - -
               / \               / \               / \
              /   \             /   \             /   \
             /     \           /     \           /     \
          - - -   - - -     - - -   - - -     - - W   - - W
          B W -   B B W     B W -   B B W     B - B   B - B
          - - B   - - B     B - -   B - -     W - -   - W -
Then we use our static board evaluation function to determine the goodness
of the "terminal boards":
                         - - W
                         B W -
                         B - B
                        /  ^  \
                  /      /   \      \
            /         /        \          \
      /            /             \              \
B - W         - - W             - - W             - - W
- W -         B B -             B B -             B W B
B - B         - - B             B - -             B - -
 10            / \               / \               / \
              /   \             /   \             /   \
             /     \           /     \           /     \
          - - -   - - -     - - -   - - -     - - W   - - W
          B W -   B B W     B W -   B B W     B - B   B - B
          - - B   - - B     B - -   B - -     W - -   - W -
            2       4         1       3        -10     -10
Then we propagate the minimums up from the result of white's move (this is
the "minimizing level"):
                         - - W
                         B W -
                         B - B
                        /  ^  \
                  /      /   \      \
            /         /        \          \
      /            /             \              \
B - W         - - W             - - W             - - W
- W -       2 B B -           1 B B -         -10 B W B
B - B         - - B             B - -             B - -
 10            / \               / \               / \
              /   \             /   \             /   \
             /     \           /     \           /     \
          - - -   - - -     - - -   - - -     - - W   - - W
          B W -   B B W     B W -   B B W     B - B   B - B
          - - B   - - B     B - -   B - -     W - -   - W -
            2       4         1       3        -10     -10
And then we would propagate the maximum value up and select the best
move to make. In this case, that move would be the one on the left, with
the board value of 10, which indicates a win for us. Yippee!!
XII. Alpha-beta pruning
The next topic is something we talked about briefly in class...
we'll give it a little more exposure here in the notes. Here's
where we just left our hexapawn game:
                         - - W
                         B W -
                         B - B
                        /  ^  \
                  /      /   \      \
            /         /        \          \
      /            /             \              \
B - W         - - W             - - W             - - W
- W -       2 B B -           1 B B -         -10 B W B
B - B         - - B             B - -             B - -
 10            / \               / \               / \
              /   \             /   \             /   \
             /     \           /     \           /     \
          - - -   - - -     - - -   - - -     - - W   - - W
          B W -   B B W     B W -   B B W     B - B   B - B
          - - B   - - B     B - -   B - -     W - -   - W -
            2       4         1       3        -10     -10
If you think about it for a minute, we didn't really have to go through
all that board generation and evaluation and propagation of values to
figure out what to do on that last move we made at the end of last
week's lecture. As soon as we generated that move on the left side of the
state space above and realized that it was a winning move (which is why we
didn't generate any moves beyond that), there was no reason to generate
the rest of the state space to the right. The first move that we looked
at was so good that it didn't matter what the other possibilities were.
We didn't recognize that because we were following our minimax procedure
blindly. But if we make our minimax procedure a little bit smarter, we
could reduce the cost of doing this search by "pruning" our state space
tree and getting rid of some of the board generation, evaluation, and
propagation, all of which eat up computational resources.
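Here's one way that simplest kind of shortcut might look in Python. It's a hedged sketch, not the lecture's code: the move names and the generator are hypothetical, and the values for the three non-winning moves are just the minimums that were propagated up in the tree above. The point is that because the moves are produced lazily, nothing after the winning move is ever generated.

```python
WIN = 10   # value the notes use for a board that is a win for us

def first_winning_move(moves):
    """Scan (name, value) pairs lazily; stop at the first winning move.

    `moves` can be a generator, so moves after a win are never generated.
    Returns the winning move's name, or None if no move wins outright.
    """
    for name, value in moves:
        if value == WIN:
            return name
    return None


generated = []   # records which moves were actually generated

def generate_moves():
    # Hypothetical move generator for the board above, left to right.
    # The first move is the immediate win; the other values are the
    # minimums that were propagated up in the tree.
    for name, value in [("left", 10), ("mid-left", 2),
                        ("mid-right", 1), ("right", -10)]:
        generated.append(name)
        yield name, value

print(first_winning_move(generate_moves()))   # left
print(generated)                              # ['left']
```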
Let's take a look at a simple abstract example of how this might work.
Let's say we start with some board:
start
board
And from that starting board, I have two possible moves. But instead of
generating all my moves in the sort of breadth-first fashion that's
implied by the examples above, I'm going to fall back on my old depth-first
search technique and generate just one of my moves, and explore all of my
opponent's moves in response to my move before I go and look at my
other move:
start
board
/
/
/
/
/
my
move
Now, again following my depth-first approach, and remembering that I'm
still cutting off my search at two moves ahead, I look at one of my
opponent's moves and apply the static board evaluation function:
start
board
/
/
/
/
/
my
move
/
/
/
/
/
opp's
move
2
Let's say that my opponent has two possible moves after either of my
moves. We've just looked at one of the opponent's possible moves, now we'll
explore the other:
start
board
/
/
/
/
/
my
move
/ \
/ \
/ \
/ \
/ \
opp's opp's
move move
2 7
I then propagate the minimum value up from that level, and begin to
explore the possible outcomes of my other move:
               start
               board
              /     \
            /         \
          /             \
    2  my                my
      move              move
      /   \             /
     /     \           /
  opp's   opp's     opp's
  move    move      move
    2       7         1
The question now is "do I get any useful information from exploring my
opponent's remaining possible move?" And the answer is "no". Why? Let's
look at what could possibly happen here. If I generate that last remaining
board and apply the board evaluation function to it, the value of that board
is either going to be greater than or equal to 1, or it's going to be
less than 1. In the former case, the value that will be propagated up
from this level is 1, a value that I already knew. In the latter case,
the value less than 1 would be propagated up, and I didn't know about
that value already. But, and this is the important but, either of
those values will be less than 2, which is the minimum value that was
propagated up from the other side of the tree. So based on what I know
from only exploring three of my opponent's four possible moves, I can
determine that the fourth possible move will have no bearing on my
decision about what move I should make. I know I'm going to choose the
move to the left---the one where the worst my opponent can do to me is
leave me with a board with a value of 2. I know that I'm not going to
choose the move to the right, because my incomplete exploration of the
state space has already convinced me that the best I can do if I go
that direction is end up with a board that has a value of 1. And since
I know I'm not going to choose the move to the right now, there's no
reason to go through the effort of generating the remaining possible
move on that side. That saves me some computational resources. It may
not seem like much when I'm playing a piddly game of hexapawn, but
this same kind of savings, repeated over and over throughout a very
large game tree, will save me all kinds of resources in a game like
chess (which at the tournament level is a timed game, so resource
management becomes very important). Oh, sure, maybe there's a
possibility that my opponent would do something stupid if I took that
move to the right and leave me with a +10 board and I'd win, but I
can't count on that. I have to assume that my opponent is playing
smart and playing to win. If I didn't assume that, I wouldn't have to
go through all this stuff in the first place.
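If you'd rather check that argument than take it on faith, here's a small Python sketch (not from the lecture; the function name is made up). It runs the two-move lookahead on the example above once for every value the unexplored fourth board could plausibly have, from a loss for us (-10) to a win for us (+10), and confirms that the decision never changes.

```python
def choose(left_leaves, right_leaves):
    """Two-move lookahead: return (chosen move, its propagated value)."""
    left, right = min(left_leaves), min(right_leaves)
    return ("left", left) if left >= right else ("right", right)

# Try a wide range of values for the unexplored fourth board,
# including a win (+10) and a loss (-10) for us.
for unknown in range(-10, 11):
    assert choose([2, 7], [1, unknown]) == ("left", 2)

print("the fourth board never changes the decision")
```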
This is the informal description of what's called "minimax with
alpha-beta pruning". It's called alpha-beta because traditionally,
procedures which use this technique have a parameter called alpha which
holds the biggest of the maximum values found so far and a parameter
called beta which holds the smallest of the minimum values found so far.
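Here's a minimal sketch of how such a procedure might look in Python. The assumptions are mine, not the lecture's: the game tree is prebuilt as nested lists (a number is a board that has already been evaluated, a list holds the boards reachable in one move), and the extra `seen` argument exists only so we can watch which boards actually get looked at.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, seen=None):
    """Minimax value of `node`, skipping branches that cannot matter.

    A node is a number (an already-evaluated board) or a list of children.
    alpha: best value the maximizer is already assured of on this path.
    beta:  best (lowest) value the minimizer is already assured of.
    seen:  optional list that collects the leaf values actually evaluated.
    """
    if not isinstance(node, list):
        if seen is not None:
            seen.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, seen))
            alpha = max(alpha, value)
            if alpha >= beta:      # the minimizer above will never allow this
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, seen))
        beta = min(beta, value)
        if beta <= alpha:          # the maximizer above already has better
            break
    return value


# The abstract example: the 99 stands in for the board we never look at.
seen = []
print(alphabeta([[2, 7], [1, 99]], True, seen=seen))   # 2
print(seen)                                            # [2, 7, 1]
```

The answer is the same one plain minimax would give; the only difference is that the fourth board never shows up in `seen`.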
The usefulness of alpha-beta pruning depends on the order in
which you generate and search the possible moves. In the worst case,
the branches of the tree are ordered in such a way that alpha-beta
provides no help at all. (What if the two subtrees in the above example
were explored in the reverse order?) In more common cases, alpha-beta
pruning reduces the amount of search, but it does not prevent the
exponential increase. As the depth of the state space grows, the amount
of work required still increases exponentially, just at a reduced rate.
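The effect of ordering is easy to see by counting. The sketch below is a stripped-down, two-moves-ahead version of the pruning idea, written only for counting purposes (the function name is made up, each subtree is assumed to have at least one leaf, and the 5 is an invented value for the board that was never explored in the abstract example).

```python
def leaves_examined(subtrees):
    """Count leaves a two-move lookahead with pruning has to evaluate.

    `subtrees` holds one non-empty list of leaf values per move of ours,
    in the order the moves are searched. We stop scanning a subtree as
    soon as its running minimum is no better than the best minimum so far.
    """
    best = None          # best propagated minimum so far (our "alpha")
    count = 0
    for leaves in subtrees:
        running_min = None
        for value in leaves:
            count += 1
            if running_min is None or value < running_min:
                running_min = value
            if best is not None and running_min <= best:
                break    # this move can't beat the one we already have
        if best is None or running_min > best:
            best = running_min
    return count

# The abstract example, with 5 as a made-up value for the unexplored board.
print(leaves_examined([[2, 7], [1, 5]]))   # 3: the 5 is never examined
print(leaves_examined([[1, 5], [2, 7]]))   # 4: reversed order, nothing pruned
```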
Let's take one more look at our real hexapawn game in this context:
                            - W W
                            W - -
                            B B B
                          /   |   \
                    /         |         \
              /               |               \
        - W W               - W W               - W W
      0 B - -           -10 W B -           -10 W - B
        B - B               B - B               B B -
       /  |  \               / \               /  |  \
     /    |    \            /   \            /    |    \
   /      |      \         /     \         /      |      \
- - W   - - W   - W -   - W -   - W -   - W W   - - W   - - W
W - -   B W -   B - W   W B W   W W -   - - B   W W B   W - W
B - B   B - B   B - B   B - B   B - B   B W -   B B -   B B -
  0       1       1      -10     -1      -10      0      -1
Could alpha-beta pruning have saved us some work in deciding which move
to make here? Sure. There are three moves we didn't have to look at.
They are marked with an X below:
                                  X               X       X
- - W   - - W   - W -   - W -   - W -   - W W   - - W   - - W
W - -   B W -   B - W   W B W   W W -   - - B   W W B   W - W
B - B   B - B   B - B   B - B   B - B   B W -   B B -   B B -
  0       1       1      -10     -1      -10      0      -1
If you figured out that these were the moves that alpha-beta pruning would
have discarded without looking at them, and you can explain why, then you know
everything you need to know about alpha-beta pruning.
Copyright (c) 2003 by Kurt Eiselt. All rights reserved, with
the exception of stuff that belongs to somebody else.
Last revised: November 5, 2003