CS 2360 - February 11, 1997

Lecture 11 -- State Space Search


Implementing depth-first search

Last week we talked about the differences between depth-first
search of a binary tree and preorder traversal of that same tree.
These differences make implementation of depth-first search 
more complicated than preorder traversal, but not drastically 
so.  Here's a simple depth-first search implementation for 
the Flintstone Family Tree, using the representation format 
for trees given in problem 1 on your first midterm exam.  The 
tree looks like this in LISP:

  '(rocky (pebbles (wilma nil nil) (fred nil nil))
          (bam-bam (betty nil nil) (barney nil nil)))

And the LISP code itself looks like this:

(defun dfs (item tree)
  (cond ((done? tree) nil)
        ((found-item? item (get-root tree)) item)
        (T (or (dfs item (get-left-subtree tree))
               (dfs item (get-right-subtree tree))))))

(defun done? (tree)
  (null tree))

(defun found-item? (item tree)
  (eql item tree))

(defun get-root (tree)
  (first tree))

(defun get-left-subtree (tree)
  (second tree))

(defun get-right-subtree (tree)
  (third tree))

I've abstracted away the details of accessing the list data
structure that represents the family tree, leaving only 
a high-level algorithm description in the main function.  In
fact, the only built-in LISP operators used to describe the
high-level algorithm are "defun", "cond", and "or".

The use of "or" in the "dfs" function is an easy way to 
fulfill the requirement that the right subtree isn't searched 
if what we're looking for is found in the left subtree.  
However, this may not be great programming style.  It's not 
an especially obvious use of "or", which is typically used as 
a Boolean operator, not as a program control mechanism.  Also, 
this use of "or" takes advantage of a detail of its evaluation 
(i.e., that "or" evaluates its arguments left to right, and 
stops as soon as it finds an argument which evaluates to a 
non-nil value), which also is not necessarily a great thing 
to do.  Furthermore, this assumes that any given node has at 
most two children; if you want to cope with any number of 
children at any node, you might want to code up a slightly 
different version of this anyway.  For now, we'll leave the 
"or" there, but feel free to do something better.
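
Here's one possible sketch of such a version.  The name 
"dfs-any", the helper "get-children", and the assumption that 
a tree is a list of the form (root child-1 child-2 ...) are 
illustrative choices, not part of the midterm representation.  
It uses the standard LISP function "some", which calls a 
function on each element of a list in order and returns the 
first non-nil result.

```lisp
;; Sketch only.  Assumes a tree is a list (root child-1 child-2 ...)
;; and that an empty tree is NIL.  The two-child Flintstone tree
;; already fits this format.
(defun get-children (tree)
  (rest tree))

(defun dfs-any (item tree)
  (cond ((null tree) nil)
        ((eql item (first tree)) item)
        ;; SOME tries each child in turn, left to right, and returns
        ;; the first non-NIL result, so the remaining children are
        ;; skipped once the item has been found.
        (t (some #'(lambda (child) (dfs-any item child))
                 (get-children tree)))))

(defparameter *flintstones*
  '(rocky (pebbles (wilma nil nil) (fred nil nil))
          (bam-bam (betty nil nil) (barney nil nil))))

(dfs-any 'barney *flintstones*)   ; => BARNEY
(dfs-any 'dino *flintstones*)     ; => NIL
```

Whether "some" reads more clearly than "or" is a matter of 
taste, but it does remove the two-children limit.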


Getting past yes or no

Sadly, the search function described above doesn't tell me 
much:  just whether or not an item I'm looking for is in the 
tree.  I'd get more information if I could get the search 
function to tell me how to get from the root of the tree to 
the item I'm looking for, assuming the item is in the tree.  
That path from the root to the item would at least be 
an approximation of the relationship between those two nodes 
in the tree; in the case of the Flintstones, for example, the 
path "Rocky -has-dad-> Bam-Bam -has-dad-> Barney" tells me 
something about the relationship between Rocky and Barney.  
How can I get my depth-first search procedure to 
return this path, instead of just the item itself, when it 
finds the item in the tree?  It's pretty easy.  All you do is 
introduce an additional argument as a sort of variable to 
store the path from the root to wherever the procedure is 
looking in the tree.  You get that additional argument by 
adding a helping function, just like in many of those examples
of tail recursion.  Then it's just a question of building up 
the result as the procedure searches deeper in the tree:

(defun dfs (item tree)
  (dfs-helper item tree nil))

(defun dfs-helper (item tree result)
  ;; result holds the nodes above this one, nearest ancestor first
  (cond ((done? tree) nil)
        ((found-item? item (get-root tree)) 
         (reverse (cons item result)))   ; reversed: root comes first
        (T (or (dfs-helper item 
                           (get-left-subtree tree)
                           (cons (get-root tree) result))
               (dfs-helper item 
                           (get-right-subtree tree)
                           (cons (get-root tree) result))))))

(defun done? (tree)
  (null tree))

(defun found-item? (item tree)
  (eql item tree))

(defun get-root (tree)
  (first tree))

(defun get-left-subtree (tree)
  (second tree))

(defun get-right-subtree (tree)
  (third tree))

Note that because I've taken the time to do a great deal 
of data abstraction, separating the functions that access the 
LISP data structure from the higher-level algorithm, all 
I had to do was make a few changes to the top-level 
procedure; the lower-level ones are untouched because we 
didn't make any changes to the LISP data structure.
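
The flip side of that claim can be sketched too:  if the data 
structure changes, only the access functions should need to 
change.  The example below assumes a hypothetical alternative 
format, (left-subtree root right-subtree), which is not the 
format from the midterm.  The search functions are repeated 
only so that the sketch runs on its own; in this sketch the 
helper calls itself and reverses the path so that it reads 
from the root down to the item.

```lisp
;; Hypothetical alternative format: (left-subtree root right-subtree)
(defparameter *flintstones-2*
  '(((nil wilma nil) pebbles (nil fred nil))
    rocky
    ((nil betty nil) bam-bam (nil barney nil))))

;; Only these three access functions differ from the originals.
(defun get-root (tree) (second tree))
(defun get-left-subtree (tree) (first tree))
(defun get-right-subtree (tree) (third tree))

;; Everything below is independent of the tree format.
(defun done? (tree) (null tree))
(defun found-item? (item tree) (eql item tree))

(defun dfs-helper (item tree result)
  (cond ((done? tree) nil)
        ((found-item? item (get-root tree))
         (reverse (cons item result)))
        (t (or (dfs-helper item
                           (get-left-subtree tree)
                           (cons (get-root tree) result))
               (dfs-helper item
                           (get-right-subtree tree)
                           (cons (get-root tree) result))))))

(defun dfs (item tree)
  (dfs-helper item tree nil))

(dfs 'barney *flintstones-2*)   ; => (ROCKY BAM-BAM BARNEY)
```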


The state space

The metaphor of searching a tree is also a convenient one for 
describing the state of a process (i.e., a program in 
execution).  The state of a process changes over time, and at 
any given time the state of a process is a little slice of 
its history.

At a very low level, the state of a process is described by 
the values of the arguments being passed, the instruction 
being executed, and if you're programming with side effects, 
the bindings of variables to values.  (Obviously, it's easier 
to describe the state of a process if you don't have to worry 
about side effects, as there's just that much less to keep 
track of.)  However, thinking about state at this low level 
becomes very tedious very quickly.  So, we might be better 
off using a higher-level abstraction in thinking about the 
state of a process.  Consider, for example, a program to 
solve the 8-tile puzzle.  Instead of thinking in terms of 
which instruction is being executed, the values bound to 
arguments, and so on, we can look at the process in terms of 
the state of the puzzle itself.  Thus, the initial state of 
the process would be the initial state of the puzzle.  Say 
the initial state looks like this:

                  2 8 3
                  1 6 4
                  7   5

We could move any of three tiles, the 7, the 6, or the 5, to 
generate the three possible next states from this one:

                  2 8 3
                  1 6 4
                  7   5
                   /|\
                  / | \
                 /  |  \
                /   |   \
               /    |    \
              /     |     \
             /      |      \
            /       |       \
          2 8 3   2 8 3   2 8 3
          1 6 4   1   4   1 6 4
            7 5   7 6 5   7 5

If we then choose, say, the lower leftmost state of those 
three new states, and generate the two possible next states 
from that one, we get this:

                  2 8 3
                  1 6 4
                  7   5
                   /|\
                  / | \
                 /  |  \
                /   |   \
               /    |    \
              /     |     \
             /      |      \
            /       |       \
          2 8 3   2 8 3   2 8 3
          1 6 4   1   4   1 6 4
            7 5   7 6 5   7 5
           / \
          /   \
         /     \
        /       \
       /         \
     2 8 3     2 8 3
       6 4     1 6 4
     1 7 5     7   5

Note that one of these new states is just a repeat of the 
initial state.  We wouldn't want to explore that direction 
any further, because we'd just be doing work we've already 
done.

Now, if we think of the movement of tiles as the significant 
operations in this process, we can describe the history of 
the process in terms of puzzle boards and the operations 
necessary to get from one board to the next.  And since the 
nature of the operations in this case is such that at most 
four new boards can be generated from any given board, 
we can safely say that the current behavior of the process 
depends on its history--the process couldn't have been in the 
current state without having just been in one of a very few 
previous states.
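
Generating the next boards from a given board might be 
sketched as follows.  The board format (a list of nine 
numbers, row by row, with 0 for the blank) and the names 
"swap-squares" and "next-states" are assumptions made for 
this illustration; nothing above fixes a representation.

```lisp
;; Sketch only.  Assumption: a board is a list of 9 numbers in
;; row-major order, with 0 standing for the blank square.
(defun swap-squares (board i j)
  "Return a copy of BOARD with the contents of squares I and J exchanged."
  (let ((new-board (copy-list board)))
    (rotatef (nth i new-board) (nth j new-board))
    new-board))

(defun next-states (board)
  "Return the boards reachable from BOARD by sliding one tile into the blank."
  (let* ((blank (position 0 board))
         (row (floor blank 3))
         (col (mod blank 3))
         (states '()))
    ;; Each test asks whether the blank has a neighbor on that side:
    ;; left, above, right, below.
    (when (> col 0) (push (swap-squares board blank (- blank 1)) states))
    (when (> row 0) (push (swap-squares board blank (- blank 3)) states))
    (when (< col 2) (push (swap-squares board blank (+ blank 1)) states))
    (when (< row 2) (push (swap-squares board blank (+ blank 3)) states))
    (nreverse states)))

(next-states '(2 8 3  1 6 4  7 0 5))
;; => ((2 8 3 1 6 4 0 7 5) (2 8 3 1 0 4 7 6 5) (2 8 3 1 6 4 7 5 0))
```

The three boards come back in the same left-to-right order 
as in the picture above, and a blank in the center square 
would produce the maximum of four.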

If we keep applying operations (i.e., moving tiles) to the 
leftmost board in the tree, we're going to get a depth-first 
search.  But we're not searching some pre-existing data 
structure; instead we're searching something that's being 
"built" as the program executes.  This something is called a 
"state space" (or a "problem space"), and our hypothetical 8-
tile program is performing a "state-space search" by 
following a depth-first search algorithm.

A state space is defined as the set of all possible states 
generated by the repeated application of a finite set of 
operations to some initial state.  In performing a state-
space search, the intention is usually to find a sequence of 
operations that gets one from the initial state to some goal 
state.  In the case of the 8-tile puzzle, that goal state 
might be:

                  1 2 3
                  8   4
                  7 6 5

Why generate the state space at run-time, and not just have 
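it all built in advance?  For some applications, building it 
in advance might not be much of a problem.  For example, in 
the 8-tile puzzle, the number of different ways to arrange 
the tiles isn't overwhelming:  there are 9! = 362,880 
arrangements, and only half of them can be reached from any 
given starting board.  On the other hand, if you were working on a 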
program that could play a decent game of chess, and you 
wanted to pre-build a data structure that consisted of 
all possible games, you'd want to make sure that you set 
aside a little disk space to store the approximately 10^120  
(i.e., 1,000,000,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000) different 
sequences of moves that are possible (that figure is Claude 
Shannon's classic estimate).  Or maybe you'd be better off 
writing your program to generate just those boards that were 
relevant to the specific chess game it was playing at that 
particular time, and not worry about the rest of them.


Examples of state-space search in the real world

The state-space search is used in a lot of ways by lots of folks.
For example, a compiler has a component called a parser which 
decomposes a high-level instruction into its component parts.  
But these instructions can be ambiguous, so the parser must 
make decisions about how various symbols (known as "tokens" 
in compiler world) are being used.  How that decision is made 
depends on what the parser has already seen; in other words, 
the next possible state of the parsing process depends on the 
history of the previous states.

The parser reads the input from left to right, making 
"guesses" as it goes.  If the sequence of guesses leads to a 
structure for an instruction that's not legal, the parser 
will backtrack and systematically try new guesses, just like 
a depth-first search algorithm.  If no combination of guesses 
works for the parser, you'll get a "syntax error" message.  
Parsers built this way are one kind of "recursive descent 
parser", and you'll get to study these in your compiler 
course someday.

The same sorts of ideas are used to get computers to 
understand English and other natural languages.  In fact, an 
entire company was founded on this idea.  A guy named Gary 
Hendrix at the University of Texas wrote a PhD thesis on 
parsing English back in the late 60's or early 70's.  He 
later took some of those same ideas and built an interface to 
a simple database system -- an interface that could accept 
database queries in English (or at least a subset of 
English).  He called the whole thing "Q&A", it ran on PC 
compatibles, and it sold off the shelf at computer stores 
for about $300 a copy.  This product was one of the first, if 
not the first, offered by the company Hendrix co-founded, 
which is called "Symantec" -- a company which most of you Mac 
or PC owners know about, since it has swallowed up all sorts 
of other software vendors.  Hendrix is now a zillionaire, and 
the moral to this story is that state-space search can make 
you rich.

As we mentioned in class, evolutionary biologists think of 
all of us (and I mean *all* of us) as the bottom layers of 
nodes on a very big state space.  Those of us who don't have 
any children are the leaves on a very very big tree (well, 
it's not exactly a tree, but you get the idea).  Some of us 
will generate new states (our kids) and others of us won't.  
Each state presumably brings humanity slightly closer to some 
lifeform that is perfectly adapted to the environment.  (If 
only we could get the environment to stop changing....)

Finally, as we demonstrated via our Calvin and Hobbes 
example, state-space search is a nice little metaphor for how 
we lead our lives:  every decision we make is based on the 
chain of decisions leading up to that point.  As Calvin and
Hobbes illustrated, however, in life, unlike in your computer, 
there's often no backtracking possible when you make a bad decision.

A state-space search algorithm (depth-first)

Here's a very sketchy, high-level depth-first state-space 
search algorithm that looks just like search algorithms that 
you've seen already, except that it generates what is to be 
searched as it goes, as opposed to searching some pre-existing 
data structure:

state-space (initial-states, goal-state, operators)

  1.  if there are no initial-states left, then return
      failure; otherwise look at the first (leftmost)
      initial-state
  2.  if that state is the goal-state, then return success
  3.  if that state isn't the goal-state, then generate all
      possible new states from that state by applying the
      set of operators to that state
  4.  if there aren't any new states generated by applying
      those operators, then skip ahead to step 6
  5.  call state-space with this new list of states passed as
      the initial-states argument, and if that succeeds then
      return success else...
  6.  call state-space with the old list of initial states
      that remained after you stripped off the first
      initial-state in step 1, and if that succeeds then
      return success else...
  7.  return failure

In step 3, you'd like to check all the new states to see if 
you've explored them before.  You do that by keeping track of 
the sequence of states that was generated in going from the 
very first state to where you are now, and then comparing 
that list to the set of new states you just generated.  If 
there are any duplicates, be sure to eliminate them from the 
set of new states.
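
One way the steps and the duplicate check might fit together 
is sketched below.  Several things here are assumptions, not 
part of the algorithm as stated:  each operator is a function 
that takes a state and returns a new state (or nil if it 
doesn't apply), success is reported by returning the path 
from the first state to the goal, an empty list of states 
counts as failure, and the optional "path" argument and the 
toy operators on numbers exist only for the demonstration.

```lisp
;; Sketch only.  PATH holds the states already visited on the way
;; to the current one, most recent first.
(defun generate-states (state operators)
  "Apply every operator to STATE, dropping the ones that return NIL."
  (remove nil (mapcar #'(lambda (op) (funcall op state)) operators)))

(defun state-space (initial-states goal-state operators &optional path)
  (cond ((null initial-states) nil)                  ; nothing left to try
        ((equal (first initial-states) goal-state)   ; found the goal
         (reverse (cons (first initial-states) path)))
        (t (let* ((state (first initial-states))
                  (new-path (cons state path))
                  ;; throw away new states already on the path
                  (new-states
                    (remove-if #'(lambda (s)
                                   (member s new-path :test #'equal))
                               (generate-states state operators))))
             ;; go deeper first; if that fails, try the remaining states
             (or (state-space new-states goal-state operators new-path)
                 (state-space (rest initial-states) goal-state
                              operators path))))))

;; Toy operators on numbers from 1 to 10: double it, or add one.
(defparameter *toy-operators*
  (list #'(lambda (n) (if (<= (* 2 n) 10) (* 2 n)))
        #'(lambda (n) (if (< n 10) (+ n 1)))))

(state-space '(1) 5 *toy-operators*)    ; => (1 2 4 5)
(state-space '(1) 11 *toy-operators*)   ; => NIL
```

In the first call the search runs down 1, 2, 4, 8, 9, 10, 
hits a dead end, and backtracks to try 5 instead of 8.  A 
deep recursion like this one can exhaust the stack on a 
large state space, so treat it as a starting point only.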

If you can implement something like this now, the next homework
assignment will be a piece of cake.  Sort of.



Copyright 1997 by Kurt Eiselt.  All rights reserved.

Last revised: February 11, 1997