Implementing depth-first search
Last week we talked about the differences between depth-first
search of a binary tree and preorder traversal of that same tree.
These differences make implementation of depth-first search
more complicated than preorder traversal, but not drastically
so. Here's a simple depth-first search implementation for
the Flintstone Family Tree, using the representation format
for trees given in problem 1 on your first midterm exam. The
tree looks like this in LISP:
'(rocky (pebbles (wilma nil nil) (fred nil nil))
        (bam-bam (betty nil nil) (barney nil nil)))
And the LISP code itself looks like this:
(defun dfs (item tree)
  (cond ((done? tree) nil)
        ((found-item? item (get-root tree)) item)
        (T (or (dfs item (get-left-subtree tree))
               (dfs item (get-right-subtree tree))))))

(defun done? (tree)
  (null tree))

(defun found-item? (item tree)
  (eql item tree))

(defun get-root (tree)
  (first tree))

(defun get-left-subtree (tree)
  (second tree))

(defun get-right-subtree (tree)
  (third tree))
I've abstracted away the details of accessing the list data
structure that represents the family tree, leaving only
a high-level algorithm description in the main function. In
fact, the only built-in LISP operators used to describe the
high-level algorithm are "defun", "cond", and "or".
The use of "or" in the "dfs" function is an easy way to
fulfill the requirement that the right subtree isn't searched
if what we're looking for is found in the left subtree.
However, this may not be great programming style. It's not
an especially obvious use of "or", which is typically used as
a Boolean operator, not as a program control mechanism. Also,
this use of "or" depends on the way "or" is evaluated (i.e.,
"or" evaluates its arguments left to right, and stops as soon
as it finds an argument which evaluates to a non-nil value).
LISP does guarantee that behavior, but a reader who isn't
expecting it can easily miss that the search depends on it.
Furthermore, this assumes that any given node has at
most two children; if you want to cope with any number of
children at any node, you might want to code up a slightly
different version of this anyway. For now, we'll leave the
"or" there, but feel free to do something better.
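If you did want to cope with any number of children without leaning on "or", one possibility is sketched below. This is only an illustration, and it does not use the midterm representation: it assumes a hypothetical format in which a tree is a list whose first element is the root and whose remaining elements are the child subtrees, so a leaf is a one-element list. The names "dfs-nary" and "*flintstones-nary*" are made up for this sketch.

```lisp
;; Assumed (hypothetical) representation: (root child1 child2 ...),
;; where each child is itself a tree and a leaf is just (root).
(defun dfs-nary (item tree)
  (cond ((null tree) nil)
        ((eql item (first tree)) item)
        ;; Search the children left to right, and leave the loop
        ;; as soon as one of them contains the item.
        (t (dolist (child (rest tree) nil)
             (let ((found (dfs-nary item child)))
               (when found (return found)))))))

(defparameter *flintstones-nary*
  '(rocky (pebbles (wilma) (fred))
          (bam-bam (betty) (barney))))

(dfs-nary 'barney *flintstones-nary*)   ; => BARNEY
(dfs-nary 'dino *flintstones-nary*)     ; => NIL
```

The "dolist" makes the early exit explicit: "return" leaves the loop at the first child whose subtree contains the item, so the remaining children are never searched.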
Getting past yes or no
Sadly, the search function described above doesn't tell me
much---just whether or not the item I'm looking for is in the
tree. I'd get more information
if I could get the search function to tell me how to
get from the root of the tree to the item I'm looking for,
assuming the item I'm looking for is in the tree.
That path from the root to the item would at least be
an approximation of the relationship between those two nodes
in the tree; in the case of the Flintstones, for example, the
path "Rocky -has-dad-> Bam-Bam -has-dad-> Barney" tells me
something about the relationship between Rocky and Barney.
How can I get my depth-first search procedure to
return this path, instead of just the item itself, when it
finds the item in the tree? It's pretty easy. All you do is
introduce an additional argument as a sort of variable to
store the path from the root to wherever the procedure is
looking in the tree. You get that additional argument by
adding a helping function, just like in many of those examples
of tail recursion. Then it's just a question of building up
the result as the procedure searches deeper in the tree:
(defun dfs (item tree)
  ;; The helper builds the path backwards (most recent node
  ;; first), so reverse it to read from the root to the item.
  (reverse (dfs-helper item tree nil)))

(defun dfs-helper (item tree result)
  (cond ((done? tree) nil)
        ((found-item? item (get-root tree))
         (cons item result))
        (T (or (dfs-helper item
                           (get-left-subtree tree)
                           (cons (get-root tree) result))
               (dfs-helper item
                           (get-right-subtree tree)
                           (cons (get-root tree) result))))))

(defun done? (tree)
  (null tree))

(defun found-item? (item tree)
  (eql item tree))

(defun get-root (tree)
  (first tree))

(defun get-left-subtree (tree)
  (second tree))

(defun get-right-subtree (tree)
  (third tree))
And note that because I've taken the time to do a great deal
of data abstraction, separating the functions that access the
LISP data structure from the higher-level algorithm, all
I had to do was make a few changes to the top-level
procedure; the lower-level ones are untouched because we
didn't make any changes to the LISP data structure.
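To see what a path-returning search gives back, here is a compact, self-contained sketch of the same accumulator idea. It is not the listing above: for brevity it calls "first", "second", and "third" directly instead of going through the accessor functions, it reverses the accumulated list at the end so the path reads from the root down, and the names "path-to", "path-to-helper", and "*flintstones*" are made up for this example.

```lisp
(defun path-to-helper (item tree path-so-far)
  ;; PATH-SO-FAR holds the nodes from the root down to the parent
  ;; of TREE, most recently visited node first.
  (cond ((null tree) nil)
        ((eql item (first tree)) (cons item path-so-far))
        (t (let ((new-path (cons (first tree) path-so-far)))
             (or (path-to-helper item (second tree) new-path)
                 (path-to-helper item (third tree) new-path))))))

(defun path-to (item tree)
  ;; Reverse so the path reads from the root to the item.
  (reverse (path-to-helper item tree nil)))

(defparameter *flintstones*
  '(rocky (pebbles (wilma nil nil) (fred nil nil))
          (bam-bam (betty nil nil) (barney nil nil))))

(path-to 'barney *flintstones*)   ; => (ROCKY BAM-BAM BARNEY)
(path-to 'rocky *flintstones*)    ; => (ROCKY)
(path-to 'dino *flintstones*)     ; => NIL
```

Note that a failed search still returns NIL, so the caller can tell "not in the tree" apart from any real path.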
The state space
The metaphor of searching a tree is also a convenient one for
describing the state of a process (i.e., a program in
execution). The state of a process changes over time, and at
any given time the state of a process is a little slice of
its history.
At a very low level, the state of a process is described by
the values of the arguments being passed, the instruction
being executed, and, if you're programming with side effects,
the bindings of variables to values. (Obviously, it's easier
to describe the state of a process if you don't have to worry
about side effects, as there's just that much less to keep
track of.) However, thinking about state at this low level
becomes very tedious very quickly. So, we might be better
off using a higher-level abstraction in thinking about the
state of a process. Consider, for example, a program to
solve the 8-tile puzzle. Instead of thinking in terms of
which instruction is being executed, the values bound to
arguments, and so on, we can look at the process in terms of
the state of the puzzle itself. Thus, the initial state of
the process would be the initial state of the puzzle. Say
the initial state looks like this:
    2 8 3
    1 6 4
    7   5
We could move any of three tiles, the 7, the 6, or the 5, to
generate the three possible next states from this one:
                    2 8 3
                    1 6 4
                    7   5
                     /|\
                    / | \
                   /  |  \
                  /   |   \
                 /    |    \
                /     |     \
               /      |      \
              /       |       \
         2 8 3      2 8 3      2 8 3
         1 6 4      1   4      1 6 4
           7 5      7 6 5      7 5
If we then choose, say, the leftmost of those three new
states, and generate the two possible next states
from that one, we get this:
                    2 8 3
                    1 6 4
                    7   5
                     /|\
                    / | \
                   /  |  \
                  /   |   \
                 /    |    \
                /     |     \
               /      |      \
              /       |       \
         2 8 3      2 8 3      2 8 3
         1 6 4      1   4      1 6 4
           7 5      7 6 5      7 5
          / \
         /   \
        /     \
       /       \
      /         \
 2 8 3           2 8 3
   6 4           1 6 4
 1 7 5           7   5
Note that one of these new states is just a repeat of the
initial state. We wouldn't want to explore that direction
any further, because we'd just be doing work we've already
done.
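Here's a minimal sketch of how the next states in the pictures above might be generated. The representation is an assumption made for this example, not something from the course: a board is a list of nine numbers read row by row, with 0 standing for the blank. The function names "swap-positions", "next-boards", and "new-boards" are also made up.

```lisp
(defun swap-positions (board i j)
  "Return a copy of BOARD with the elements at positions I and J exchanged."
  (let ((new (copy-list board)))
    (rotatef (nth i new) (nth j new))
    new))

(defun next-boards (board)
  "Return the boards reachable from BOARD by sliding one tile into the blank."
  (let* ((blank (position 0 board))
         (row (floor blank 3))
         (col (mod blank 3))
         (result '()))
    ;; The tile to the left of, above, to the right of, or below the
    ;; blank can slide into it, provided that neighbor is on the board.
    (when (> col 0) (push (swap-positions board blank (- blank 1)) result))
    (when (> row 0) (push (swap-positions board blank (- blank 3)) result))
    (when (< col 2) (push (swap-positions board blank (+ blank 1)) result))
    (when (< row 2) (push (swap-positions board blank (+ blank 3)) result))
    (nreverse result)))

(defun new-boards (board history)
  "Like NEXT-BOARDS, but drop any board that already appears in HISTORY."
  (remove-if #'(lambda (b) (member b history :test #'equal))
             (next-boards board)))

(next-boards '(2 8 3 1 6 4 7 0 5))
;; => ((2 8 3 1 6 4 0 7 5) (2 8 3 1 0 4 7 6 5) (2 8 3 1 6 4 7 5 0))

(new-boards '(2 8 3 1 6 4 0 7 5) '((2 8 3 1 6 4 7 0 5)))
;; => ((2 8 3 0 6 4 1 7 5))
```

The second call is the situation just described: one of the two boards reachable from the leftmost state is the initial board, so it gets filtered out and only the genuinely new board is left.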
Now, if we think of the movement of tiles as the significant
operations in this process, we can describe the history of
the process in terms of puzzle boards and the operations
necessary to get from one board to the next. And since the
nature of the operations in this case is such that at most
four new boards can be generated from any given board,
we can safely say that the current behavior of the process
depends on its history--the process couldn't have been in the
current state without having just been in one of a very few
previous states.
If we keep applying operations (i.e., moving tiles) to the
leftmost board in the tree, we're going to get a depth-first
search. But we're not searching some pre-existing data
structure; instead we're searching something that's being
"built" as the program executes. This something is called a
"state space" (or a "problem space"), and our hypothetical
8-tile program is performing a "state-space search" by
following a depth-first search algorithm.
A state space is defined as the set of all possible states
generated by the repeated application of a finite set of
operations to some initial state. In performing a
state-space search, the intention is usually to find a
sequence of operations that gets one from the initial state
to some goal state. In the case of the 8-tile puzzle, that
goal state
might be:
    1 2 3
    8   4
    7 6 5
Why generate the state space at run-time, and not just have
it all built in advance? For some applications, that might
not be much of a problem. For example, in the 8-tile puzzle,
the number of different ways to arrange the tiles isn't
overwhelming. On the other hand, if you were working on a
program that could play a decent game of chess, and you
wanted to pre-build a data structure that contained
all possible boards, you'd want to make sure that you set
aside a little disk space to store the approximately 10^120
(i.e., 1,000,000,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000) different
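boards that are possible. (Strictly speaking, 10^120 is
Claude Shannon's estimate of the number of possible chess
games, which is the size of the whole tree; the number of
distinct boards is smaller, but still far too big to store.)
Or maybe you'd be better off writing your program to generate
just those boards that were relevant to the specific chess
game it was playing at that particular time, and not worry
about the rest of them.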
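To put a number on "isn't overwhelming" for the 8-tile puzzle: eight tiles plus the blank can be laid out on nine squares in 9! ways, and only half of those layouts can be reached from any one starting board by sliding tiles. A quick check, using a throwaway "factorial" function written just for this note:

```lisp
(defun factorial (n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))

(factorial 9)         ; => 362880 layouts of 8 tiles plus the blank
(/ (factorial 9) 2)   ; => 181440 of them reachable from any one start
```

A couple of hundred thousand boards would fit in memory without much trouble, which is a long way from the chess numbers.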
Examples of state-space search in the real world
The state-space search is used in a lot of ways by lots of folks.
For example, a compiler has a component called a parser which
decomposes a high-level instruction into its component parts.
But these instructions can be ambiguous, so the parser must
make decisions about how various symbols (known as "tokens"
in compiler world) are being used. How that decision is made
depends on what the parser has already seen; in other words,
the next possible state of the parsing process depends on the
history of the previous states.
The parser reads the input from left to right, making
"guesses" as it goes. If the sequence of guesses leads to a
structure for an instruction that's not legal, the parser
will backtrack and systematically try new guesses, just like
a depth-first search algorithm. If no combination of guesses
works for the parser, you'll get a "syntax error" message.
Parsers that work this way are sometimes called "backtracking
recursive descent parsers", and you'll get to study them in
your compiler course someday.
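Here's a toy sketch of the guess-and-backtrack idea, nowhere near a real compiler's parser. Everything in it is made up for illustration: the grammar (S -> A b, where A -> a b or A -> a), the function names, and the trick of passing each parsing function a function "k" that says what to do with the rest of the tokens. Each function returns non-nil if the rest of the parse succeeds and NIL if the guess was wrong.

```lisp
(defun expect-token (token tokens k)
  "If TOKENS starts with TOKEN, carry on with the rest; otherwise fail."
  (and tokens
       (eql (first tokens) token)
       (funcall k (rest tokens))))

(defun parse-a (tokens k)
  ;; A -> a b | a
  ;; First guess "a b"; if the rest of the parse then fails,
  ;; back up and guess plain "a" instead.
  (or (expect-token 'a tokens
                    #'(lambda (remaining) (expect-token 'b remaining k)))
      (expect-token 'a tokens k)))

(defun parse-s (tokens)
  ;; S -> A b, and every token must be used up.
  (parse-a tokens
           #'(lambda (remaining) (expect-token 'b remaining #'null))))

(parse-s '(a b b))   ; => T    (first guess for A works)
(parse-s '(a b))     ; => T    (first guess fails, second works)
(parse-s '(a c))     ; => NIL  (no guess works: "syntax error")
```

For the input (a b), the first guess for A swallows both tokens and leaves nothing for the final b, so the parser backs up and tries the second guess, just as a depth-first search backs up out of a dead end.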
The same sorts of ideas are used to get computers to
understand English and other natural languages. In fact, an
entire company was founded on this idea. A guy named Gary
Hendrix at the University of Texas wrote a PhD thesis on
parsing English back in the late 60's or early 70's. He
later took some of those same ideas and built an interface to
a simple database system -- an interface that could accept
database queries in English (or at least a subset of
English). He called the whole thing "Q&A", it ran on PC
compatibles, and it sold off the shelf at computer stores
for about $300 a copy. This product was one of the first, if
not the first, offered by the company Hendrix co-founded,
which is called "Symantec" -- a company which most of you Mac
or PC owners know about, since it has swallowed up all sorts
of other software vendors. Hendrix is now a zillionaire, and
the moral to this story is that state-space search can make
you rich.
As we mentioned in class, evolutionary biologists think of
all of us (and I mean *all* of us) as the bottom layers of
nodes on a very big state space. Those of us who don't have
any children are the leaves on a very very big tree (well,
it's not exactly a tree, but you get the idea). Some of us
will generate new states (our kids) and others of us won't.
Each state presumably brings humanity slightly closer to some
lifeform that is perfectly adapted to the environment. (If
only we could get the environment to stop changing....)
Finally, as we demonstrated via our Calvin and Hobbes
example, state-space search is a nice little metaphor for how
we lead our lives: every decision we make is based on the
chain of decisions leading up to that point. As Calvin and
Hobbes illustrated, however, in life, unlike in your computer,
there's often no backtracking possible when you make a bad decision.
A state-space search algorithm (depth-first)
Here's a very sketchy, high-level depth-first state-space
search algorithm that looks just like search algorithms that
you've seen already, except that it generates what is to be
searched as it goes, as opposed to searching some pre-existing
data structure:
state-space (initial-states, goal-state, operators)
  1. look at the first (leftmost) initial-state; if there
     are no initial-states at all, then return failure
  2. if that state is the goal-state, then return success
  3. if that state isn't the goal-state, then generate all
     possible new states from that state by applying the
     set of operators to that state
  4. if there aren't any new states generated by applying
     those operators, then skip step 5 and go on to step 6
  5. call state-space with this new list of states passed as
     the initial-states argument, and if that succeeds then
     return success else...
  6. call state-space with the old list of initial states
     that remained after you stripped off the first
     initial-state in step 1, and if that succeeds then
     return success else...
  7. return failure
In step 3, you'd like to check all the new states to see if
you've explored them before. You do that by keeping track of
the sequence of states that was generated in going from the
very first state to where you are now, and then comparing
that list to the set of new states you just generated. If
there are any duplicates, be sure to eliminate them from the
set of new states.
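One way the outline and the duplicate check might look in LISP is sketched below. Treat it as a rough starting point under several assumptions of mine, not as the assigned solution: each operator is a function that takes a state and returns either a new state or NIL when it doesn't apply, an extra "path" argument carries the states from the start down to the current one, and the function returns that path instead of a bare "success". The toy problem (get from 0 to 4 by adding 3 or subtracting 2 while staying between 0 and 10) and all of the names are made up for illustration.

```lisp
(defun state-space-search (states goal operators path)
  "Depth-first search. STATES are the states waiting to be explored and
PATH holds the states from the start down to their parent, newest first.
Returns the list of states from the start to GOAL, or NIL on failure."
  (if (null states)
      nil
      (let ((state (first states)))
        (if (equal state goal)
            (reverse (cons state path))
            (let* ((new-path (cons state path))
                   ;; Apply every operator, then throw away the ones
                   ;; that didn't apply and any state already on the
                   ;; path from the start to here.
                   (new-states
                    (remove-if #'(lambda (s)
                                   (or (null s)
                                       (member s new-path :test #'equal)))
                               (mapcar #'(lambda (op) (funcall op state))
                                       operators))))
              ;; Go deeper first; if that fails, try the states that
              ;; were waiting alongside this one.
              (or (state-space-search new-states goal operators new-path)
                  (state-space-search (rest states) goal operators path)))))))

(defparameter *toy-operators*
  (list #'(lambda (n) (when (<= (+ n 3) 10) (+ n 3)))
        #'(lambda (n) (when (>= (- n 2) 0) (- n 2)))))

(state-space-search '(0) 4 *toy-operators* nil)     ; => (0 3 6 4)
(state-space-search '(0) 100 *toy-operators* nil)   ; => NIL
```

In the first call the search wanders down 0, 3, 6, 9, 7, 10, 8 before running out of new states (8 can only lead back to 6, which is already on the path), backs up, and eventually finds 4 as the other child of 6. Without the duplicate check it would bounce between the same few numbers forever.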
If you can implement something like this now, the next homework
assignment will be a piece of cake. Sort of.
Copyright 1997 by Kurt Eiselt. All rights reserved.
Last revised: February 11, 1997