Last time we talked about networks of words in the dictionary and people
in a cough-drop company.
Networks on TV (not ABC, CBS, or NBC...or even Fox, UPN, or the WB)
Network abstractions have even been used by popular
publications to explain what's going on between the
characters in television shows. For example, at the height
of the popularity of the show "Twin Peaks" several years
ago, both People and Newsweek published very detailed network
representations of the relationships between the many
inhabitants of the town of Twin Peaks. I showed you all
reproductions of these diagrams, so I won't bother to repeat
them in ASCII here (whew!), and you could tell just by
looking at them that these networks are far from tree-like
(i.e., there's no obvious hierarchy, and there are most
definitely some cycles). Oh, by the way, here's some more
terminology: you'll also find structures like these called
"semantic networks" instead of "relational networks", depending
on how they're used, but you don't need to worry about that much
until you take CS 3361. But the fundamental ideas about
organizing knowledge in terms of things and relationships
between things are still there, as are the fundamental ideas
about how to traverse these structures, which we'll be
discussing in the next lecture.
In summary, let's revisit the original question, "Why are
we getting so excited about these trees and/or networks?" As
we've seen, the answer is that we can model so many diverse
things with them. In just this brief time, we've seen how we
can model the organization of dictionaries, human memory
(maybe), a small company, and a fictional town, all using the
same basic nodes-and-links representation scheme.
Furthermore, in so doing, we've shown that this common thread
runs through cognitive psychology, artificial intelligence,
object-oriented programming, and relational databases, just
to name a few areas of academic endeavor. (Not to mention
the World Wide Web itself, the infamous Six Degrees of Kevin
Bacon, and Newsweek's recent wild and wacky world of Kenneth
Starr.) See, there really is some method to the madness.
Now that we have all this new knowledge about representation
in trees, hierarchical structures, networks, and the like, we
need some means for exploring these knowledge structures to
get at the information we want at the time we want it. How
do we do this? The answer is a bunch of techniques which
collectively fall under the heading of "search". Search is a
concept which permeates computer science. We'll only touch
on a couple of kinds of search in this course, but they'll be
sufficient to demonstrate the basic difference between brute-
force, exhaustive, or "dumb" search and heuristic or
"intelligent" search.
Linear search
You probably already know how to do a linear search. You
probably did linear searches in previous programming courses.
For example, starting at the beginning of a file structure
and looking at record after record for a specific entry is a
linear search. (If you've ever seen my office, you know that
the only way I could find something in there is by linear
search: I start at one end of the desk and look at
everything until I find what I'm looking for.) Linear
searches take a long time -- O(n), that kind of time.
(Actually, assuming an even distribution of stuff in the
file, you'll look at about n/2 records on average before you
find the one you want, but that's still O(n), because constant
factors like 1/2 don't matter in big-O notation.)
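As a rough sketch of the record-after-record scan described above, here's what a linear search might look like in LISP. The record layout (a list of name/department pairs) and the names *employees* and linear-search are made up for illustration; they're assumptions, not part of any real file structure.

```lisp
;; Hypothetical "file": a list of (name department) records,
;; in no particular order.
(defparameter *employees*
  '((wilma accounting)
    (fred quarry)
    (betty sales)
    (barney quarry)))

;; Look at record after record until one with the wanted name turns up.
;; Returns the matching record, or nil if we run off the end of the list.
(defun linear-search (name records)
  (cond ((null records) nil)                        ; nothing left to look at
        ((eql name (first (first records)))         ; found it
         (first records))
        (t (linear-search name (rest records)))))   ; keep looking

(linear-search 'betty *employees*)   ; looks at three records, returns (BETTY SALES)
(linear-search 'dino *employees*)    ; looks at all four records, returns NIL
```

The failed search is the worst case: every one of the n records gets examined before we can give up.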
We can impose a separate indexing scheme on our file
structure, so that we can cut down on some search time. For
example, if the records are kept sorted by name, we could apply
a binary search mechanism to look for an employee record in a
file. If the employee's name starts with a letter in the range
A-M, we could start the search at the beginning of the file, but
if the name starts with a letter in the range N-Z, we would start
the search at approximately the midway point in the file. We
could continue to divide the big groups into smaller groups,
until eventually the time to find a single record is governed
not by the O(n) behavior of the linear search but by the
O(log n) behavior of the binary search.
There are other indexing mechanisms that we could use, such
as hashing functions, that would give us different kinds of
advantages.
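Here's a minimal sketch of the "keep dividing the groups" idea, carried all the way down to single records. It assumes the names are already sorted alphabetically and stored in a vector of strings; *names* and binary-search are hypothetical names chosen for this example.

```lisp
;; Hypothetical sorted "file": a vector of names in alphabetical order.
(defparameter *names*
  (vector "barney" "betty" "fred" "pebbles" "wilma"))

;; Repeatedly halve the range [low, high] that could still hold NAME.
;; Returns the index of NAME in SORTED-NAMES, or nil if it isn't there.
(defun binary-search (name sorted-names)
  (let ((low 0)
        (high (1- (length sorted-names))))
    (loop while (<= low high)
          do (let* ((mid (floor (+ low high) 2))
                    (candidate (aref sorted-names mid)))
               (cond ((string= name candidate) (return mid))            ; found it
                     ((string< name candidate) (setf high (1- mid)))    ; look in lower half
                     (t (setf low (1+ mid))))))))                       ; look in upper half

(binary-search "pebbles" *names*)   ; returns 3
(binary-search "dino" *names*)      ; returns NIL
```

Each pass through the loop throws away half of what's left, which is where the O(log n) behavior comes from. None of this works unless the names are sorted first.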
Searching a hierarchical structure
As we discussed previously, we don't always store our stuff
in linear formats. We can also organize knowledge in hierarchies.
Consider, for example, the Flintstone Family Tree:
                    Chip      Roxy
                     | (twins) |
                     ___________
                          ^
                         / \
                        /   \
                       /     \
              has-mom /       \ has-dad
                     /         \
                  \/_           _\/
              Pebbles           Bam-Bam
                / \ has-dad       / \
       has-mom /   \     has-mom /   \ has-dad
              /     \           /     \
           \/_       _\/     \/_       _\/
          Wilma      Fred   Betty      Barney
In structures like this, as before, we may want to search for
useful information. But structures like this, unlike linear
file structures, make it easier to search for the answers to
questions like "What's the relationship of Barney to Chip?"
or "Who is Chip's grandfather on his mother's side?"
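To make the second question concrete, here's a small sketch of how following links answers it. It assumes the tree is stored as nested lists of the form (name mother-subtree father-subtree), with only one of the twins at the root; the names *family*, name-of, mother-of, father-of, and maternal-grandfather are all invented for this example.

```lisp
;; Assumed representation: (name mother-subtree father-subtree).
(defparameter *family*
  '(chip (pebbles (wilma nil nil) (fred nil nil))
         (bam-bam (betty nil nil) (barney nil nil))))

(defun name-of (tree) (first tree))
(defun mother-of (tree) (second tree))   ; follow the has-mom link
(defun father-of (tree) (third tree))    ; follow the has-dad link

;; "Who is Chip's grandfather on his mother's side?"
;; Follow has-mom once, then has-dad once.
(defun maternal-grandfather (tree)
  (name-of (father-of (mother-of tree))))

(maternal-grandfather *family*)   ; returns FRED
```

Answering the question takes two link-following steps, no matter how many other people are in the tree.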
Depth-first search
The simplest form of search in a hierarchical or network
structure is called "depth-first search". Here's an
algorithm for depth-first search on a binary tree, looking
for a specific node in the tree:
df-search
1. look at the root
2. if it's what you're looking for, then return success
3. if the root has no children, then return failure
4. call df-search on the subtree whose root is the left
   child and return success if that search is successful
5. call df-search on the subtree whose root is the right
   child and return success if that search is successful
6. otherwise, return failure
This algorithm may look somewhat familiar, since it's just
a variant of the preorder tree traversal algorithm some of
you have seen in previous courses:
preorder
1. visit the root
2. call preorder on the left subtree
3. call preorder on the right subtree
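In case you haven't seen it before, the three steps above might be sketched in LISP like this. The sketch assumes a tree is a list of the form (data left-subtree right-subtree) and that an empty tree is nil; that representation is an assumption made for this example. Here "visit" just means "add the node's data to the list of results".

```lisp
;; Assumed representation: a tree is (data left-subtree right-subtree),
;; and an empty tree is nil.
;; Returns a list of every node's data, in the order the nodes are visited.
(defun preorder (tree)
  (if (null tree)
      nil
      (append (list (first tree))          ; 1. visit the root
              (preorder (second tree))     ; 2. then the left subtree
              (preorder (third tree)))))   ; 3. then the right subtree

(preorder '(a (b (d nil nil) (e nil nil))
              (c nil nil)))
;; returns (A B D E C)
```

Notice that there's no way for this function to stop early: it always walks the whole tree.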
The big differences between the preorder algorithm and the
depth-first search algorithm are these:
1. depth-first search stops before searching the whole tree,
   if it finds what it's looking for; preorder traversal
   always examines the entire tree
2. with depth-first search, searching the right subtree
   occurs only if the search of the left subtree failed to
   find what was being looked for; with preorder traversal,
   the right subtree is always explored (this is sort of a
   corollary to the first difference listed just above)
How do you implement this in what is quickly becoming your
favorite programming language? Keep reading.
Implementing depth-first search
Above we talked about the differences between depth-first
search of a binary tree and preorder traversal of that same tree.
These differences make implementation of depth-first search
more complicated than preorder traversal, but not drastically
so. Here's a simple depth-first search implementation for
the Flintstone Family Tree, using a representation format
for trees that we've used occasionally before (but this isn't
the only representation that we could have used). The tree
looks like this in LISP (and note that just to make things simpler,
we've eliminated one of the twins):
'(chip (pebbles (wilma nil nil) (fred nil nil))
       (bam-bam (betty nil nil) (barney nil nil)))
And the LISP code itself looks like this:
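;; Depth-first search for ITEM in TREE; returns ITEM if found, nil if not.
(defun dfs (item tree)
  (cond ((done? tree) nil)                          ; empty tree: failure
        ((eql item (data tree)) item)               ; root matches: success
        (T (or (dfs item (left-subtree tree))       ; try the left subtree first,
               (dfs item (right-subtree tree))))))  ; then the right one

;; An empty tree is represented by nil.
(defun done? (tree)
  (null tree))

;; The data stored at the root of the tree.
(defun data (tree)
  (first tree))

(defun left-subtree (tree)
  (second tree))

(defun right-subtree (tree)
  (third tree))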
I've abstracted away the details of accessing the list data
structure that represents the family tree, leaving only
a high-level algorithm description in the main function. In
fact, the only built-in LISP operators used to describe the
high-level algorithm are "defun", "cond", "or", and "eql".
The use of "or" in the "dfs" function is an easy way to
fulfill the requirement that the right subtree isn't searched
if what we're looking for is found in the left subtree. It's not
an especially obvious use of "or", which is typically used as
a Boolean operator, not as a program control mechanism. Also,
this use of "or" depends on its evaluation order (i.e., that
"or" evaluates its arguments left to right, and stops as soon
as it finds an argument which evaluates to a non-nil value),
and leaning on that kind of detail is not necessarily a great
thing to do. Since the left-to-right, short-circuit evaluation
of arguments is part of the Common LISP specification for "or",
we can count on things happening the way we expect them to here,
but it's not inconceivable that some other LISP dialect might
implement an "or" that evaluates arguments right-to-left, with
results that might confuse us a little bit (although in a purely
functional world, the results wouldn't be catastrophic).
Furthermore, this version assumes that every node has exactly two
subtrees (either of which may be nil); if you want to cope with a
variable number of children at any node, you might want to code up
a slightly different version of this anyway. For now,
we'll leave the "or" there, but feel free to do something better.
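If you're curious, here's one possible sketch of a version that copes with any number of children. It is not the only way to do it. It assumes a different representation from the one above: a node is a list of the form (data child-1 child-2 ...), so a leaf is just (data). The names dfs-any and *bedrock* are invented for this example.

```lisp
;; Assumed representation: a node is (data child-1 child-2 ...),
;; so a leaf is just (data) and an empty tree is nil.
;; Returns ITEM if it appears anywhere in TREE, nil otherwise.
(defun dfs-any (item tree)
  (cond ((null tree) nil)
        ((eql item (first tree)) item)
        ;; SOME tries each child in order and stops at the first
        ;; call that returns a non-nil value.
        (t (some #'(lambda (child) (dfs-any item child))
                 (rest tree)))))

(defparameter *bedrock*
  '(chip (pebbles (wilma) (fred))
         (bam-bam (betty) (barney))))

(dfs-any 'barney *bedrock*)   ; returns BARNEY
(dfs-any 'dino *bedrock*)     ; returns NIL
```

The built-in "some" plays the role that "or" played before: it works through the children left to right and quits as soon as one of the recursive searches succeeds.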
Next time we'll talk about how to "keep track of where we've been" as we
go along in our search.