CS 2360 - February 10, 1998

Lecture 11 -- Search


Now that we have all this new knowledge about representation 
in trees, hierarchical structures, networks, and the like, we 
need some means for exploring these knowledge structures to 
get at the information we want at the time we want it.  How 
do we do this?  The answer is a bunch of techniques which 
collectively fall under the heading of "search".  Search is a 
concept which permeates computer science.  We'll only touch 
on a couple of kinds of search in this course, but they'll be 
sufficient to demonstrate the basic difference between brute-
force, exhaustive, or "dumb" search and heuristic or 
"intelligent" search.

Linear search

You probably already know how to do a linear search.  You 
probably did linear searches in previous programming courses.  
For example, starting at the beginning of a file structure 
and looking at record after record for a specific entry is a 
linear search.  (If you've ever seen my office, you know that 
the only way I could find something in there is by linear 
search:  I start at one end of the desk and look at 
everything until I find what I'm looking for.)  Linear 
searches take a long time:  the number of records examined 
grows in proportion to the size of the file, which is O(n) 
time.  (On average, assuming the entry is equally likely to 
be anywhere in the file, you'll look at about n/2 records, 
but constant factors like 1/2 don't change the O(n).)
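
Here's a minimal sketch of a linear search in LISP, just to make 
the idea concrete.  The record format (a list of name/department 
pairs) and the names "linear-search" and "*employees*" are my 
own assumptions for illustration, not anything fixed by the course:

```lisp
;; Hypothetical record format: each record is a (name . department) pair.
(defun linear-search (name records)
  "Return the first record in RECORDS whose name is NAME, or nil if none."
  (cond ((null records) nil)                       ; ran off the end: not found
        ((eql name (car (first records)))          ; found it: stop looking
         (first records))
        (t (linear-search name (rest records)))))  ; keep going down the list

(defparameter *employees*
  '((fred . quarry) (barney . quarry) (wilma . home) (betty . home)))

(linear-search 'wilma *employees*)   ; => (WILMA . HOME), after 3 comparisons
(linear-search 'dino *employees*)    ; => NIL, after looking at all 4 records
```

The unsuccessful search is the worst case:  every record gets 
examined before the function can give up.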

If the file is kept sorted, say by employee name, we can 
impose an indexing scheme on it and cut down on search time.  
For example, we could apply a binary search mechanism to look 
for an employee record.  If the employee's name starts 
with a letter in the range A-M, we could start the search at 
the beginning of the file, but if the name starts with a 
letter in the range N-Z, we would start the search at 
approximately the midway point in the file.  We could continue 
to divide the big groups into smaller groups, until eventually 
the time to find a single record is governed not by the O(n) 
behavior of the linear search but by the O(log n) behavior of 
the binary search.  There are other indexing mechanisms that 
we could use, such as hashing functions, that would give us 
different kinds of advantages.
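
As a rough sketch of the halving idea, here's a binary search 
over a sorted vector of names.  I'm assuming the names are 
uppercase strings already in alphabetical order; the function 
name "binary-search" and the sample data are made up for this 
example:

```lisp
(defun binary-search (name names)
  "Return the index of NAME in the sorted vector NAMES, or nil if absent."
  (let ((low 0)
        (high (1- (length names))))
    (loop while (<= low high)
          do (let ((mid (floor (+ low high) 2)))
               (cond ((string= name (aref names mid))
                      (return-from binary-search mid)) ; found it
                     ((string< name (aref names mid))
                      (setf high (1- mid)))            ; look in the lower half
                     (t
                      (setf low (1+ mid))))))          ; look in the upper half
    nil))                                              ; range is empty: not found

;; Assumed sample data; it must already be sorted for this to work.
(defparameter *names* (vector "BARNEY" "BETTY" "FRED" "PEBBLES" "WILMA"))

(binary-search "PEBBLES" *names*)   ; => 3
(binary-search "DINO" *names*)      ; => NIL
```

Each trip around the loop throws away half of the remaining 
names, which is where the O(log n) behavior comes from.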


Searching a hierarchical structure

As we discussed previously, we don't always store our stuff 
in linear formats.  We can also organize knowledge in hierarchies.  
Consider, for example, the Flintstone Family Tree:


                    Chip       Roxy
                      | (twins) |
                      ___________
                           ^
                          / \
                         /   \
                        /     \
               has-mom /       \ has-dad
                      /         \
                    \/_         _\/
                 Pebbles       Bam-Bam
                 / \ has-dad       / \
        has-mom /   \     has-mom /   \ has-dad
               /     \           /     \
             \/_     _\/       \/_     _\/
            Wilma    Fred     Betty   Barney

In structures like this, as before, we may want to search for 
useful information.  But structures like this, unlike linear 
file structures, make it easier to search for the answers to 
questions like "What's the relationship of Barney to Chip?" 
or "Who is Chip's grandfather on his mother's side?"

Depth-first search

The simplest form of search in a hierarchical or network 
structure is called "depth-first search".  Here's an 
algorithm for depth-first search on a binary tree, looking 
for a specific node in the tree:

df-search

1.  look at the root
2.  if it's what you're looking for, then return success
3.  if the root has no descendants, then return failure
4.  call df-search on the subtree whose root is the leftmost
    descendant and return success if that search is 
    successful
5.  call df-search on the subtree whose root is the rightmost
    descendant and return success if that search is 
    successful; otherwise return failure

This algorithm may look somewhat familiar, since it's just 
a variant of the preorder tree traversal algorithm some of
you have seen in previous courses:

preorder

1.  visit the root
2.  call preorder on the left subtree
3.  call preorder on the right subtree

The big differences between the preorder algorithm and the 
depth-first search algorithm are these:

1.  depth-first search stops before searching the whole tree,
    if it finds what it's looking for; preorder traversal
    always examines the entire tree

2.  with depth-first search, searching the right subtree
    occurs only if the search of the left subtree failed to
    find what was being looked for; with preorder traversal,
    the right subtree is always explored (this is sort of a 
    corollary to the first difference listed just above)
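
For comparison, here's one way the preorder traversal might 
look in LISP.  This is only a sketch:  it assumes a tree is 
either nil or a three-element list of the form (root 
left-subtree right-subtree), and it "visits" a node by adding 
it to the list that gets returned:

```lisp
(defun preorder (tree)
  "Return a list of the nodes of TREE in the order preorder visits them."
  (if (null tree)
      nil
      (cons (first tree)                         ; 1. visit the root
            (append (preorder (second tree))     ; 2. then the left subtree
                    (preorder (third tree))))))  ; 3. then the right subtree

(preorder '(chip (pebbles (wilma nil nil) (fred nil nil))
                 (bam-bam (betty nil nil) (barney nil nil))))
;; => (CHIP PEBBLES WILMA FRED BAM-BAM BETTY BARNEY)
```

Notice that there's no way for this function to quit early:  
every node in the tree ends up in the result.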

How do you implement this in what is quickly becoming your
favorite programming language?  Keep reading.


Implementing depth-first search

Above we talked about the differences between depth-first
search of a binary tree and preorder traversal of that same tree.
These differences make implementation of depth-first search 
more complicated than preorder traversal, but not drastically 
so.  Here's a simple depth-first search implementation for 
the Flintstone Family Tree, using a representation format 
for trees that we've used occasionally before (but this isn't
the only representation that we could have used).  The tree 
looks like this in LISP (and note that just to make things simpler,
we've eliminated one of the twins):

  '(chip (pebbles (wilma nil nil) (fred nil nil))
         (bam-bam (betty nil nil) (barney nil nil)))

And the LISP code itself looks like this:

(defun dfs (item tree)
  (cond ((done? tree) nil)                        ; empty tree: fail
        ((found-item? item (get-root tree)) item) ; found it: stop
        (T (or (dfs item (get-left-subtree tree)) ; else search left,
               (dfs item (get-right-subtree tree)))))) ; then right

(defun done? (tree)
  (null tree))

(defun found-item? (item tree)
  (eql item tree))

(defun get-root (tree)
  (first tree))

(defun get-left-subtree (tree)
  (second tree))

(defun get-right-subtree (tree)
  (third tree))

I've abstracted away the details of accessing the list data
structure that represents the family tree, leaving only 
a high-level algorithm description in the main function.  In
fact, the only built-in LISP operators used to describe the
high-level algorithm are "defun", "cond", and "or".
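
If you'd like to watch the search stop early, here's a sketch 
of a variant that keeps a log of the nodes it examines.  The 
names "dfs-logged", "*visited*", and "*flintstones*" are 
hypothetical, and to keep the example self-contained I've 
used "first", "second", and "third" directly instead of the 
accessor functions:

```lisp
(defvar *visited* nil "Nodes examined so far, most recent first.")

(defun dfs-logged (item tree)
  "Depth-first search that records each root it examines in *visited*."
  (cond ((null tree) nil)
        (t (push (first tree) *visited*)          ; log this node
           (if (eql item (first tree))
               item
               (or (dfs-logged item (second tree))
                   (dfs-logged item (third tree)))))))

(defparameter *flintstones*
  '(chip (pebbles (wilma nil nil) (fred nil nil))
         (bam-bam (betty nil nil) (barney nil nil))))

(setf *visited* nil)
(dfs-logged 'fred *flintstones*)   ; => FRED
(reverse *visited*)                ; => (CHIP PEBBLES WILMA FRED)
```

The search never looks at Bam-Bam's side of the family, 
because Fred turned up in the left subtree.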

The use of "or" in the "dfs" function is an easy way to 
fulfill the requirement that the right subtree isn't searched 
if what we're looking for is found in the left subtree.  It's not 
an especially obvious use of "or", which is typically used as 
a Boolean operator, not as a program control mechanism.  Also, 
this use of "or" depends on exactly how "or" evaluates its 
arguments (i.e., left to right, stopping as soon as it finds 
an argument which evaluates to a non-nil value), and leaning 
on that kind of detail is not necessarily a great thing 
to do.  Since this behavior is part of the Common LISP 
specification for "or", we can count on things happening the 
way we expect them to here, but the same trick in some other 
language or LISP dialect whose "or" evaluates all of its 
arguments, or evaluates them right-to-left, could give 
results that might confuse us a little bit (although in a 
purely functional world, the results wouldn't be catastrophic).  
Furthermore, this assumes that every node has exactly two 
subtrees; if you want to cope with a variable number of 
children at any node, you'll need a slightly different 
version of this anyway.  For now, we'll leave the "or" there, 
but feel free to do something better.
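
Here's one possible sketch of a version that copes with any 
number of children.  It assumes a tree is a list of the form 
(root child1 child2 ...), and it uses the standard Common LISP 
function "some", which tries each element in order and stops 
at the first non-nil result.  The name "dfs-any" and the 
little sample tree are made up for this example:

```lisp
(defun dfs-any (item tree)
  "Depth-first search of TREE, where a tree is (root child1 child2 ...)."
  (cond ((null tree) nil)
        ((eql item (first tree)) item)
        ;; SOME searches the children left to right and returns the
        ;; first non-nil result, so later children are skipped on success.
        (t (some #'(lambda (child) (dfs-any item child))
                 (rest tree)))))

;; Hypothetical tree: Fred has one child, who has two children.
(dfs-any 'roxy '(fred (pebbles (chip) (roxy))))   ; => ROXY
(dfs-any 'dino '(fred (pebbles (chip) (roxy))))   ; => NIL
```

Because an empty subtree is still handled by the first "cond" 
clause, this version also works on the (root left right) trees 
used above, where missing children are written as nil.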


Getting past yes or no

Sadly, the search function described above doesn't tell me 
much:  just whether or not the item I'm looking for is in 
the tree.  I'd get more information 
if I could get the search function to tell me how to 
get from the root of the tree to the item I'm looking for, 
assuming the item I'm looking for is in the tree.  
That path from the root to the item would at least be 
an approximation of the relationship between those two nodes 
in the tree; in the case of the Flintstones, for example, the 
path "Chip -has-dad-> Bam-Bam -has-dad-> Barney" tells me 
something about the relationship between Chip and Barney.  
How can I get my depth-first search procedure to 
return this path, instead of just the item itself, when it 
finds the item in the tree?  It's pretty easy.  All you do is 
introduce an additional argument as a sort of variable to 
store the path from the root to wherever the procedure is 
looking in the tree.  You get that additional argument by 
adding a helping function, just like in many of those examples
of tail recursion.  (But note that the addition of the helper
function does not make this depth-first search tail recursive.)
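Then it's just a question of building up the result as the 
procedure searches deeper in the tree.  (Each node gets consed 
onto the front of the result, so the path comes back with the 
item first and the root last.)  Here's the code: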

(defun dfs (item tree)
  (dfs-helper item tree nil))

(defun dfs-helper (item tree result)
  (cond ((done? tree) nil)
        ((found-item? item (get-root tree)) 
         (cons item result))
        (T (or (dfs-helper item 
                           (get-left-subtree tree)
                           (cons (get-root tree) result))
               (dfs-helper item 
                           (get-right-subtree tree)
                           (cons (get-root tree) result))))))

(defun done? (tree)
  (null tree))

(defun found-item? (item tree)
  (eql item tree))

(defun get-root (tree)
  (first tree))

(defun get-left-subtree (tree)
  (second tree))

(defun get-right-subtree (tree)
  (third tree))

And note that because I've taken the time to do a great deal 
of data abstraction, separating the functions that access the 
LISP data structure from the higher-level algorithm, all 
I had to do was make a few changes to the top-level 
procedure; the lower-level ones are untouched because I 
didn't make any changes to the LISP data structure.
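
If you'd rather see the path with the root first, one option 
is to reverse the result at the moment the item is found.  
Here's a self-contained sketch of that idea.  The name 
"path-to" is hypothetical, and I've used an "&optional" 
parameter in place of a separate helper function, which is 
just one of several reasonable ways to do it:

```lisp
(defun path-to (item tree &optional path)
  "Return the list of nodes from the root of TREE down to ITEM, or nil."
  (cond ((null tree) nil)
        ((eql item (first tree))
         (reverse (cons item path)))             ; flip so the root comes first
        (t (let ((path (cons (first tree) path))) ; remember where we've been
             (or (path-to item (second tree) path)
                 (path-to item (third tree) path))))))

(path-to 'barney '(chip (pebbles (wilma nil nil) (fred nil nil))
                        (bam-bam (betty nil nil) (barney nil nil))))
;; => (CHIP BAM-BAM BARNEY)
```

That result lines up with the path "Chip -has-dad-> Bam-Bam 
-has-dad-> Barney", though the list by itself doesn't say 
which links were has-mom and which were has-dad.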



Copyright 1998 by Kurt Eiselt.  All rights reserved.

Last revised: February 16, 1998