CS 2360 - February 6, 1997

Lecture 10 -- Traversing Hierarchical Data Structures


Networks and relational databases

Everything we've shown you so far has been purely tree-like 
in form, but as we've said, that's not always going to be 
the case.  In fact, it's much more likely that the 
organization in these structures will be more convoluted 
than that.  Consider some of the relationships which may 
exist in a small company that makes cough drops:


          Smith     options
         Brothers ---------- pay plans
            |\                /     \
            | \              /       \
            |  \      salaried     hourly
            |   \      /           /
       dept.|    ---- /           /
            |        \           /
            |  pay  / ----      /pay
            |  plan/      \    /plan
            |     /   dept.\  /
            |    /          \/
       engineering      shipping
          /   \           /   \
         /     \         /     \
        /       \       /       \
     Arnie    Brian   Chuck    David
     Smith    Smith   Smith    Smith

Ugh.  Anyway, welcome to the exciting world of relational 
databases.  Just like object-oriented programming, relational 
database work is something that evolved from artificial 
intelligence ideas about how to organize knowledge (although 
you'll never get a relational database person to admit this), 
which in turn evolved from ideas in cognitive psychology.
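
For the curious, here's one way the cough drop company network 
could be written down as data.  This is only a sketch in Python, 
chosen purely for illustration.  It assumes each link in the 
diagram becomes a (thing, relationship, thing) triple.  The 
labels "kind" and "employee" are made up here, since those links 
are unlabeled in the diagram, and the helper names related and 
sources are hypothetical too.

```python
# Each link in the diagram becomes one (subject, relation, object) triple.
relations = [
    ("Smith Brothers", "options", "pay plans"),
    ("pay plans", "kind", "salaried"),          # "kind" is an assumed label
    ("pay plans", "kind", "hourly"),            # "kind" is an assumed label
    ("Smith Brothers", "dept.", "engineering"),
    ("Smith Brothers", "dept.", "shipping"),
    ("engineering", "pay plan", "salaried"),
    ("shipping", "pay plan", "hourly"),
    ("engineering", "employee", "Arnie Smith"),  # "employee" is an assumed label
    ("engineering", "employee", "Brian Smith"),
    ("shipping", "employee", "Chuck Smith"),
    ("shipping", "employee", "David Smith"),
]


def related(subject, relation):
    """Return everything linked from subject by the given relation."""
    return [o for (s, r, o) in relations if s == subject and r == relation]


def sources(obj):
    """Return every node that has a link pointing at obj."""
    return [s for (s, r, o) in relations if o == obj]


print(related("Smith Brothers", "dept."))
print(sources("salaried"))
```

Notice that sources("salaried") comes back with two nodes.  In a 
tree, every node except the root has exactly one parent, so this 
is one quick way to see that the structure is a network and not 
a tree.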

Networks on TV (not ABC, CBS, or NBC...or even Fox)

Network abstractions have even been used by popular 
publications to explain what's going on between the 
characters in television shows.  For example, at the height 
of the popularity of the show "Twin Peaks" a couple of years 
ago, both People and Newsweek published very detailed network 
representations of the relationships between the many 
inhabitants of the town of Twin Peaks.  I showed you all 
reproductions of these diagrams (despite my failed attempts
to maim the overhead projector), so I won't bother to repeat
them in ASCII here (whew!).  You could tell just by 
looking at them that these networks are far from tree-like:  
there's no obvious hierarchy, and there are most 
definitely some cycles.  (Oh, by the way, here's some more 
terminology...you'll also find structures like these called 
"semantic networks" instead of "relational networks", depending
on how they're used, but you don't need to worry about that much 
until you take CS 3361.)  But the fundamental ideas about 
organizing knowledge in terms of things and relationships 
between things are still there, as are the fundamental ideas 
about how to traverse these structures, which we'll be 
discussing right after the next paragraph.

But in summary, let's revisit the original question, "Why are 
we getting so excited about these trees and/or networks?"  As 
we've seen, the answer is that we can model so many diverse 
things with them.  In just this brief time, we've seen how we 
can model the organization of dictionaries, human memory 
(maybe), a small company, and a fictional town, all using the 
same basic nodes-and-links representation scheme.  
Furthermore, in so doing, we've shown that this common thread 
runs through cognitive psychology, artificial intelligence, 
object-oriented programming, and relational databases, just 
to name a few areas of academic endeavor.  See, there really 
is some method to the madness.  Trust me.


Search

Now that we have all this new knowledge about representation 
in trees, hierarchical structures, networks, and the like, we 
need some means for exploring these knowledge structures to 
get at the information we want at the time we want it.  How 
do we do this?  The answer is a bunch of techniques which 
collectively fall under the heading of "search".  Search is a 
concept which permeates computer science.  We'll only touch 
on a couple of kinds of search in this course, but they'll be 
sufficient to demonstrate the basic difference between brute-
force, exhaustive, or "dumb" search and heuristic or 
"intelligent" search.

Linear search

You probably already know how to do a linear search.  You 
probably did linear searches in previous programming courses.  
For example, starting at the beginning of a file structure 
and looking at record after record for a specific entry is a 
linear search.  (If you've ever seen my office, you know that 
the only way I could find something in there is by linear 
search:  I start at one end of the desk and look at 
everything until I find what I'm looking for.)  Linear 
searches take a long time -- O(n), that kind of time.  
(Actually, assuming the thing you want is equally likely to 
be anywhere in the file, you'll look at about n/2 records on 
average, but that's still O(n), since the constants are more 
or less unimportant.)
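
Here's a minimal sketch of a linear search, written in Python 
purely for illustration.  The record layout (a list of 
dictionaries with a "name" field) and the function name 
linear_search are assumptions made up for this example.  The 
function also counts how many records it looks at, so you can 
see the O(n) behavior for yourself.

```python
def linear_search(records, target_name):
    """Scan records front to back.

    Returns (index, comparisons); index is None if nothing matches.
    """
    comparisons = 0
    for index, record in enumerate(records):
        comparisons += 1
        if record["name"] == target_name:
            return index, comparisons
    return None, comparisons


# Hypothetical employee file, in no particular order.
employees = [
    {"name": "Chuck Smith", "dept": "shipping"},
    {"name": "Arnie Smith", "dept": "engineering"},
    {"name": "David Smith", "dept": "shipping"},
    {"name": "Brian Smith", "dept": "engineering"},
]

print(linear_search(employees, "David Smith"))  # found after 3 looks
print(linear_search(employees, "Eddie Smith"))  # not found; all 4 examined
```

A failed search is the worst case:  every record gets examined 
before we can say the entry isn't there.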

We can impose a separate indexing scheme on our file 
structure, so that we can cut down on some search time.  For 
example, if the employee records are kept sorted by name, we 
could apply a binary search mechanism to look for an 
employee record in the file.  If the employee's name starts 
with a letter in the range A-M, we could start the search at 
the beginning of the file, but if the name starts with a 
letter in the range N-Z, we would start the search at 
approximately the midway point in the file.  We could 
continue to divide the big groups into smaller groups, until 
eventually the time to find a single record is governed not 
by the O(n) behavior of the linear search but by the 
O(log n) behavior of the binary search.  There are other 
indexing mechanisms that we could use, such as hashing 
functions, that would give us different kinds of advantages.
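
The halving idea can be sketched as follows, again in Python 
just for illustration.  This sketch assumes the names are 
already sorted and sit in an in-memory list, and it compares 
whole names at the midpoint, not first letters.  The function 
name binary_search is made up for this example.

```python
def binary_search(sorted_names, target):
    """Return the index of target in sorted_names, or None if absent.

    Each comparison throws away half of the remaining range.
    """
    low, high = 0, len(sorted_names) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_names[mid] == target:
            return mid
        if sorted_names[mid] < target:
            low = mid + 1    # target can only be in the upper half
        else:
            high = mid - 1   # target can only be in the lower half
    return None


names = ["Arnie", "Brian", "Chuck", "David"]  # must already be sorted
print(binary_search(names, "Chuck"))
print(binary_search(names, "Zeke"))
```

The catch is the sorting:  if the file isn't kept in order, 
this doesn't work at all, and you're back to linear search.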

Searching a hierarchical structure

As we discussed previously, we don't always store our stuff 
in linear formats.  We can also organize knowledge in hierarchies.  
Consider, for example, the Flintstone Family Tree:

                         Rocky
                         /   \
                        /     \
               has-mom /       \ has-dad
                      /         \
                    \/_         _\/
                 Pebbles       Bam-Bam
                 / \ has-dad       / \
        has-mom /   \     has-mom /   \ has-dad
               /     \           /     \
             \/_     _\/       \/_     _\/
            Wilma    Fred     Betty   Barney

In structures like this, as before, we may want to search for 
useful information.  But structures like this, unlike linear 
file structures, make it easier to search for the answers to 
questions like "What's the relationship of Barney to Rocky?" 
or "Who is Rocky's grandfather on his mother's side?"
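
To make that concrete, here's a small sketch of the Flintstone 
Family Tree as data, written in Python for illustration only.  
The nested-dictionary layout and the helper name follow are 
assumptions made up for this example.  Answering the 
grandfather question amounts to following two labeled links 
from Rocky.

```python
# Each person maps to the labeled links leaving that person in the diagram.
family = {
    "Rocky":   {"has-mom": "Pebbles", "has-dad": "Bam-Bam"},
    "Pebbles": {"has-mom": "Wilma",   "has-dad": "Fred"},
    "Bam-Bam": {"has-mom": "Betty",   "has-dad": "Barney"},
}


def follow(person, *links):
    """Follow a sequence of labeled links; return None if a link is missing."""
    for link in links:
        person = family.get(person, {}).get(link)
        if person is None:
            return None
    return person


# Rocky's grandfather on his mother's side: mom first, then her dad.
print(follow("Rocky", "has-mom", "has-dad"))
```

The other question, "What's the relationship of Barney to 
Rocky?", runs the other way:  we know both ends and have to 
search for the path between them.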

Depth-first search

The simplest form of search in a hierarchical or network 
structure is called "depth-first search".  Here's an 
algorithm for depth-first search on a binary tree, looking 
for a specific node in the tree:

df-search

1.  look at the root
2.  if it's what you're looking for, then return success
3.  if the root has no children, then return failure
4.  call df-search on the subtree whose root is the left
    child (if there is one) and return success if that 
    search is successful
5.  call df-search on the subtree whose root is the right
    child (if there is one) and return success if that 
    search is successful
6.  otherwise, return failure
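
The df-search steps above can be sketched as follows.  Python 
is used here purely for illustration; this is not a claim 
about how we'll implement it in class.  The Node class and the 
name df_search are assumptions made up for this sketch, and a 
missing child is simply treated as a failed search.

```python
class Node:
    """A binary tree node with a label and optional left/right children."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def df_search(root, target):
    """Return True if some node in the tree rooted at root is labeled target."""
    if root is None:          # empty subtree: nothing to find here
        return False
    if root.label == target:  # look at the root first
        return True
    # "or" short-circuits, so the right subtree is searched
    # only when the search of the left subtree fails.
    return df_search(root.left, target) or df_search(root.right, target)


# The Flintstone Family Tree, moms on the left and dads on the right.
rocky = Node("Rocky",
             Node("Pebbles", Node("Wilma"), Node("Fred")),
             Node("Bam-Bam", Node("Betty"), Node("Barney")))

print(df_search(rocky, "Barney"))
print(df_search(rocky, "Dino"))
```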

This algorithm may look somewhat familiar, since it's just 
a variant of the preorder tree traversal algorithm some of
you have seen in previous courses:

preorder

1.  visit the root
2.  call preorder on the left subtree
3.  call preorder on the right subtree

The big differences between the preorder algorithm and the 
depth-first search algorithm are these:

1.  depth-first search stops before searching the whole tree,
    if it finds what it's looking for; preorder traversal
    always examines the entire tree

2.  with depth-first search, searching the right subtree
    occurs only if the search of the left subtree failed to
    find what was being looked for; with preorder traversal,
    the right subtree is always explored (this is sort of a 
    corollary to the first difference listed just above)
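
To see both differences in action, here's a sketch (in Python, 
for illustration only) that records which nodes each algorithm 
looks at.  The Node class, the function names, and the visited 
list are all assumptions made up for this example.

```python
class Node:
    """A binary tree node with a label and optional left/right children."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def preorder(root, visited):
    """Visit every node: root, then left subtree, then right subtree."""
    if root is None:
        return
    visited.append(root.label)
    preorder(root.left, visited)
    preorder(root.right, visited)


def df_search(root, target, visited):
    """Same visiting order as preorder, but stop as soon as target is found."""
    if root is None:
        return False
    visited.append(root.label)
    if root.label == target:
        return True
    return (df_search(root.left, target, visited)
            or df_search(root.right, target, visited))


rocky = Node("Rocky",
             Node("Pebbles", Node("Wilma"), Node("Fred")),
             Node("Bam-Bam", Node("Betty"), Node("Barney")))

seen_by_preorder = []
preorder(rocky, seen_by_preorder)

seen_by_search = []
found = df_search(rocky, "Fred", seen_by_search)

print(seen_by_preorder)  # all seven nodes
print(seen_by_search)    # stops at Fred; the Bam-Bam side is never touched
```

Searching for Fred looks at only four nodes, in the same order 
that preorder would visit them, and then quits.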

How do you implement this in what is quickly becoming your
favorite programming language?  We'll talk about that on 
Tuesday.



Copyright 1997 by Kurt Eiselt.  All rights reserved.

Last revised: February 6, 1997