I. But first...Hey, that tree traversal program is horrible!
As noted last time, we could do a better job with that preorder tree
traversal. It could be very useful to come up with operations
associated with binary trees that would serve as a way of separating
the high-level algorithm from the low-level implementation details of
this particular binary tree. You could write the appropriate
functions, maybe called something like "get-root", "get-left-subtree",
and "get-right-subtree", which could be used by other functions to
carry out various manipulations or traversals of this tree. These
functions to get the left and right children of a node could abstract
away the details of the implementation and allow you to write your
tree manipulation without concern for the implementation of the data
structure:
(define (print-preorder tree)
  (cond ((null? tree) '())
        (else (write (get-root tree))
              (print-preorder (get-left-subtree tree))
              (print-preorder (get-right-subtree tree)))))
(define (get-root tree)
  (car tree))
(define (get-left-subtree tree)
  (cadr tree))
(define (get-right-subtree tree)
  (caddr tree))
Now we've separated the algorithm from the implementation details.
The top-level function, print-preorder, reveals nothing about the
implementation of the tree itself. If someone comes along and says
that the implementation of a binary tree has changed from (root
left-subtree right-subtree) for example to (left-subtree
right-subtree root) for some dorky reason, then you know what needs
to be changed and what doesn't---the algorithm remains untouched, but
those itty bitty accessor functions need some fixing:
(define (print-preorder tree)
  (cond ((null? tree) '())
        (else (write (get-root tree))
              (print-preorder (get-left-subtree tree))
              (print-preorder (get-right-subtree tree)))))
(define (get-root tree)
  (caddr tree))
(define (get-left-subtree tree)
  (car tree))
(define (get-right-subtree tree)
  (cadr tree))
> (print-preorder '(((() () d) (() () e) b) ((() () f) (() () g) c) a))
abdecfg
Similarly, if we want our function to do an inorder or postorder
traversal, we leave the low-level accessor functions unaltered and
tweak the higher-level algorithm:
inorder-tree-traversal
1. call inorder-tree-traversal on the left subtree of the root
2. visit the root of the tree and do what you're going to do
with it (print it, collect it, stop if it's what you're looking for,
whatever)
3. call inorder-tree-traversal on the right subtree of the root
(define (print-inorder tree)
  (cond ((null? tree) '())
        (else (print-inorder (get-left-subtree tree))
              (write (get-root tree))
              (print-inorder (get-right-subtree tree)))))
(define (get-root tree)
  (car tree))
(define (get-left-subtree tree)
  (cadr tree))
(define (get-right-subtree tree)
  (caddr tree))
> (print-inorder '(a (b (d () ()) (e () ())) (c (f () ()) (g () ()))))
dbeafcg
postorder-tree-traversal
1. call postorder-tree-traversal on the left subtree of the root
2. call postorder-tree-traversal on the right subtree of the root
3. visit the root of the tree and do what you're going to do
with it (print it, collect it, stop if it's what you're looking for,
whatever)
(define (print-postorder tree)
  (cond ((null? tree) '())
        (else (print-postorder (get-left-subtree tree))
              (print-postorder (get-right-subtree tree))
              (write (get-root tree)))))
(define (get-root tree)
  (car tree))
(define (get-left-subtree tree)
  (cadr tree))
(define (get-right-subtree tree)
  (caddr tree))
> (print-postorder '(a (b (d () ()) (e () ())) (c (f () ()) (g () ()))))
debfgca
And hey, you know what? A while back, one very insightful student
came up to me after class and challenged me on my proud
self-assessment of just how good my data abstraction really was. He
told me it wasn't that good. And when I asked why, he said that I
should have abstracted away the return of the empty list in the first
cond clause...testing for the empty list in the top-level function
left a data structure detail in there. He was absolutely right, and
as happens more often than you might believe, I've learned something
from my students. And now, so have you. Here's the even better
version:
(define (print-postorder tree)
  (cond ((done? tree) '())
        (else (print-postorder (get-left-subtree tree))
              (print-postorder (get-right-subtree tree))
              (write (get-root tree)))))
(define (get-root tree)
  (car tree))
(define (get-left-subtree tree)
  (cadr tree))
(define (get-right-subtree tree)
  (caddr tree))
(define (done? tree)
  (null? tree))
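The traversal recipes above mention collecting nodes as an alternative
to printing them. Here's a minimal sketch of that idea, assuming the
(root left-subtree right-subtree) layout and the same accessors. The
name collect-postorder is made up for this illustration; it isn't part
of the original program. Notice that it never touches car or cdr
directly:

```scheme
;; Accessors and done? for the (root left-subtree right-subtree) layout.
(define (get-root tree) (car tree))
(define (get-left-subtree tree) (cadr tree))
(define (get-right-subtree tree) (caddr tree))
(define (done? tree) (null? tree))

;; collect-postorder is a hypothetical name: it returns the nodes in
;; postorder as a list instead of writing them out.
(define (collect-postorder tree)
  (if (done? tree)
      '()
      (append (collect-postorder (get-left-subtree tree))
              (collect-postorder (get-right-subtree tree))
              (list (get-root tree)))))

(collect-postorder '(a (b (d () ()) (e () ())) (c (f () ()) (g () ()))))
;; => (d e b f g c a)
```

If the tree layout changes, only the four little definitions at the top
should need fixing; collect-postorder stays put.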
II. Data Abstraction Revisited
The exercise we just went through, that of using procedural
abstraction to put some distance between high-level algorithms and
low-level data structure implementation details, is a form of "data
abstraction". The general idea is the same as with procedural
abstraction: postpone worrying about the details.
Data abstraction is something you want to be practicing regularly. It
takes a little bit of thinking at first (what doesn't in this class?)
but it pays off big in the long run. But keep in mind this one truth:
the bulk of time and energy invested in the life of a major piece of
software is expended after the initial design and implementation in
efforts to debug, maintain, improve, and adapt the software. Putting
an "abstraction wall" between the data-accessing functions and
everything else confines code changes brought on by changes in data
structures to relatively few procedures, and it makes the whole thing
tremendously easier to read. If you get in the habit of doing this
now and carrying this practice through your professional software
career (at least for those of you who go on to professional software
careers), you will acquire the reputation of being a seriously
impressive software developer (and rightfully so).
You can get some measure of the goodness of your program by
evaluating what proportion of the code that deals with the data
structures actually depends on implementation details:
This represents the difference between a program that has used
abstraction to effectively postpone the details and one that doesn't.
Strive for the former, shun the latter.
III. Association lists
Out of simple lists in Scheme we can build another data abstraction,
called the association list. The association list, or a-list for
short, is just a list of two- (or more) element sublists. The first
element of the two-or-more element sublist represents a "key" or
"index" that allows access to the information in the sublist. The
second and any other elements of the sublist are the data associated
with the key. The idea here is that you view the a-list as a
repository of information, and you retrieve specific pieces of
information by passing the key to the repository through some
interface, and the interface returns the information associated with
that key.
Association lists go by other names depending on the language or the
application: dictionary, lookup table, and symbol table are other
ways of naming association lists. A-lists are very very useful data
abstractions. In fact, the Scheme system you're using probably uses
an a-list to bind function bodies to function names, so that when
Scheme sees a function name, it can then retrieve the function body
from the a-list (which in this case is called a symbol table).
Here's the graph level abstraction of a particular a-list:
Here's a Scheme-level abstraction of the same a-list:
((eiselt 1.9) (leahy 2.7) (greenlee 0.6))
And here's a lower-level, box-and-pointer version of the same a-list:
 _______                  _______                  _______
|   |   |                |   |   |                |   |  /|
| | | --+--------------->| | | --+--------------->| | | / |
|_|_|___|                |_|_|___|                |_|_|/__|
  |                        |                        |
  |                        |                        |
 \|/                      \|/                      \|/
 _______     _______      _______     _______      _______     _______
|   |   |   |   |  /|    |   |   |   |   |  /|    |   |   |   |   |  /|
| | | --+-->| | | / |    | | | --+-->| | | / |    | | | --+-->| | | / |
|_|_|___|   |_|_|/__|    |_|_|___|   |_|_|/__|    |_|_|___|   |_|_|/__|
  |           |            |           |            |           |
eiselt       1.9         leahy        2.7        greenlee      0.6
What helps to make the association list or a-list so useful in Scheme
is the existence of a predefined a-list operation called "assoc". The
assoc function takes two arguments, a key and an a-list, and searches
down the a-list for the sublist whose first element matches the key:
> (assoc 'leahy '((eiselt 1.9)(leahy 2.7)(greenlee 0.6)))
(leahy 2.7)
Scheme's assoc function uses equal? as its comparison predicate.
There are other similar functions using different predicates. Thus,
assq uses eq?, and assv uses eqv?, which we didn't even talk about
before. Don't worry about that one for now.
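To make the difference between the predicates a little more concrete,
here's a small sketch. The a-list called points is made up for this
illustration; its keys are themselves lists, which is where equal? and
eq? part ways:

```scheme
(define grades '((eiselt 1.9) (leahy 2.7) (greenlee 0.6)))

(assoc 'greenlee grades)   ; => (greenlee 0.6)
(assq 'greenlee grades)    ; => (greenlee 0.6), same symbol, so eq? is enough
(assoc 'nobody grades)     ; => #f, no sublist starts with that key

;; Made-up example data: the keys here are two-element lists.
(define points '(((1 2) a) ((3 4) b)))

(assoc (list 3 4) points)  ; => ((3 4) b), equal? compares the contents
(assq (list 3 4) points)   ; => #f, a freshly built list is not eq? to the stored key
```

So for symbol keys any of the three will do, and for structured keys
assoc is the safe choice.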
Here's a way of implementing your own assoc function. At the very
least, it'll help you understand what assoc does:
(define (my-assoc key alist)
  (cond ((null? alist) #f)
        ((equal? key (caar alist)) (car alist))
        (else (my-assoc key (cdr alist)))))
What's caar? It's the car of the car. Scheme lets you compound the
a's and the d's, up to four, between the c and the r. So there's caar
and caaar and cddr and cadr and cadadr and so on. But there's no
caaaaar, because that's five letters between the c and the r. But you
could write your own, couldn't you?
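In case you want to check your answer, here's one possible sketch. It
assumes your Scheme already provides the four-letter caaaar, as R5RS
systems do (some newer systems make you import it from a library
first):

```scheme
;; caaaaar is not predefined (five letters between the c and the r),
;; so build it from car and the predefined four-letter caaaar.
(define (caaaaar x)
  (car (caaaar x)))

(caaaaar '(((((deep))))))   ; => deep
```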
And how do you pronounce those names? It's an acquired skill. Trust me.
There's a rudimentary pronunciation guide in your textbook.
Association lists in Scheme don't provide especially fast response
when you use assoc or its relatives to look for something. Finding
some key in an association list of N elements has O(N) time
complexity, because it involves starting at the beginning of the
a-list and looking at every element until the key is matched. But
a-lists provide us with the versatility to implement all sorts of
dynamic linked-list data structures of our own design, as we'll see
in the near future. The important thing to note about a-lists for now
is that they are a useful abstract data type that provides us with
the ability to perform a very important task--looking up some chunk of
related information (sometimes called a record) in some file, table,
or other such data structure given a key or index--without regard to
how it's implemented.
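One way to put a thin interface over that lookup task is sketched
below. The name lookup is an assumption of mine, not a standard Scheme
procedure; it hands back just the first piece of data stored with the
key, or #f if the key isn't there:

```scheme
;; lookup is a hypothetical helper, not a built-in.
;; It returns the first datum stored with key, or #f if key is absent.
(define (lookup key alist)
  (let ((record (assoc key alist)))
    (if record
        (cadr record)
        #f)))

(lookup 'leahy '((eiselt 1.9) (leahy 2.7) (greenlee 0.6)))    ; => 2.7
(lookup 'nobody '((eiselt 1.9) (leahy 2.7) (greenlee 0.6)))   ; => #f
```

The caller never sees the sublist, so whoever uses lookup doesn't have
to care how the records are laid out.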
IV. Representing trees as A-lists
There's more than one way to represent any collection of knowledge.
And the reason we're telling you about the association list now
is so we can demonstrate multiple ways of representing the same thing.
For example, we could represent the binary tree of our previous
example as an association list. Here's that binary tree again:
(a (b (d () ()) (e () ())) (c (f () ()) (g () ())))
And here is that same tree encoded with an A-list:
((a (b c))
(b (d e))
(c (f g))
(d (() ()))
(e (() ()))
(f (() ()))
(g (() ())) )
In the a-list, the first element of each sublist is a node in the
tree, and the next element is a list of the children of that node. We
use () to indicate that there is no child. It's not absolutely
necessary that we do it that way...it's just a standard convention
that is consistent with what we've done in the past.
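If you'd like to convince yourself that the two encodings carry the
same information, here's a rough sketch of a converter from the
nested-list form to the a-list form. The names tree->alist and
child-key are invented for this illustration, and the sketch assumes
the (root left-subtree right-subtree) layout. The entries come out in
preorder, which is a different order from the listing above:

```scheme
;; Accessors for the nested-list (root left-subtree right-subtree) layout.
(define (get-root tree) (car tree))
(define (get-left-subtree tree) (cadr tree))
(define (get-right-subtree tree) (caddr tree))

;; child-key (hypothetical): the key naming a subtree, or () if there is none.
(define (child-key subtree)
  (if (null? subtree)
      '()
      (get-root subtree)))

;; tree->alist (hypothetical): one (node (left-child right-child)) entry
;; per node, produced in preorder.
(define (tree->alist tree)
  (if (null? tree)
      '()
      (let ((left (get-left-subtree tree))
            (right (get-right-subtree tree)))
        (cons (list (get-root tree)
                    (list (child-key left) (child-key right)))
              (append (tree->alist left)
                      (tree->alist right))))))

(tree->alist '(a (b (d () ()) (e () ())) (c (f () ()) (g () ()))))
;; => ((a (b c)) (b (d e)) (d (() ())) (e (() ()))
;;     (c (f g)) (f (() ())) (g (() ())))
```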
V. Traversing a tree represented as an A-list
How do we do a traversal of a tree represented as an association
list? Well, we're going to have to develop this a little bit differently
than we did previously, but the principles are the same.
Here's the old code:
(define (print-preorder tree)
  (cond ((null? tree) '())
        (else (write (get-root tree))
              (print-preorder (get-left-subtree tree))
              (print-preorder (get-right-subtree tree)))))
(define (get-root tree)
  (car tree))
(define (get-left-subtree tree)
  (cadr tree))
(define (get-right-subtree tree)
  (caddr tree))
On our first attempt, we're going to have to tweak all of this
code a little bit, not just the lower levels, because of the use of
the A-list, but we don't have to tweak it much. Note that in the
nested list representation:
(a (b (d () ()) (e () ())) (c (f () ()) (g () ())))
the root node of the tree is defined as being the first element of a
three-element list. So whatever is the root is defined by its position
in the list, right? But this assumption doesn't hold true for the
association list. That is, the root of the tree in this structure:
((a (b c))
(b (d e))
(c (f g))
(d (() ()))
(e (() ()))
(f (() ()))
(g (() ())))
isn't necessarily the first thing in that A-list. I could change
things around like this:
((d (() ()))
(a (b c))
(b (d e))
(c (f g))
(e (() ()))
(f (() ()))
(g (() ())))
and my binary tree hasn't changed. (If you don't believe it, draw a
little tree diagram with circles and arrows for both A-lists and
see for yourself that they're the same.)
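If drawing circles and arrows isn't your thing, you can also let assoc
do the checking. This is only a sketch, and every-entry-in? is a name I
made up for it: it asks whether each entry of one a-list is exactly
what assoc finds under the same key in the other.

```scheme
(define tree-1
  '((a (b c)) (b (d e)) (c (f g))
    (d (() ())) (e (() ())) (f (() ())) (g (() ()))))

(define tree-2
  '((d (() ())) (a (b c)) (b (d e)) (c (f g))
    (e (() ())) (f (() ())) (g (() ()))))

;; every-entry-in? (hypothetical): true when each sublist of alist-1 is
;; also what assoc finds under the same key in alist-2.
(define (every-entry-in? alist-1 alist-2)
  (cond ((null? alist-1) #t)
        ((equal? (car alist-1) (assoc (caar alist-1) alist-2))
         (every-entry-in? (cdr alist-1) alist-2))
        (else #f)))

(every-entry-in? tree-1 tree-2)   ; => #t
(every-entry-in? tree-2 tree-1)   ; => #t
```

Both directions come back #t, so the order of the entries carries no
information about the shape of the tree.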
The upshot of this observation is that to get things started, I have to
tell my preorder function not only what the tree looks like, as I did
before, but I have to be very explicit about where the root node is,
and that's something I didn't have to do before. The program won't be
able to figure that out for itself. So I'm going to have to add an
extra parameter to accommodate root information to go along with tree
information, and that makes for somewhat less data abstraction than I
wanted, but somehow I have to let the program know where the root of
the whole tree is. So everywhere I passed "tree" before, I should
somehow pass "root" too. Got it so far? Nah, I didn't think so. So let
me take you through the top-level procedure and show you the hows and
whys of the changes:
First I add the "root" arg to the input argument list so I can tell the
procedure which node is the root of the tree:
(define (print-preorder tree root)
How will print-preorder know when it's done? Testing to see if the
alist is empty won't work, because we're never making the alist
smaller... it's not like we're scooting down the alist and reducing
the problem. However, we can assume that two sorts of values can be
passed via the root parameter to print-preorder: either a value
that's supposed to be a key or index to the root of some subtree (a
symbol like b or c in this tree), or () to indicate that there is no
subtree. So if we test
root to see if it's the empty list, the () tells us that the preorder
traversal has hit a dead end. But we also want to abstract that
detail away, so we write this here:
(cond [(done? tree root) '()]
and remember to add this function later:
(define (done? tree root)
  (null? root))
Again I add the root argument in the call to "get-root":
[else (write (get-root tree root))
Now we get to the hard parts. They're not really that hard once you
get comfortable with the a-list thing. First off, I know that I want
to pass the root info to "get-left-subtree" and "get-right-subtree",
so that's pretty easy. But in the previous (non a-list) version of
this procedure, "get-left-subtree" and "get-right-subtree" returned
entire sub-chunks of the tree, right? Would that be easy to do using
an a-list, given that the order of things in the a-list doesn't mean
what it used to? No. So instead of returning whole subtrees, what
should those functions return? If you said "the pointer to the root of
the subtree" then you get it. If you said something else, then go back
and keep rereading this until "the pointer to the root of the subtree"
makes sense as the right answer. So I modify the calls to
"get-left-subtree" and "get-right-subtree" as noted previously. If
those function calls return the root of the subtree, then I want to
pass the result of those function calls as the root argument to my
"print-preorder" function, and I also have to pass the tree
itself, so the next two expressions end up looking like this:
(print-preorder tree
                (get-left-subtree tree root))
(print-preorder tree
                (get-right-subtree tree root))
So the "print-preorder" function now looks like this:
(define (print-preorder tree root)
  (cond [(done? tree root) '()]
        [else (write (get-root tree root))
              (print-preorder tree
                              (get-left-subtree tree root))
              (print-preorder tree
                              (get-right-subtree tree root))]))
What about the details of accessing the data structure? Well, as we
hinted at above, if you pass the root to "get-root", "get-root"
doesn't have to do much of anything but return what it was passed.
Here's a roundabout way of doing just that:
(define (get-root tree root)
  (car (assoc root tree)))
Then to get the roots of the left and right subtrees, you apply the
correct combination of cars and cdrs to get at the information you need.
(You may want to trace the behavior of all this stuff by hand if you don't
understand why the two functions below look the way they do.):
(define (get-left-subtree tree root)
  (caadr (assoc root tree)))
(define (get-right-subtree tree root)
  (cadadr (assoc root tree)))
And let's not forget this:
(define (done? tree root)
  (null? root))
If you pass our new "print-preorder" function the tiny tree that we
implemented as an a-list along with the symbol 'a which is the root of
this tree, you'll see a nice preorder tree traversal, assuming I didn't
introduce any bugs into this program.
> (define x '((a (b c)) (b (d e)) (c (f g)) (d (() ())) (e (() ()))
(f (() ())) (g (() ()))))
> (print-preorder x 'a)
abdecfg
()
>
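Just as with the nested-list version, switching to a different
traversal order should only mean shuffling the lines of the top-level
function. Here's a sketch of what an inorder version might look like
for the a-list representation. The name print-inorder is carried over
from the earlier nested-list code, and the accessors are the ones just
defined:

```scheme
;; Accessors for the a-list representation, as defined above.
(define (get-root tree root) (car (assoc root tree)))
(define (get-left-subtree tree root) (caadr (assoc root tree)))
(define (get-right-subtree tree root) (cadadr (assoc root tree)))
(define (done? tree root) (null? root))

;; Inorder: left subtree first, then the root, then the right subtree.
(define (print-inorder tree root)
  (cond ((done? tree root) '())
        (else (print-inorder tree (get-left-subtree tree root))
              (write (get-root tree root))
              (print-inorder tree (get-right-subtree tree root)))))

;; The same a-list as in the transcript above.
(define x '((a (b c)) (b (d e)) (c (f g)) (d (() ())) (e (() ()))
            (f (() ())) (g (() ()))))

(print-inorder x 'a)   ; writes dbeafcg
```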
Why is it so important to have this ability to traverse a binary tree that's
implemented as an a-list? That little exercise above gives us a clue as to
how to implement an incredibly useful if somewhat disorganized (relatively
speaking) data structure called the "relational network". More about those
in the near future.
But there's a problem even with this representation. What if "c" is in the
tree twice? For example:
((a (b c))
(b (c e))
(c (f g))
(c (() ()))
(e (() ()))
(f (() ()))
(g (() ())) )
Our older, nested-list representation of this tree would be:
(a (b (c () ()) (e () ())) (c (f () ()) (g () ())))
       ^                    ^
      once                twice
That could happen. Two different things in a data structure could
have the same name. But given the way assoc works, in that it always
starts looking for a key from the front of the a-list, assoc will
never find the second entry for "c" in the a-list, even though
a traversal of the nested-list representation would find both
instances of "c". So the a-list now doesn't accurately reflect the
intended structure of the tree.
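You can watch the problem happen. In the sketch below, preorder-keys is
a made-up collecting variant of print-preorder (it returns a list
rather than writing), and dup-tree is the a-list with "c" in it twice:

```scheme
(define dup-tree
  '((a (b c)) (b (c e)) (c (f g)) (c (() ()))
    (e (() ())) (f (() ())) (g (() ()))))

(assoc 'c dup-tree)   ; => (c (f g)), always the first match

;; Accessors for the a-list representation, as defined above.
(define (get-root tree root) (car (assoc root tree)))
(define (get-left-subtree tree root) (caadr (assoc root tree)))
(define (get-right-subtree tree root) (cadadr (assoc root tree)))
(define (done? tree root) (null? root))

;; preorder-keys (hypothetical): like print-preorder, but it collects
;; the visited nodes into a list.
(define (preorder-keys tree root)
  (if (done? tree root)
      '()
      (cons (get-root tree root)
            (append (preorder-keys tree (get-left-subtree tree root))
                    (preorder-keys tree (get-right-subtree tree root))))))

(preorder-keys dup-tree 'a)   ; => (a b c f g e c f g)
```

The traversal we wanted was a b c e c f g. Because both lookups of c
land on the (c (f g)) entry, the leaf c under b picks up children it
was never supposed to have.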
In general, when you're storing information--records--in a file or
list structure like what we're doing here, you need a unique key or
index for every record. We didn't do that here. But in the real
world, there are lots of people with the same name, and if you used
the name itself, which is contained in the record, as the key or
index to find that record, you'll end up with all sorts of confusion.
That's why you'll see a bunch of gobbledygook on the mailing labels
of magazines you might subscribe to, for example. In that
gobbledygook, you might recognize a few letters of your name, a chunk
of your street address, part of your zip code, and who knows what
else. That's what the publisher uses as the index to find your
subscription information...it's computed from a set formula, and it
helps resolve the ambiguities that are caused by people having the
same name.
So what can we do in our example to fix this problem? We could invent
unique keys that are associated with each of the nodes, like this
(we'll just pick numbers for the heck of it):
((100 (a 101 102))
(101 (b 103 104))
(102 (c 105 106))
(103 (d () ()))
(104 (e () ()))
(105 (f () ()))
(106 (g () ())) )
Above is the a-list where there are no duplicate nodes, and below
would be the example with duplicate nodes:
((100 (a 101 102))
(101 (b 103 104))
(102 (c 105 106))
(103 (c () ()))
(104 (e () ()))
(105 (f () ()))
(106 (g () ())) )
Do we need (103 (c () ())) etc. in this list? Or should it be
(103 (c 0 0))? The answer is that it doesn't matter as long as we
abstract the problem sufficiently. As long as there is some code which
implements "get-left-subtree" and "get-right-subtree" it is not
important to the higher layers of the software whether we put the
(c () ()) in the list or (c 0 0) instead. We just deal with the
get-child functions and depend on them to do their job.
Can we adapt the preorder tree traversal code from just above to
traverse this new representation scheme for a binary tree embedded
in an A-list? But of course.
VI. Traversing a tree represented as an A-list, part two
Here's the old code:
(define (print-preorder tree root)
  (cond [(done? tree root) '()]
        [else (write (get-root tree root))
              (print-preorder tree
                              (get-left-subtree tree root))
              (print-preorder tree
                              (get-right-subtree tree root))]))
(define (get-left-subtree tree root)
  (caadr (assoc root tree)))
(define (get-right-subtree tree root)
  (cadadr (assoc root tree)))
(define (done? tree root)
  (null? root))
(define (get-root tree root)
  (car (assoc root tree)))
Because we've done such a good job of abstracting away the details
of the implementation of the tree structure, we only have to change
some of the tiny little accessor functions down there at the bottom.
But first, let me add another itty bitty abstraction that looks like
this:
(define (get-node tree address)
  (cadr (assoc address tree)))
What's get-node? That's just a function I threw in to make things
prettier, so I don't have to write the same code over and over.
In other words, I'm going to be looking at things like this a lot:
(101 (b 103 104))
and extracting this:
(b 103 104)
I'm just giving a name to the function that separates the information
about some node in the tree from the arbitrary number with which it is
associated. This is going to let me avoid having to try to write overly
long and unreadable function names like caddadr, which Scheme doesn't
accept anyway. So now I can redefine get-root like this:
(define (get-root tree address)
  (car (get-node tree address)))
Then to get the roots of the left and right subtrees, you use the magic
function get-node and once again apply the correct combination of cars and
cdrs:
(define (get-left-subtree tree root)
  (cadr (get-node tree root)))
(define (get-right-subtree tree root)
  (caddr (get-node tree root)))
And now, if you pass this newer "print-preorder" function the newer tree
that we implemented as an a-list along with the address 100 which is the
root of this tree, you'll see yet another in a seemingly interminable
series of preorder tree traversals:
> (define x '((100 (a 101 102)) (101 (b 103 104)) (102 (c 105 106))
              (103 (d () ())) (104 (e () ())) (105 (f () ()))
              (106 (g () ()))))
> (print-preorder x 100)
abdecfg
()
>
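Earlier I claimed it doesn't matter whether a missing child is written
as () or as 0, so long as the details are abstracted away. Here's a
sketch that tries it out. The tree called zero-tree is a small made-up
example that uses 0 for "no child", and the only definition that has to
know about that choice is done?:

```scheme
;; Hypothetical variant of the numbered a-list: 0 marks a missing child.
(define zero-tree
  '((100 (a 101 102))
    (101 (b 0 0))
    (102 (c 0 0))))

(define (get-node tree address)
  (cadr (assoc address tree)))
(define (get-root tree address)
  (car (get-node tree address)))
(define (get-left-subtree tree root)
  (cadr (get-node tree root)))
(define (get-right-subtree tree root)
  (caddr (get-node tree root)))

;; The only definition that knows which marker means "no child".
(define (done? tree root)
  (eqv? root 0))

;; Unchanged from the version above, apart from plain parentheses.
(define (print-preorder tree root)
  (cond ((done? tree root) '())
        (else (write (get-root tree root))
              (print-preorder tree (get-left-subtree tree root))
              (print-preorder tree (get-right-subtree tree root)))))

(print-preorder zero-tree 100)   ; writes abc
```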
VII. Another abstraction exercise
Does the a-list above have to look like this?:
((100 (a 101 102))
(101 (b 103 104))
(102 (c 105 106))
(103 (d () ()))
(104 (e () ()))
(105 (f () ()))
(106 (g () ())))
No. It could just as easily look like this:
((100 a 101 102)
(101 b 103 104)
(102 c 105 106)
(103 d () ())
(104 e () ())
(105 f () ())
(106 g () ()))
That was just a choice on our part. No big deal, since my clever use of
data abstraction allows me to isolate and change the appropriate
functions as the details of my data structure change. In fact, all I
should have to change is get-node...I'll let you figure out what that
change should be. Isn't abstraction wonderful?
Copyright (c) 2003 by Kurt Eiselt. All rights reserved, with
the exception of stuff that belongs to somebody else.
Last revised: October 10, 2003