CS 1321X - Lecture 14 - October 2, 2003

Trees as Association Lists


I. But first...Hey, that tree traversal program is horrible!

As noted last time, we could do a better job with that preorder tree 
traversal. It could be very useful to come up with operations 
associated with binary trees that would serve as a way of separating 
the high-level algorithm from the low-level implementation details of 
this particular binary tree. You could write the appropriate 
functions, maybe called something like "get-root", "get-left-subtree", 
and "get-right-subtree", which could be used by other functions to 
carry out various manipulations or traversals of this tree. These 
functions to get the left and right children of a node could abstract 
away the details of the implementation and allow you to write your 
tree manipulation without concern for the implementation of the data 
structure:

(define (print-preorder tree)
    (cond ((null? tree) '())
          (else (write (get-root tree))
                (print-preorder (get-left-subtree tree))
                (print-preorder (get-right-subtree tree)))))

(define (get-root tree)
    (car tree))

(define (get-left-subtree tree)
    (cadr tree))

(define (get-right-subtree tree)
    (caddr tree))
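
Here's a quick sanity check you can type in yourself. It's only a 
sketch: the name sample-tree is made up for this example, and it 
assumes the (root left-subtree right-subtree) layout that the three 
accessors above expect.

```scheme
;; sample-tree is a hypothetical name; the layout is
;; (root left-subtree right-subtree), with () for "no subtree".
(define sample-tree
  '(a (b (d () ()) (e () ())) (c (f () ()) (g () ()))))

(define (get-root tree) (car tree))
(define (get-left-subtree tree) (cadr tree))
(define (get-right-subtree tree) (caddr tree))

(get-root sample-tree)                      ; => a
(get-left-subtree sample-tree)              ; => (b (d () ()) (e () ()))
(get-root (get-right-subtree sample-tree))  ; => c
```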


Now we've separated the algorithm from the implementation details. 
The top-level function, print-preorder, reveals nothing about the 
implementation of the tree itself. If someone comes along and says 
that the implementation of a binary tree has changed from (root 
left-subtree right-subtree) for example to (left-subtree 
right-subtree root) for some dorky reason, then you know what needs 
to be changed and what doesn't---the algorithm remains untouched, but
those itty bitty accessor functions need some fixing:


(define (print-preorder tree)
    (cond ((null? tree) '())
          (else (write (get-root tree))
                (print-preorder (get-left-subtree tree))
                (print-preorder (get-right-subtree tree)))))

(define (get-root tree)
    (caddr tree))

(define (get-left-subtree tree)
    (car tree))

(define (get-right-subtree tree)
    (cadr tree))


>  (print-preorder '(((() () d) (() () e) b) ((() () f) (() () g) c) a))
abdecfg


Similarly, if we want our function to do an inorder or postorder 
traversal, we leave the low-level accessor functions unaltered and 
tweak the higher-level algorithm:

inorder-tree-traversal

  1.  call inorder-tree-traversal on the left subtree of the root

  2.  visit the root of the tree and do what you're going to do 
      with it (print it, collect it, stop if it's what you're looking for, 
      whatever)

  3.  call inorder-tree-traversal on the right subtree of the root


(define (print-inorder tree)
    (cond ((null? tree) '())
          (else (print-inorder (get-left-subtree tree))
                (write (get-root tree))
                (print-inorder (get-right-subtree tree)))))

(define (get-root tree)
    (car tree))

(define (get-left-subtree tree)
    (cadr tree))

(define (get-right-subtree tree)
    (caddr tree))



>  (print-inorder '(a (b (d () ()) (e () ())) (c (f () ()) (g () ()))))
dbeafcg
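
Step 2 of the recipe says you might "collect" the root instead of 
printing it. Here's a minimal sketch of that variation, assuming the 
same (root left-subtree right-subtree) layout. The name 
collect-inorder is invented for this example; it isn't a built-in.

```scheme
(define (get-root tree) (car tree))
(define (get-left-subtree tree) (cadr tree))
(define (get-right-subtree tree) (caddr tree))

;; Hypothetical variation on print-inorder: instead of writing each
;; root, build a list of the roots in inorder sequence.
(define (collect-inorder tree)
  (cond ((null? tree) '())
        (else (append (collect-inorder (get-left-subtree tree))
                      (list (get-root tree))
                      (collect-inorder (get-right-subtree tree))))))

(collect-inorder '(a (b (d () ()) (e () ())) (c (f () ()) (g () ()))))
; => (d b e a f c g)
```

Only the "visit" step changed; the accessors are the same ones as 
before.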


postorder-tree-traversal

  1.  call postorder-tree-traversal on the left subtree of the root

  2.  call postorder-tree-traversal on the right subtree of the root

  3.  visit the root of the tree and do what you're going to do 
      with it (print it, collect it, stop if it's what you're looking for, 
      whatever)


(define (print-postorder tree)
    (cond ((null? tree) '())
          (else (print-postorder (get-left-subtree tree))
                (print-postorder (get-right-subtree tree))
                (write (get-root tree)))))

(define (get-root tree)
    (car tree))

(define (get-left-subtree tree)
    (cadr tree))

(define (get-right-subtree tree)
    (caddr tree))


>  (print-postorder '(a (b (d () ()) (e () ())) (c (f () ()) (g () ()))))
debfgca


And hey, you know what?  A while back, one very insightful student 
came up to me after class and challenged me on my proud 
self-assessment of just how good my data abstraction really was.  He 
told me it wasn't that good.  And when I asked why, he said that I 
should have abstracted away the return of the empty list in the first 
cond clause...testing for the empty list in the top-level function 
left a data structure detail in there. He was absolutely right, and 
as happens more often than you might believe, I've learned something 
from my students. And now, so have you. Here's the even better 
version:


(define (print-postorder tree)
    (cond ((done? tree) '())
          (else (print-postorder (get-left-subtree tree))
                (print-postorder (get-right-subtree tree))
                (write (get-root tree)))))

(define (get-root tree)
    (car tree))

(define (get-left-subtree tree)
    (cadr tree))

(define (get-right-subtree tree)
    (caddr tree))

(define (done? tree)
    (null? tree))
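
To see what that student's suggestion buys you, suppose 
(hypothetically) that somebody decided to mark a missing subtree with 
the symbol none instead of (). This is just an illustrative sketch, 
not a change anyone actually asked for, but notice that done? is the 
only function that differs from the version above:

```scheme
(define (get-root tree) (car tree))
(define (get-left-subtree tree) (cadr tree))
(define (get-right-subtree tree) (caddr tree))

;; Assumption for this sketch: an empty subtree is the symbol none.
(define (done? tree)
  (eq? tree 'none))

;; Same algorithm as before; it never looks at how "empty" is encoded.
(define (print-postorder tree)
  (cond ((done? tree) '())
        (else (print-postorder (get-left-subtree tree))
              (print-postorder (get-right-subtree tree))
              (write (get-root tree)))))

(print-postorder
 '(a (b (d none none) (e none none)) (c (f none none) (g none none))))
; writes debfgca
```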


II.  Data Abstraction Revisited

The exercise we just went through, that of using procedural 
abstraction to put some distance between high-level algorithms and 
low-level data structure implementation details, is a form of "data 
abstraction". The general idea is the same as with procedural 
abstraction: postpone worrying about the details.

Data abstraction is something you want to be practicing regularly. It 
takes a little bit of thinking at first (what doesn't in this class?) 
but it pays off big in the long run. But keep in mind this one truth: 
the bulk of time and energy invested in the life of a major piece of 
software is expended after the initial design and implementation in 
efforts to debug, maintain, improve, and adapt the software. Putting 
an "abstraction wall" between the data-accessing functions and 
everything else confines code changes brought on by changes in data 
structures to relatively few procedures, and it makes the whole thing 
tremendously easier to read. If you get in the habit of doing this 
now and carrying this practice through your professional software 
career (at least for those of you who go on to professional software 
careers), you will acquire the reputation of being a seriously 
impressive software developer (and rightfully so).

You can get some measure of the goodness of your program by 
evaluating what proportion of the code that deals with the data 
structures actually depends on implementation details:

This represents the difference between a program that has used
abstraction to effectively postpone the details and one that doesn't.
Strive for the former, shun the latter.


III.  Association lists

Out of simple lists in Scheme we can build another data abstraction,
called the association list. The association list, or a-list for
short, is just a list of two- (or more) element sublists. The first
element of the two-or-more element sublist represents a "key" or
"index" that allows access to the information in the sublist. The
second and any other elements of the sublist are the data associated
with the key. The idea here is that you view the a-list as a
repository of information, and you retrieve specific pieces of
information by passing the key to the repository through some
interface, and the interface returns the information associated with
that key.

Association lists go by other names depending on the language or the
application: dictionary, lookup table, and symbol table are other ways
of naming association lists.

A-lists are very very useful data abstractions. In fact, the Scheme
system you're using probably uses an a-list to bind function bodies to
function names, so that when Scheme sees a function name, it can then
retrieve the function body from the a-list (which in this case is
called a symbol table).

Here's the graph level abstraction of a particular a-list:

Here's a Scheme-level abstraction of the same a-list:

((eiselt 1.9) (leahy 2.7) (greenlee 0.6))

And here's a lower-level, box-and-pointer version of the same a-list:

 _______                  _______                  _______
|   |   |                |   |   |                |   |  /|
| | | --+--------------->| | | --+--------------->| | | / |
|_|_|___|                |_|_|___|                |_|_|/__|
  |                        |                        |
  |                        |                        |
 \|/                      \|/                      \|/
 _______     _______      _______     _______      _______     _______
|   |   |   |   |  /|    |   |   |   |   |  /|    |   |   |   |   |  /|
| | | --+-->| | | / |    | | | --+-->| | | / |    | | | --+-->| | | / |
|_|_|___|   |_|_|/__|    |_|_|___|   |_|_|/__|    |_|_|___|   |_|_|/__|
  |           |            |           |            |           |
eiselt       1.9         leahy        2.7         greenlee     0.6

What helps to make the association list or a-list so useful in Scheme
is the existence of a predefined a-list operation called "assoc". The
assoc function takes two arguments, a key and an a-list, and searches
down the a-list for the sublist whose first element matches the key:

>  (assoc 'leahy '((eiselt 1.9)(leahy 2.7)(greenlee 0.6)))
(leahy 2.7)

Scheme's assoc function uses equal? as its comparison predicate. There
are other similar functions using different predicates. Thus, assq
uses eq?, and assv uses eqv?, which we didn't even talk about before.
Don't worry about that one for now.

Here's a way of implementing your own assoc function. At the very
least, it'll help you understand what assoc does:

(define (my-assoc key alist)
    (cond ((null? alist) #f)
          ((equal? key (caar alist)) (car alist))
          (else (my-assoc key (cdr alist)))))

What's caar? It's the car of the car. Scheme lets you compound the
a's and the d's, up to four, between the c and the r. So there's caar
and caaar and cddr and cadr and cadadr and so on. But there's no
caaaaar, because that's five letters between the c and the r. But you
could write your own, couldn't you? And how do you pronounce those
names? It's an acquired skill. Trust me. There's a rudimentary
pronunciation guide in your textbook.

Association lists in Scheme don't provide especially fast response
when you use assoc or its relatives to look for something.
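
You can make that linear scan visible with a little sketch. The
function count-probes and the name sample-alist are made up for this
example; count-probes walks the a-list the same way an assoc-style
search does, but returns how many entries it had to look at:

```scheme
;; Hypothetical helper: the number of a-list entries examined before
;; the key is found, or before the a-list runs out.
(define (count-probes key alist)
  (cond ((null? alist) 0)
        ((equal? key (caar alist)) 1)
        (else (+ 1 (count-probes key (cdr alist))))))

(define sample-alist '((eiselt 1.9) (leahy 2.7) (greenlee 0.6)))

(count-probes 'eiselt sample-alist)    ; => 1
(count-probes 'greenlee sample-alist)  ; => 3
(count-probes 'nobody sample-alist)    ; => 3
```

The key at the end of the list, and the key that isn't there at all,
both cost a look at every entry.
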
Finding some key in an association list of N elements has O(N) time
complexity, because it involves starting at the beginning of the
a-list and looking at every element until the key is matched. But
a-lists provide us with the versatility to implement all sorts of
dynamic linked-list data structures of our own design, as we'll see in
the near future.

The important thing to note about a-lists for now is that they are a
useful abstract data type that provides us with the ability to perform
a very important task--looking up some chunk of related information
(sometimes called a record) in some file, table, or other such data
structure given a key or index--without regard to how it's
implemented.


IV.  Representing trees as A-lists

There's more than one way to represent any collection of knowledge.
And the reason we're telling you about the association list now is so
we can demonstrate multiple ways of representing the same thing. For
example, we could represent the binary tree of our previous example as
an association list. Here's that binary tree again:

(a (b (d () ()) (e () ())) (c (f () ()) (g () ())))

And here is that same tree encoded with an A-list:

((a (b c))
 (b (d e))
 (c (f g))
 (d (() ()))
 (e (() ()))
 (f (() ()))
 (g (() ())))

In the a-list, the first element of each sublist is a node in the
tree, and the next element is a list of the children of that node. We
use () to indicate that there is no child. It's not absolutely
necessary that we do it that way...it's just a standard convention
that is consistent with what we've done in the past.


V.  Traversing a tree represented as an A-list

How do we do a traversal of a tree represented as an association list?
Well, we're going to have to develop this a little bit differently
than we did previously, but the principles are the same. Here's the
old code:

(define (print-preorder tree)
    (cond ((null? tree) '())
          (else (write (get-root tree))
                (print-preorder (get-left-subtree tree))
                (print-preorder (get-right-subtree tree)))))

(define (get-root tree)
    (car tree))

(define (get-left-subtree tree)
    (cadr tree))

(define (get-right-subtree tree)
    (caddr tree))

On our first attempt, we're going to have to tweak all of this code a
little bit, not just the lower levels, because of the use of the
A-list, but we don't have to tweak it much. Note that in the nested
list representation:

(a (b (d () ()) (e () ())) (c (f () ()) (g () ())))

the root node of the tree is defined as being the first element of a
three-element list. So whatever is the root is defined by its position
in the list, right? But this assumption doesn't hold true for the
association list. That is, the root of the tree in this structure:

((a (b c)) (b (d e)) (c (f g))
 (d (() ())) (e (() ())) (f (() ())) (g (() ())))

isn't necessarily the first thing in that A-list. I could change
things around like this:

((d (() ())) (a (b c)) (b (d e)) (c (f g))
 (e (() ())) (f (() ())) (g (() ())))

and my binary tree hasn't changed. (If you don't believe it, draw a
little tree diagram with circles and arrows for both A-lists and see
for yourself that they're the same.)

The upshot of this observation is that to get things started, I have
to tell my preorder function not only what the tree looks like, as I
did before, but I have to be very explicit about where the root node
is, and that's something I didn't have to do before. The program won't
be able to figure that out for itself. So I'm going to have to add an
extra parameter to accommodate root information to go along with tree
information, and that makes for somewhat less data abstraction than I
wanted, but somehow I have to let the program know where the root of
the whole tree is. So everywhere I passed "tree" before, I should
somehow pass "root" too. Got it so far? Nah, I didn't think so.
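
If circles and arrows aren't your thing, here's a small sketch of the
same point. The names tree-1 and tree-2 are just labels made up for
the two orderings shown above:

```scheme
(define tree-1
  '((a (b c)) (b (d e)) (c (f g))
    (d (() ())) (e (() ())) (f (() ())) (g (() ()))))

(define tree-2
  '((d (() ())) (a (b c)) (b (d e)) (c (f g))
    (e (() ())) (f (() ())) (g (() ()))))

;; Looking a node up by key gives the same answer in both orderings...
(assoc 'a tree-1)   ; => (a (b c))
(assoc 'a tree-2)   ; => (a (b c))

;; ...but "the first thing in the list" does not, so position alone
;; can't tell us which node is the root.
(car tree-1)        ; => (a (b c))
(car tree-2)        ; => (d (() ()))
```
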
So let me take you through the top-level procedure and show you the
hows and whys of the changes:

First I add the "root" arg to the input argument list so I can tell
the procedure which node is the root of the tree:

(define (print-preorder tree root)

How will print-preorder know when it's done? Testing to see if the
alist is empty won't work, because we're never making the alist
smaller... it's not like we're scooting down the alist and reducing
the problem. However, we can assume that two sorts of values can be
passed via the root parameter to print-preorder: either a value that's
supposed to be a key or index to the root of some subtree, or () to
indicate that there is no subtree. So if we test root to see if it's
the empty list, the () tells us that the preorder traversal has hit a
dead end. But we also want to abstract that detail away, so we write
this here:

    (cond [(done? tree root) '()]

and remember to add this function later:

(define (done? tree root)
    (null? root))

Again I add the root argument in the call to "get-root":

          [else (write (get-root tree root))

Now we get to the hard parts. They're not really that hard once you
get comfortable with the a-list thing. First off, I know that I want
to pass the root info to "get-left-subtree" and "get-right-subtree",
so that's pretty easy. But in the previous (non a-list) version of
this procedure, "get-left-subtree" and "get-right-subtree" returned
entire sub-chunks of the tree, right? Would that be easy to do using
an a-list, given that the order of things in the a-list doesn't mean
what it used to? No. So instead of returning whole subtrees, what
should those functions return? If you said "the pointer to the root of
the subtree" then you get it. If you said something else, then go back
and keep rereading this until "the pointer to the root of the subtree"
makes sense as the right answer.

So I modify the calls to "get-left-subtree" and "get-right-subtree" as
noted previously.
If those function calls return the root of the subtree, then I want to
pass the result of those function calls as the root argument to my
"print-preorder" function, and I also have to pass the tree itself, so
the next two expressions end up looking like this:

                (print-preorder tree (get-left-subtree tree root))
                (print-preorder tree (get-right-subtree tree root))

So the "print-preorder" function now looks like this:

(define (print-preorder tree root)
    (cond [(done? tree root) '()]
          [else (write (get-root tree root))
                (print-preorder tree (get-left-subtree tree root))
                (print-preorder tree (get-right-subtree tree root))]))

What about the details of accessing the data structure? Well, as we
hinted at above, if you pass the root to "get-root", "get-root"
doesn't have to do much of anything but return what it was passed.
Here's a roundabout way of doing just that:

(define (get-root tree root)
    (car (assoc root tree)))

Then to get the roots of the left and right subtrees, you apply the
correct combination of cars and cdrs to get at the information you
need (you may want to trace the behavior of all this stuff by hand if
you don't understand why the two functions below look the way they
do):

(define (get-left-subtree tree root)
    (caadr (assoc root tree)))

(define (get-right-subtree tree root)
    (cadadr (assoc root tree)))

And let's not forget this:

(define (done? tree root)
    (null? root))

If you pass our new "print-preorder" function the tiny tree that we
implemented as an a-list along with the symbol a, which is the root of
this tree, you'll see a nice preorder tree traversal, assuming I
didn't introduce any bugs into this program.

>  (define x '((a (b c)) (b (d e)) (c (f g))
               (d (() ())) (e (() ())) (f (() ())) (g (() ()))))
>  (print-preorder x 'a)
abdecfg
()
>

Why is it so important to have this ability to traverse a binary tree
that's implemented as an a-list?
That little exercise above gives us a clue as to how to implement an
incredibly useful if somewhat disorganized (relatively speaking) data
structure called the "relational network". More about those in the
near future.

But there's a problem even with this representation. What if "c" is
in the tree twice? For example:

((a (b c))
 (b (c e))
 (c (f g))
 (c (() ()))
 (e (() ()))
 (f (() ()))
 (g (() ())))

Our older, nested-list representation of this tree would be:

(a (b (c () ()) (e () ())) (c (f () ()) (g () ())))
       ^                    ^
      once                twice

That could happen. Two different things in a data structure could have
the same name. But given the way assoc works, in that it always starts
looking for a key from the front of the a-list, assoc will never find
the second entry for "c" in the a-list, even though a traversal of the
nested-list representation would find both instances of "c". So the
a-list now doesn't accurately reflect the intended structure of the
tree.

In general, when you're storing information--records--in a file or
list structure like what we're doing here, you need a unique key or
index for every record. We didn't do that here. But in the real world,
there are lots of people with the same name, and if you use the name
itself, which is contained in the record, as the key or index to find
that record, you'll end up with all sorts of confusion. That's why
you'll see a bunch of gobbledygook on the mailing labels of magazines
you might subscribe to, for example. In that gobbledygook, you might
recognize a few letters of your name, a chunk of your street address,
part of your zip code, and who knows what else. That's what the
publisher uses as the index to find your subscription
information...it's computed from a set formula, and it helps resolve
the ambiguities that are caused by people having the same name.

So what can we do in our example to fix this problem?
We could invent unique keys that are associated with each of the
nodes, like this (we'll just pick numbers for the heck of it):

((100 (a 101 102))
 (101 (b 103 104))
 (102 (c 105 106))
 (103 (d () ()))
 (104 (e () ()))
 (105 (f () ()))
 (106 (g () ())))

Above is the a-list where there are no duplicate nodes, and below
would be the example with duplicate nodes:

((100 (a 101 102))
 (101 (b 103 104))
 (102 (c 105 106))
 (103 (c () ()))
 (104 (e () ()))
 (105 (f () ()))
 (106 (g () ())))

Do we need (103 (c () ())) etc. in this list? Or should it be
(103 (c 0 0))? The answer is that it doesn't matter as long as we
abstract the problem sufficiently. As long as there is some code which
implements "get-left-subtree" and "get-right-subtree" it is not
important to the higher layers of the software whether we put the
(c () ()) in the list or (c 0 0) instead. We just deal with the
get-child functions and depend on them to do their job.

Can we adapt the preorder tree traversal code from just above to
traverse this new representation scheme for a binary tree embedded in
an A-list? But of course.


VI.  Traversing a tree represented as an A-list, part two

Here's the old code:

(define (print-preorder tree root)
    (cond [(done? tree root) '()]
          [else (write (get-root tree root))
                (print-preorder tree (get-left-subtree tree root))
                (print-preorder tree (get-right-subtree tree root))]))

(define (get-left-subtree tree root)
    (caadr (assoc root tree)))

(define (get-right-subtree tree root)
    (cadadr (assoc root tree)))

(define (done? tree root)
    (null? root))

(define (get-root tree root)
    (car (assoc root tree)))

Because we've done such a good job of abstracting away the details of
the implementation of the tree structure, we only have to change some
of the tiny little accessor functions down there at the bottom. But
first, let me add another itty bitty abstraction that looks like this:

(define (get-node tree address)
    (cadr (assoc address tree)))

What's get-node?
That's just a function I threw in to make things prettier, so I don't
have to write the same code over and over. In other words, I'm going
to be looking at things like this a lot:

(101 (b 103 104))

and extracting this:

(b 103 104)

I'm just giving a name to the function that separates the information
about some node in the tree from the arbitrary number with which it is
associated. This is going to let me avoid having to try to write
overly long and unreadable function names like caddadr, which Scheme
doesn't accept anyway. So now I can redefine get-root like this:

(define (get-root tree address)
    (car (get-node tree address)))

Then to get the roots of the left and right subtrees, you use the
magic function get-node and once again apply the correct combination
of cars and cdrs:

(define (get-left-subtree tree root)
    (cadr (get-node tree root)))

(define (get-right-subtree tree root)
    (caddr (get-node tree root)))

And now, if you pass this newer "print-preorder" function the newer
tree that we implemented as an a-list along with the address 100,
which is the root of this tree, you'll see yet another in a seemingly
interminable series of preorder tree traversals:

>  (define x '((100 (a 101 102))
               (101 (b 103 104))
               (102 (c 105 106))
               (103 (d () ()))
               (104 (e () ()))
               (105 (f () ()))
               (106 (g () ()))))
>  (print-preorder x 100)
abdecfg
()
>


VII.  Another abstraction exercise

Does the a-list above have to look like this?:

((100 (a 101 102))
 (101 (b 103 104))
 (102 (c 105 106))
 (103 (d () ()))
 (104 (e () ()))
 (105 (f () ()))
 (106 (g () ())))

No. It could just as easily look like this:

((100 a 101 102)
 (101 b 103 104)
 (102 c 105 106)
 (103 d () ())
 (104 e () ())
 (105 f () ())
 (106 g () ()))

That was just a choice on our part. No big deal, since my clever use
of data abstraction allows me to isolate and change the appropriate
functions as the details of my data structure change. In fact, all I
should have to change is get-node...I'll let you figure out what that
change should be.
Isn't abstraction wonderful?


Copyright (c) 2003 by Kurt Eiselt. All rights reserved, with the
exception of stuff that belongs to somebody else.

Last revised: October 10, 2003