I. Data abstraction

Now you should be somewhat familiar with the basic ideas behind abstraction. And the type of abstraction that we've been dealing with mostly is called procedural abstraction. That is, in building our procedures, we focus on:

- Postponing worrying about the details
- Decomposition
- Putting as much distance as possible between the high-level conceptual ideas of what we're trying to do and the details of how it gets done

That final point is very important. What it says is that, in the procedures we build, we're trying to separate the theory, the design, the algorithm, etc., from the low-level implementation stuff as much as we can. Why is this good for us? It aids in designability, maintainability, adaptability, readability, debuggability, and all the other 'itys. But we also know that it's painful. Why? Let's face it...we'd rather just slam-dunk the code, and all this abstraction stuff is hard to think about if we're not used to it. Nevertheless, the positives outweigh the negatives here, so we continue abstracting away.

We get similar benefits when we abstract away the details of our data structures. For example, we already know that in Scheme we can build linked-list data structures easily without worrying about the details of how those structures are implemented. And that makes our programming lives easier. Those same simple tasks could be a lot more difficult in other programming languages; Scheme has done some of the abstraction for us.

So this puts us into another, but related, world of abstraction that's called data abstraction. We want to be able to write our programs in ways that focus on the high-level concepts about the data while abstracting away the implementation details of the data structures. How important is this data abstraction thing?
Put it this way: if, for the past 30 or 40 years, programmers had bothered to abstract away the details of how dates are represented from the higher-level procedures that used dates in important computations, the task of fixing all the programs in the entire world by January 1, 2000 would have been a whole lot easier, saving billions of dollars and millions of hours of effort. That's how important data abstraction is.

When talking and thinking about programming at the higher levels, it's easier to work with terms, concepts, and ideas that are neither language nor computer dependent. Doing this successfully requires the use of existing abstract data types provided by the language or the creation of your own abstract data types from existing ones. Remember, an ADT is a logical (not the way it's physically represented, but more, um, abstract) description of a data structure plus operations that are specific to that data structure. Those data-structure-specific operations form the interface between whatever uses the data structure and the implementation details of the data structure itself. So getting those operations right is a large part of doing the data abstraction right.

Today we'll be talking about dynamic linked-list data structures. They're dynamic because they can change in size and shape as they're being used. That dynamism provides a nice degree of flexibility, and that's generally good, but there's a cost: extra information must be stored in a dynamic data structure to keep track of how it's supposed to be organized, and that takes up extra memory and uses up extra time. Other sorts of data structures that we'll talk about in weeks to come (with names like arrays and vectors) are static in that they don't typically change size and shape, so there's less bookkeeping overhead, but there's also less flexibility for the programmer. But that's an issue we'll deal with later too.
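To make the point about dates a bit more concrete, here's a minimal sketch of what a date ADT might look like in Scheme. Everything named here (make-date, date-year, date-month, date-day, date-earlier?) is made up for illustration, and the three-element list is just one assumed representation among many.

```scheme
;; Hypothetical date ADT. All names below are illustrative
;; assumptions, not something defined elsewhere in these notes.

;; Constructor: this sketch stores a date as a three-element list.
(define (make-date year month day)
  (list year month day))

;; Selectors: the only procedures that know about the list layout.
(define (date-year d) (car d))
(define (date-month d) (cadr d))
(define (date-day d) (caddr d))

;; Higher-level code goes through the interface only.
;; Returns #t when d1 falls strictly before d2.
(define (date-earlier? d1 d2)
  (cond ((< (date-year d1) (date-year d2)) #t)
        ((> (date-year d1) (date-year d2)) #f)
        ((< (date-month d1) (date-month d2)) #t)
        ((> (date-month d1) (date-month d2)) #f)
        (else (< (date-day d1) (date-day d2)))))

(date-earlier? (make-date 1999 12 31) (make-date 2000 1 1))  ; => #t
```

If the representation of a date ever had to change, only make-date and the three selectors would need rewriting; date-earlier? and anything else built on the interface could stay as it is.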
For now, let's start with a quick introduction to the granddaddy of abstractions, the graph.

II. Graphs -- an abstraction of abstractions

There's an abstraction that computer science types use all the time. It's called a graph, and you may have seen graphs before. You just didn't know that their technical name is "graph"...it's a math thing. A graph is just a collection of vertices and edges. Sometimes the vertices are called nodes, and sometimes the edges are called links or arcs. Any way you look at it, it's all the same. Here's a graphical representation of a graph:
[Figure: a graph drawn as vertices connected by edges]
Since we're going to use graphs as a means of describing data structures, it would be nice to have a way to represent stored data in a graph. Computer science folks do this by drawing circles at the vertices and writing the information in those circles. That's another abstraction, because that's not how it really looks in the computer's memory, but it makes it easier for us to think about and play with:
[Figure: a graph with data written inside circles at the vertices]
And if we put orientations on the edges, we get something called a directed graph. Those orientations, which we denote with arrowheads, give us more information about relationships between the pieces of data in the vertices, like the order that's imposed by the links between cons cells:
[Figure: a directed graph, with arrowheads on the edges]
This directed graph notation gives us an abstract representation of a linked list in Scheme, like '(a b c), which as you recall is a previously-defined abstract data type in Scheme. (I hope you recall it...you've been using linked lists for weeks!)
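As a quick reminder of how that list is put together, here's a small sketch that builds it one cons cell at a time. The name abc is just an illustrative choice.

```scheme
;; Build (a b c) explicitly. Each cons makes one two-part cell:
;; the car holds an element, the cdr links to the rest of the list.
(define abc (cons 'a (cons 'b (cons 'c '()))))

(equal? abc '(a b c))   ; => #t, same structure as the quoted list
(car abc)               ; => a, the element in the first cell
(cdr abc)               ; => (b c), the chain starting at the second cell
(cdddr abc)             ; => (), the empty list that ends the chain
```

Each link followed by cdr corresponds to one arrow in the directed-graph picture of the list.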
    (a b c)

     _______               _______               _______
    |   |   |             |   |   |             |   |  /|
    | | | --+------------>| | | --+------------>| | | / |
    |_|_|___|             |_|_|___|             |_|_|/__|
      |                     |                     |
      |                     |                     |
     \|/                   \|/                   \|/
      a                     b                     c

III. Trees

Computer scientists use a data structure called a "tree" quite often. A tree is a hierarchical linked-list structure. In directed-graph terms, a tree is a single node (or vertex), called the "root" of the tree, with directed links (or edges) to zero or more nodes (or vertices), each of which is the root of a tree. There are no cycles or loopbacks in trees, so by following the directed links, there is at most one path from one node to another in a tree. Here's a directed graph abstraction of a tree:
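[Figure: a binary tree with root a, whose children are b and c; b has children d and e, and c has children f and g]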
This particular example also happens to be an example of a binary tree, in which each node has directed links emanating from it to no more than two other nodes. We can represent this tree as a Scheme list:

    (a (b (d () ()) (e () ())) (c (f () ()) (g () ())))

At the top level, this tree consists of three elements: the symbol a stored at the root, the left subtree, and the right subtree. The left subtree in turn consists of three elements: the symbol b, the left subtree of b, and the right subtree of b. And so on and so on. When a node doesn't have a left subtree or a right subtree, we just put () where the subtree would go. You could map this tree onto box-and-pointer notation to see the "treeness" of it all, but I'll leave that as an exercise for you.

Admittedly, this is an unwieldy sort of representation for a tree, and if you're thinking ahead, you might have already concluded that we might get more flexibility if we mapped this tree onto our a-list representation. You'd be right, but we're getting ahead of ourselves. This representation will serve us just fine for now.

As mentioned earlier, trees are hierarchical data structures, so we use them to organize chunks of information that have hierarchical relationships between the chunks. Obviously, we organize this stuff so that we can find it when we need it, and there are some standard ways of searching trees, some efficient and some not so efficient, that we'll see in the near future. Let's start with the inefficient ways so we know what they look like, and then we can work to make them better.

One way to find something in a tree is to look at every node in the tree and see if the thing you're looking for is there. The process of looking at every node in a tree is called tree traversal, and there are different ways to traverse a tree. One specific way of looking at everything is called preorder tree traversal. The high-level algorithm, which is recursive, of course, looks like this:

preorder-tree-traversal

1. visit the root of the tree and do what you're going to do with it (print it, collect it, stop if it's what you're looking for, whatever)
2. call preorder-tree-traversal on the left subtree of the root
3. call preorder-tree-traversal on the right subtree of the root

If we do a preorder tree traversal on the tree above, and print every root node that we visit in step 1, then we'd see the following printed:

    abdecfg

Can we write some Scheme function to perform a preorder tree traversal on the tree above? Well of course we can! Here's one possibility:

    (define (print-preorder tree)
      (cond ((null? tree) '())
            (else (write (car tree))                ; print root
                  (print-preorder (cadr tree))      ; traverse left
                  (print-preorder (caddr tree)))))  ; traverse right

If we run our little program, we'd see this:

    > (print-preorder '(a (b (d () ()) (e () ())) (c (f () ()) (g () ()))))
    abdecfg
    ()
    >

Just like we expected. Or at least it's almost what we expected. Why do you suppose that empty list is displayed below the line where you see "abdecfg"? That's something worth thinking about, no? Yes. (Think about what this function returns as opposed to what it prints.)

One thing you should note right away is that we're using "write" in the print-preorder function and printing the nodes as we visit them, as opposed to collecting the nodes into a list. This makes it a little bit easier to see how this works: you don't have to think about how or when to collect nodes into the list. But this also makes us take advantage of something we touched on only briefly before: side effects. Recall that a side effect is something that persists after the function that caused it ceases to execute. Printing is a side effect.

The other thing that's happening here is that we're evaluating more than one expression in the action part of a cond clause (the else clause, in this case). It's always been possible, but up until now it hasn't been useful, because as we said before, we haven't been using any side effects.
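For comparison, here's a sketch of the other option mentioned above: collecting the nodes into a list instead of printing them, which needs no side effects at all. The name collect-preorder is made up for this illustration, and the sketch assumes the same three-element list representation of a tree.

```scheme
;; Hypothetical side-effect-free variant of the traversal.
;; Returns the nodes in preorder as a list instead of printing them.
(define (collect-preorder tree)
  (cond ((null? tree) '())                               ; empty subtree
        (else
         (cons (car tree)                                ; root first
               (append (collect-preorder (cadr tree))    ; then left
                       (collect-preorder (caddr tree))))))) ; then right

(collect-preorder '(a (b (d () ()) (e () ())) (c (f () ()) (g () ()))))
;; => (a b d e c f g)
```

Here each cond clause holds a single expression again, because the result is built up from return values rather than from things done along the way.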
Using side effects takes us out of the purely functional programming paradigm, but there's no way to do printing otherwise, so there it is. Do not consider this little exposition to be permission to go use any and all side effects for whatever reason. For now, we'll tell you when it's ok to use a side effect; don't use them if we don't tell you it's ok. We'll eventually give you free rein, so be patient. And be afraid...be very afraid.

Sadly, "print-preorder" isn't as good as it could be. As it stands now, the details of accessing the list structure that represents a binary tree are merged with the high-level algorithm for traversing a binary tree. While it's not real important for a five-line function, if you developed large programs where the data access details were intertwingled (it's a technical term) with the higher-level issues, you'd be one unhappy camper when somebody came along and insisted on changing the data structure specs. Trust me. Just ask anyone who dealt with the Y2K cleanup. Ugh. In other words, we haven't done a very good job of data abstraction here. How do we fix that?

Copyright (c) 2003 by Kurt Eiselt. All rights reserved, with the exception of stuff that belongs to somebody else. Last revised: October 2, 2003