Now we turn and wave goodbye to that grunt-level implementation
stuff and move on to questions about knowledge and how we
represent it.
Data abstraction
By this time we should all be pretty familiar with the basic
ideas behind abstraction. And the type of abstraction that we've
been dealing with mostly is called procedural abstraction. That
is, in building our procedures, we've been:
  - Postponing worrying about the details
  - Decomposition
  - Putting as much distance as possible between the high-level
    conceptual ideas of what we're trying to do and the details
    of how it gets done
That last tidbit is pretty important. What it says is that, in the
procedures we build, we're trying to separate the theory, the design,
the algorithm, etc., from the low-level implementation stuff as
much as we can.
Why is this good for us? It aids in designability, maintainability,
adaptability, readability, debuggability, and all the other itys.
But we also know that it's painful. Why? Let's face it...we're
all used to slam-dunking code, and all this abstraction stuff is
hard to think about when we're not used to it and we're being
forced to use it anyway. Ouch!
Nevertheless, the positives outweigh the negatives here, so we
continue abstracting away. We get similar benefits when we
abstract away the details of our data structures. For example,
already we know that in LISP we can build linked-list data
structures easily without worrying about the details of how those
structures are implemented. And that makes our programming lives
easier. Those same simple tasks would be a lot more difficult
in Pascal or C, no? LISP has done some of the abstraction for us.
So this puts us into another, but related, world of abstraction
that's called data abstraction. We want to be able to write our
programs in ways that focus on the high-level concepts about the
data while abstracting away the implementation details of the
data structures.
What else do we gain by employing data abstraction? We get to
work with pretty pictures, and that actually makes our lives
easier too. Let's look at some forms of abstracted data....
Graphs -- your basic abstraction
There's an abstraction that computer science types use all the time.
It's called a graph, and you've no doubt seen graphs before.
A graph is just a collection of vertices and edges (or nodes and
links, or nodes and arcs, or...).
Graphs without cycles have nice mathematical properties, but that's
material for a class on graph theory. If we want to represent
information contained in the vertices, we draw circles at the
vertices and write the information in there. That's another
abstraction, because that's not how it really looks in memory, but
it makes it easier for us to think about and play with:
And if we put orientations on the edges, we get something called a directed graph:
This directed graph notation gives us an abstract representation
of a linked list in LISP, like '(a b c), which as you recall is a
previously-defined abstract data type in LISP.
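To tie the picture back to code, here's a minimal sketch of how the list '(a b c) is put together one link at a time. Each call to cons makes one vertex-plus-arrow of the directed graph. The variable name *abc* is just something made up for this illustration.

```lisp
;; Build the list (a b c) explicitly, one cons cell per vertex.
;; The second argument of each cons is the "arrow" to the rest of the list.
(defparameter *abc* (cons 'a (cons 'b (cons 'c nil))))

(equal *abc* '(a b c))   ; same structure as the quoted list
(first *abc*)            ; the information stored in the first vertex
(rest *abc*)             ; follow the arrow to the remaining vertices
```

The point is that we get to think "list of three things" while LISP worries about the cells and pointers.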
Association lists
There's another data abstraction that's used frequently in LISP.
It's called the association list, or a-list for short. It's a
list of two- (or more) element sublists, typically used as a sort
of lookup table. Here's a LISP-level abstraction of an a-list:
((eiselt 1.9) (shackelford 2.7) (greenlee 0.6))
And here's a lower-level, box-and-pointer version of the same a-list:
 _______                 _______                 _______
| | |   |               | | |   |               | | |  /|
| | | --+-------------->| | | --+-------------->| | | / |
|_|_|___|               |_|_|___|               |_|_|/__|
 |                       |                       |
 |                       |                       |
\|/                     \|/                     \|/
 _______     _______     _______     _______     _______     _______
| | |   |   | | |  /|   | | |   |   | | |  /|   | | |   |   | | |  /|
| | | --+-->| | | / |   | | | --+-->| | | / |   | | | --+-->| | | / |
|_|_|___|   |_|_|/__|   |_|_|___|   |_|_|/__|   |_|_|___|   |_|_|/__|
 |           |           |           |           |           |
eiselt      1.9     shackelford     2.7       greenlee      0.6
What helps to make the a-list so predominant in LISP is the existence
of a predefined a-list operation called assoc. (It's one of the
functions that you defined for yourself in the second homework
assignment.) The assoc function takes two arguments, a key and an
a-list, and searches down the a-list for the sublist whose first
element matches the key:
? (assoc 'shackelford '((eiselt 1.9)(shackelford 2.7)(greenlee 0.6)))
(SHACKELFORD 2.7)
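In case the homework version has faded from memory, here's a rough sketch of what a hand-rolled assoc might look like. The names my-assoc and *table* are invented for this example (your homework solution may well differ), and the sketch compares keys with eql, which is fine for symbols like these.

```lisp
;; A hypothetical home-made version of assoc: walk down the a-list
;; and return the first sublist whose first element matches the key.
(defun my-assoc (key a-list)
  (cond ((null a-list) nil)                               ;ran out of entries
        ((eql key (first (first a-list))) (first a-list)) ;found it
        (t (my-assoc key (rest a-list)))))                ;keep looking

(defparameter *table*
  '((eiselt 1.9) (shackelford 2.7) (greenlee 0.6)))

(my-assoc 'shackelford *table*)     ; the whole matching sublist
(second (assoc 'greenlee *table*))  ; just the value stored with the key
(assoc 'nobody *table*)             ; NIL when the key isn't there
```

Note that you get back the whole sublist, not just the value, so you usually follow up with second (or rest) to pull out what you were looking for.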
Trees
Suppose we wanted to encode the following special kind of a graph
called a tree:
We might use the following LISP representation if we wanted to encode
only the bottommost (leaf) nodes of the tree:
(((a b)(c d))((e f)(g h)))
If we had a tree and we wanted to encode all the nodes including the
interior (non-leaf) nodes, we might use this LISP representation:
(a (b (d e))(c (f g)))
to encode this tree:
Another equally good representation for this tree might be:
(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil)))
If I asked you how to do a preorder traversal of this tree,
you might say "flatten!" You saw the "flatten" function a couple of
lectures ago. Flattening the list above would give you the list:
(a b d e c f g)
which when read from left to right could be interpreted as the order
in which the nodes of the tree above are visited during a preorder
tree traversal.
But while "flatten" does perform a tree traversal of sorts on the
list, it looks like preorder only because the nodes in the original
list are arranged in such a way that when you read them left to right
it's a preorder traversal. If I changed the structure of the
original list so that a left-to-right reading of the atoms looked
like a postorder traversal, then "flatten" might fool you into
believing it was doing a postorder traversal:
? (flatten '(((nil nil d) (nil nil e) b) ((nil nil f) (nil nil g) c) a))
(D E B F G C A)
So "flatten" doesn't do preorder or postorder traversal, it just
does its "left-to-right" traversal on every list given to it and
doesn't really take advantage of any helpful characteristics of
the list itself (e.g., it's a binary tree, a binary tree is
represented as a three-element list, the first element is the root,
the second element is the left subtree, the third element is the
right subtree, etc.).
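For reference, here's one common way to write a flatten function. This is only a sketch and may not match the version from lecture; it's called my-flatten here so it won't clash with whatever definition you already have loaded. Notice that nothing in it knows anything about roots or subtrees.

```lisp
;; A hypothetical flatten: collect the non-NIL atoms of a nested list,
;; left to right. NIL is treated as the empty list and disappears.
(defun my-flatten (x)
  (cond ((null x) nil)
        ((atom x) (list x))
        (t (append (my-flatten (first x))
                   (my-flatten (rest x))))))

;; Root-first layout: the result happens to read like preorder.
(my-flatten '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))

;; Root-last layout: the very same code now reads like postorder.
(my-flatten '(((nil nil d) (nil nil e) b) ((nil nil f) (nil nil g) c) a))
```

Same function, two different "traversals", and the only thing that changed was the arrangement of the input.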
We could write an "honest" preorder tree traversal function which
does take advantage of the "treeness" of the input list. Here's
one possibility:
(defun print-preorder (tree)
  (cond ((null tree) nil)
        (t (print (first tree))                 ;deal with root
           (print-preorder (second tree))       ;recurse on left subtree
           (print-preorder (third tree)))))     ;recurse on right subtree
One thing you should note right away is that we're using "print"
here and printing the nodes as we visit them, as opposed to
collecting the nodes into a list. This makes it a little bit
easier to see how this works---you don't have to think about
how or when to collect nodes into the list. But again, this
isn't a license to go start using side effects in what you
submit for grading. That time is coming. Be patient.
(And be afraid...be very afraid.)
The other, more important thing you should note is that you can
now see explicitly in the code where the root is handled, where
the left-subtree is handled, and where the right-subtree is
handled. This program does the job we want, printing the nodes
on the screen in the order of traversal:
? (print-preorder '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))
A
B
D
E
C
F
G
NIL
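If you'd rather collect the nodes into a list than print them, a side-effect-free version might look something like the sketch below. The name collect-preorder is made up for this example; it assumes the same (root left-subtree right-subtree) layout that print-preorder does.

```lisp
;; A hypothetical list-building cousin of print-preorder.
;; Assumes each tree is NIL or a list (root left-subtree right-subtree).
(defun collect-preorder (tree)
  (cond ((null tree) nil)
        (t (cons (first tree)                                ;root goes first
                 (append (collect-preorder (second tree))    ;then left subtree
                         (collect-preorder (third tree)))))));then right subtree

(collect-preorder
 '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))
;; => (A B D E C F G)
```

The structure is the same as before: root, then left, then right. The only new wrinkle is deciding how the three pieces get glued together.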
But "print-preorder" isn't as good as it could be. As it stands now,
the details of accessing the list structure that represents a binary
tree are merged with the high-level algorithm for traversing a
binary tree. While it's not real important for a five-line function,
if you developed large programs where the data access details were
intertwingled (it's a technical term) with the higher-level issues,
you'd be one unhappy camper when somebody came along and insisted
on changing the data structure specs. Trust me. Just ask anyone
who's dealing with the Year 2000 problem (or better yet, any of
us who caused it...as you now know). Ugh.
So what do you do to improve "print-preorder"? You could write the
appropriate functions, maybe called something like "get-root",
"get-left-subtree", and "get-right-subtree", which could be used by
other functions to carry out various manipulations or traversals of
this tree. These functions to get the left and right children could
abstract away the details of the implementation and allow you to
write your tree manipulation without concern for the implementation
of the data structure:
(defun best-print-preorder (tree)
  (cond ((null (get-root tree)) nil)
        (t (print (get-root tree))
           (best-print-preorder (get-left-subtree tree))
           (best-print-preorder (get-right-subtree tree)))))
(defun get-root (tree)
  (first tree))
(defun get-left-subtree (tree)
  (second tree))
(defun get-right-subtree (tree)
  (third tree))
Now we've separated the algorithm from the implementation details.
If someone comes along and says that the implementation of a binary
tree has changed from (root left-subtree right-subtree) to
(left-subtree right-subtree root) for some dorky reason, then
you know what needs to be changed and what doesn't---the algorithm
remains untouched, but those itty bitty accessor functions need
some fixing:
(defun print-preorder (tree)
  (cond ((null (get-root tree)) nil)
        (t (print (get-root tree))
           (print-preorder (get-left-subtree tree))
           (print-preorder (get-right-subtree tree)))))
(defun get-root (tree)
  (third tree))
(defun get-left-subtree (tree)
  (first tree))
(defun get-right-subtree (tree)
  (second tree))
? (print-preorder '(((nil nil d) (nil nil e) b) ((nil nil f) (nil nil g) c) a))
A
B
D
E
C
F
G
NIL
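You can push the same idea one step further and hide the construction of trees as well as the access to them. The sketch below is an optional extra, not something the accessors above require: make-tree, make-leaf, and *tree* are names invented for this example, and the layout assumed is the (left-subtree right-subtree root) one just shown.

```lisp
;; Hypothetical constructors for the (left-subtree right-subtree root) layout.
;; Callers always say root, left, right; only this code knows the real order.
(defun make-tree (root left right)
  (list left right root))

(defun make-leaf (root)
  (make-tree root nil nil))

;; Accessors matching the layout used by make-tree.
(defun get-root (tree) (third tree))
(defun get-left-subtree (tree) (first tree))
(defun get-right-subtree (tree) (second tree))

(defparameter *tree*
  (make-tree 'a
             (make-tree 'b (make-leaf 'd) (make-leaf 'e))
             (make-tree 'c (make-leaf 'f) (make-leaf 'g))))
;; *tree* => (((NIL NIL D) (NIL NIL E) B) ((NIL NIL F) (NIL NIL G) C) A)
```

With a constructor in place, even your test data stops depending on the layout. If the spec changes again, you fix make-tree and the three accessors, and nobody has to retype a quoted tree by hand.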
Similarly, if we want our function to do an inorder or postorder
traversal, we leave the low-level accessor functions unaltered
and tweak the higher-level algorithm:
(defun print-inorder (tree)
  (cond ((null (get-root tree)) nil)
        (t (print-inorder (get-left-subtree tree))
           (print (get-root tree))
           (print-inorder (get-right-subtree tree)))))
(defun get-root (tree)
  (first tree))
(defun get-left-subtree (tree)
  (second tree))
(defun get-right-subtree (tree)
  (third tree))
? (print-inorder '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))
D
B
E
A
F
C
G
NIL
(defun print-postorder (tree)
  (cond ((null (get-root tree)) nil)
        (t (print-postorder (get-left-subtree tree))
           (print-postorder (get-right-subtree tree))
           (print (get-root tree)))))
(defun get-root (tree)
  (first tree))
(defun get-left-subtree (tree)
  (second tree))
(defun get-right-subtree (tree)
  (third tree))
? (print-postorder '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))
D
E
B
F
G
C
A
A
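To see just how little of the algorithm changes from one traversal to the next, here's a sketch that folds all three orders into one list-building function. The name collect-traversal and the keyword arguments are assumptions made for this example, and it uses let and ecase, which you may not have met yet. It assumes the (root left-subtree right-subtree) accessors.

```lisp
;; Accessors for the (root left-subtree right-subtree) layout.
(defun get-root (tree) (first tree))
(defun get-left-subtree (tree) (second tree))
(defun get-right-subtree (tree) (third tree))

;; Hypothetical all-in-one traversal. ORDER is :preorder, :inorder,
;; or :postorder. Returns the nodes as a list instead of printing them.
(defun collect-traversal (tree order)
  (if (null (get-root tree))
      nil
      (let ((root  (list (get-root tree)))
            (left  (collect-traversal (get-left-subtree tree) order))
            (right (collect-traversal (get-right-subtree tree) order)))
        ;; The three traversals differ only in how the pieces are ordered.
        (ecase order
          (:preorder  (append root left right))
          (:inorder   (append left root right))
          (:postorder (append left right root))))))

(defparameter *tree*
  '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))

(collect-traversal *tree* :preorder)   ; => (A B D E C F G)
(collect-traversal *tree* :inorder)    ; => (D B E A F C G)
(collect-traversal *tree* :postorder)  ; => (D E B F G C A)
```

Nothing in collect-traversal mentions first, second, or third, so a change in tree layout still touches only the three accessors.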
Data Abstraction Revisited
The exercise we just went through, that of using procedural abstraction
to put some distance between high-level algorithms and low-level
data structure implementation details, is a form of "data abstraction".
The general idea is the same as with procedural abstraction: postpone
worrying about the details.
Data abstraction is something you want to be practicing regularly. It
takes a little bit of thinking at first (what doesn't in 2360?), but
it pays off big in the long run. Remember what we said a few weeks ago:
the bulk of time and energy invested in the life of a major piece of
software is expended after the initial design and implementation
in efforts to debug, maintain, improve, and adapt the software.
Putting an "abstraction wall" between the data accessing functions
and everything else confines code changes brought on by changes
in data structures to relatively few procedures, and it makes the
whole thing tremendously easier to read. If you get in the habit
of doing this now and carrying this practice through your professional
career, you will acquire the reputation of being a seriously impressive
software developer (and rightfully so).
You can get some measure of the goodness of your program by evaluating
what proportion of the code that deals with the data structures actually
depends on implementation details:
This represents the difference between a program that has used
abstraction to effectively postpone the details and one that hasn't.
Strive for the former, shun the latter.
Copyright 1998 by Kurt Eiselt. All rights reserved. The figures and
some parts of text are stolen directly from Ian Smith's notes for an
earlier offering of this same course. Some of those notes were in
turn taken from my notes for an even earlier offering.
Last revised: October 24, 1998