CS 2360 - October 22, 1998

Lecture 9 -- Data Abstraction


Now we turn and wave goodbye to that grunt-level implementation
stuff and move on to questions about knowledge and how we
represent it.  


Data abstraction

By this time we should all be pretty familiar with the basic 
ideas behind abstraction.  And the type of abstraction that we've
been dealing with mostly is called procedural abstraction.  That
is, in building our procedures, we've been:

  Postponing worrying about the details
  Decomposing problems into smaller pieces
  Putting as much distance as possible between the high-level
    conceptual ideas of what we're trying to do and the details
    of how it gets done

That last tidbit is pretty important.  What it says is that, in the
procedures we build, we're trying to separate the theory, the design,
the algorithm, etc., from the low-level implementation stuff as
much as we can.

Why is this good for us?  It aids in designability, maintainability,
adaptability, readability, debuggability, and all the other itys.  
But we also know that it's painful.  Why?  Let's face it...we're
all used to slam-dunking code, and all this abstraction stuff is
hard to think about when we're not used to it, especially when
we're being forced to use it.  Ouch!

Nevertheless, the positives outweigh the negatives here, so we 
continue abstracting away.  We get similar benefits when we 
abstract away the details of our data structures.  For example,
already we know that in LISP we can build linked-list data 
structures easily without worrying about the details of how those
structures are implemented.  And that makes our programming lives
easier.  Those same simple tasks would be a lot more difficult
in Pascal or C, no?  LISP has done some of the abstraction for us.

So this puts us into another, but related, world of abstraction 
that's called data abstraction.  We want to be able to write our
programs in ways that focus on the high-level concepts about the
data while abstracting away the implementation details of the
data structures.

What else do we gain by employing data abstraction?  We get to
work with pretty pictures, and that actually makes our lives 
easier too.  Let's look at some forms of abstracted data....


Graphs -- your basic abstraction

There's an abstraction that computer science types use all the time.
It's called a graph, and you've no doubt seen graphs before.
A graph is just a collection of vertices and edges (or nodes and 
links, or nodes and arcs, or...).

Graphs without cycles have nice mathematical properties, but that's
material for a class on graph theory.  If we want to represent 
information being contained in the vertices, we draw circles at
the vertices and write the information in there.  That's another
abstraction, because that's not how it really looks in memory, but
it makes it easier for us to think about and play with:

And if we put orientations on the edges, we get something called
a directed graph:

This directed graph notation gives us an abstract representation
of a linked list in LISP, like '(a b c), which as you recall is a
previously-defined abstract data type in LISP.
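
To make that concrete, here's a small sketch that builds the same
list by hand, one cons cell per node of the directed graph.  The
variable name *by-hand* is made up for this illustration:

```lisp
;; Each cons cell is one node: its first half holds the data and
;; its second half points at the next cell (or nil at the end).
(defparameter *by-hand* (cons 'a (cons 'b (cons 'c nil))))

(equal *by-hand* '(a b c))   ;returns T: same structure as the quoted list
(first *by-hand*)            ;returns A: the data in the first node
(rest *by-hand*)             ;returns (B C): follow the arrow to the rest
```

Nothing in the picture (or in this code) says where those cells
live in memory, and that's the point.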


Association lists

There's another data abstraction that's used frequently in LISP.  
It's called the association list, or a-list for short.  It's a 
list of two- (or more) element sublists, typically used as a sort
of lookup table.  Here's a LISP-level abstraction of an a-list:

((eiselt 1.9) (shackelford 2.7) (greenlee 0.6))

And here's a lower-level, box-and-pointer version of the same a-list:


     _______                 _______                 _______ 
    |   |   |               |   |   |               |   |  /|
    | | | --+-------------->| | | --+-------------->| | | / |
    |_|_|___|               |_|_|___|               |_|_|/__|
      |                       |                       |
      |                       |                       |
     \|/                     \|/                     \|/
     _______     _______     _______     _______     _______     _______
    |   |   |   |   |  /|   |   |   |   |   |  /|   |   |   |   |   |  /|
    | | | --+-->| | | / |   | | | --+-->| | | / |   | | | --+-->| | | / |
    |_|_|___|   |_|_|/__|   |_|_|___|   |_|_|/__|   |_|_|___|   |_|_|/__|
      |           |           |           |           |           |
   eiselt        1.9     shackelford     2.7       greenlee      0.6


What helps to make the a-list so prevalent in LISP is the existence
of a predefined a-list operation called assoc.  (It's one of the
functions that you defined for yourself in the second homework
assignment.)  The assoc function takes two arguments, a key and an
a-list, and searches down the a-list for the first sublist whose
first element matches the key:

? (assoc 'shackelford '((eiselt 1.9)(shackelford 2.7)(greenlee 0.6)))
(SHACKELFORD 2.7)
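
As a reminder of what's going on under the hood, here's a minimal
sketch of a hand-rolled lookup.  It's named my-assoc (our name, so
it doesn't clash with the built-in), and it assumes keys can be
compared with eql, which is the built-in's default test.  It is
not necessarily the version you wrote for the homework:

```lisp
(defun my-assoc (key a-list)
  (cond ((null a-list) nil)                   ;ran off the end: no match
        ((eql key (first (first a-list)))     ;first element matches the key
         (first a-list))                      ;return the whole sublist
        (t (my-assoc key (rest a-list)))))    ;keep searching down the list

;; returns (GREENLEE 0.6)
(my-assoc 'greenlee '((eiselt 1.9) (shackelford 2.7) (greenlee 0.6)))

;; returns 0.6: pull the value out of the sublist that came back
(second (my-assoc 'greenlee '((eiselt 1.9) (shackelford 2.7) (greenlee 0.6))))

;; returns NIL: no sublist starts with that key
(my-assoc 'nobody '((eiselt 1.9)))
```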


Trees

Suppose we wanted to encode the following special kind of a graph
called a tree: 

We might use the following LISP representation if we wanted to encode
only the bottommost (leaf) nodes of the tree:

(((a b)(c d))((e f)(g h)))

If we had a tree and we wanted to encode all the nodes including the
interior (non-leaf) nodes, we might use this LISP representation:

(a (b (d e))(c (f g)))

to encode this tree:

        a
      /   \
     b     c
    / \   / \
   d   e f   g

Another equally good representation for this tree might be:

(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil)))

If I asked you how to do a preorder traversal of this tree,
you might say "flatten!"  You saw the "flatten" function a couple of 
lectures ago.  Flattening the list above would give you the list:

(a b d e c f g)

which when read from left to right could be interpreted as the order
in which the nodes of the tree above are visited during a preorder
tree traversal.

But while "flatten" does perform a tree traversal of sorts on the 
list, it looks like preorder only because the nodes in the original 
list are arranged in such a way that when you read them left to right 
it's a preorder traversal.  If I changed the structure of the
original list so that a left-to-right reading of the atoms looked
like a postorder traversal, then "flatten" might fool you into
believing it was doing a postorder traversal:

? (flatten '(((nil nil d) (nil nil e) b) ((nil nil f) (nil nil g) c) a))
(D E B F G C A)

So "flatten" doesn't do preorder or postorder traversal, it just 
does its "left-to-right" traversal on every list given to it and
doesn't really take advantage of any helpful characteristics of
the list itself (e.g., it's a binary tree, a binary tree is 
represented as a three-element list, the first element is the root,
the second element is the left subtree, the third element is the
right subtree, etc.).
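
In case you've misplaced it, here's a minimal sketch of a flatten
that behaves the way the examples above do.  It's called my-flatten
to make clear that it's a stand-in and not necessarily the version
from the earlier lecture, and it assumes that nil is treated as an
empty list that contributes nothing to the result:

```lisp
(defun my-flatten (x)
  (cond ((null x) nil)                        ;nil contributes nothing
        ((atom x) (list x))                   ;a lone atom becomes a one-element list
        (t (append (my-flatten (first x))     ;flatten the first element...
                   (my-flatten (rest x))))))  ;...then the rest, left to right

;; returns (A B D E C F G)
(my-flatten '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))

;; returns (D E B F G C A)
(my-flatten '(((nil nil d) (nil nil e) b) ((nil nil f) (nil nil g) c) a))
```

Notice that nothing in the code mentions roots or subtrees.  It
only knows about first and rest.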

We could write an "honest" preorder tree traversal function which
does take advantage of the "treeness" of the input list.  Here's
one possibility:

(defun print-preorder (tree)
  (cond ((null tree) nil)
        (t (print (first tree))             ;deal with root
           (print-preorder (second tree))   ;recurse on left subtree
           (print-preorder (third tree))))) ;recurse on right subtree

One thing you should note right away is that we're using "print"
here and printing the nodes as we visit them, as opposed to
collecting the nodes into a list.  This makes it a little bit
easier to see how this works---you don't have to think about
how or when to collect nodes into the list.  But again, this
isn't a license to go start using side effects in what you
submit for grading.  That time is coming.  Be patient.
(And be afraid...be very afraid.)
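
For comparison, here's one possible sketch of the side-effect-free
version, which collects the nodes into a list instead of printing
them.  The name collect-preorder is made up for this example, and
it assumes the same (root left-subtree right-subtree) list
representation that print-preorder uses:

```lisp
(defun collect-preorder (tree)
  (cond ((null tree) nil)                                      ;empty tree: nothing to collect
        (t (cons (first tree)                                  ;root goes first
                 (append (collect-preorder (second tree))      ;then the left subtree's nodes
                         (collect-preorder (third tree)))))))  ;then the right subtree's nodes

;; returns (A B D E C F G)
(collect-preorder '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))
```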

The other, more important thing you should note is that you can
now see explicitly in the code where the root is handled, where
the left-subtree is handled, and where the right-subtree is 
handled.  This program does the job we want, printing the nodes
on the screen in the order of traversal:

? (print-preorder '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))

A 
B 
D 
E 
C 
F 
G 
NIL

But "print-preorder" isn't as good as it could be.  As it stands now,
the details of accessing the list structure that represents a binary
tree are merged with the high-level algorithm for traversing a
binary tree.  While it's not real important for a five-line function,
if you developed large programs where the data access details were
intertwingled (it's a technical term) with the higher-level issues,
you'd be one unhappy camper when somebody came along and insisted
on changing the data structure specs.  Trust me.  Just ask anyone
who's dealing with the Year 2000 problem (or better yet, any of
us who caused it...as you now know).  Ugh.

So what do you do to improve "print-preorder"?  You could write the 
appropriate functions, maybe called something like "get-root",
"get-left-subtree", and "get-right-subtree", which could be used by
other functions to carry out various manipulations or traversals of 
this tree.  These accessor functions could
abstract away the details of the implementation and allow you to
write your tree manipulation code without concern for the
implementation of the data structure:

(defun best-print-preorder (tree)
  (cond ((null (get-root tree)) nil)
        (t (print (get-root tree))
           (best-print-preorder (get-left-subtree tree))
           (best-print-preorder (get-right-subtree tree)))))

(defun get-root (tree)
  (first tree))

(defun get-left-subtree (tree)
  (second tree))

(defun get-right-subtree (tree)
  (third tree))

Now we've separated the algorithm from the implementation details.
If someone comes along and says that the implementation of a binary
tree has changed from (root left-subtree right-subtree) to
(left-subtree right-subtree root) for some dorky reason, then
you know what needs to be changed and what doesn't---the algorithm
remains untouched, but those itty bitty accessor functions need
some fixing:

(defun print-preorder (tree)
  (cond ((null (get-root tree)) nil)
        (t (print (get-root tree))
           (print-preorder (get-left-subtree tree))
           (print-preorder (get-right-subtree tree)))))

(defun get-root (tree)
  (third tree))

(defun get-left-subtree (tree)
  (first tree))

(defun get-right-subtree (tree)
  (second tree))

? (print-preorder '(((nil nil d) (nil nil e) b) ((nil nil f) (nil nil g) c) a))

A 
B 
D 
E 
C 
F 
G 
NIL

Similarly, if we want our function to do an inorder or postorder
traversal, we leave the low-level accessor functions alone (shown
here in their original (root left-subtree right-subtree) form)
and tweak the higher-level algorithm:

(defun print-inorder (tree)
  (cond ((null (get-root tree)) nil)
        (t (print-inorder (get-left-subtree tree))
           (print (get-root tree))
           (print-inorder (get-right-subtree tree)))))

(defun get-root (tree)
  (first tree))

(defun get-left-subtree (tree)
  (second tree))

(defun get-right-subtree (tree)
  (third tree))

? (print-inorder '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))

D 
B 
E 
A 
F 
C 
G 
NIL



(defun print-postorder (tree)
  (cond ((null (get-root tree)) nil)
        (t (print-postorder (get-left-subtree tree))
           (print-postorder (get-right-subtree tree))
           (print (get-root tree)))))

(defun get-root (tree)
  (first tree))

(defun get-left-subtree (tree)
  (second tree))

(defun get-right-subtree (tree)
  (third tree))

? (print-postorder '(a (b (d nil nil) (e nil nil)) (c (f nil nil) (g nil nil))))

D 
E 
B 
F 
G 
C 
A 
A


Data Abstraction Revisited

The exercise we just went through, that of using procedural abstraction
to put some distance between high-level algorithms and low-level 
data structure implementation details, is a form of "data abstraction".
The general idea is the same as with procedural abstraction: postpone
worrying about the details.

Data abstraction is something you want to be practicing regularly.  It
takes a little bit of thinking at first (what doesn't in 2360?), but
it pays off big in the long run.  Remember what we said a few weeks ago:
the bulk of time and energy invested in the life of a major piece of
software is expended after the initial design and implementation
in efforts to debug, maintain, improve, and adapt the software.
Putting an "abstraction wall" between the data accessing functions
and everything else confines code changes brought on by changes
in data structures to relatively few procedures, and it makes the
whole thing tremendously easier to read.  If you get in the habit
of doing this now and carrying this practice through your professional
career, you will acquire the reputation of being a seriously impressive
software developer (and rightfully so).
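
Here's a small sketch of what that wall might look like for our
binary trees.  The accessors repeat the (root left-subtree
right-subtree) versions from above.  The constructor make-tree and
the client function count-nodes are hypothetical names added only
for this illustration:

```lisp
;; ---- below the wall: the only code that knows the list layout ----
(defun make-tree (root left right)       ;hypothetical constructor
  (list root left right))

(defun get-root (tree)
  (first tree))

(defun get-left-subtree (tree)
  (second tree))

(defun get-right-subtree (tree)
  (third tree))

;; ---- above the wall: no first, second, third, or list in sight ----
(defun count-nodes (tree)                ;hypothetical client function
  (cond ((null (get-root tree)) 0)       ;empty tree has no nodes
        (t (+ 1                          ;count the root...
              (count-nodes (get-left-subtree tree))      ;...plus the left subtree
              (count-nodes (get-right-subtree tree)))))) ;...plus the right subtree

;; returns 3
(count-nodes (make-tree 'a
                        (make-tree 'b nil nil)
                        (make-tree 'c nil nil)))
```

If the representation changed to (left-subtree right-subtree root),
only the four functions below the wall would need fixing, and
count-nodes would never know the difference.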

You can get some measure of the goodness of your program by evaluating
what proportion of the code that deals with the data structures actually
depends on implementation details: 

This represents the difference between a program that has used
abstraction to effectively postpone the details and one that
doesn't.  Strive for the former, shun the latter.



Copyright 1998 by Kurt Eiselt.  All rights reserved.  The figures and
some parts of text are stolen directly from Ian Smith's notes for an
earlier offering of this same course.  Some of those notes were in
turn taken from my notes for an even earlier offering.  

Last revised: October 24, 1998