CS 2360 - April 18, 1996

Lecture 8 -- Data Abstraction


Data abstraction

By this time we should all be pretty familiar with the basic 
ideas behind abstraction.  And the type of abstraction that we've
been dealing with mostly is called procedural abstraction.  That
is, in building our procedures, we've been:

  Postponing worrying about the details
  Decomposing the problem into smaller pieces
  Putting as much distance as possible between the high-level
    conceptual ideas of what we're trying to do and the details
    of how it gets done

That last tidbit is pretty important.  What it says is that, in the
procedures we build, we're trying to separate the theory, the design,
the algorithm, etc., from the low-level implementation stuff as
much as we can.

Why is this good for us?  It aids in designability, maintainability,
adaptability, readability, debuggability, and all the other itys.  
But we also know that it's painful.  Why?  Let's face it...we're 
all used to slam-dunking code, and all this abstraction stuff is 
hard to think about if we're not used to it but we're being forced 
to use it.  Ouch!

Nevertheless, the positives outweigh the negatives here, so we 
continue abstracting away.  We get similar benefits when we 
abstract away the details of our data structures.  For example,
already we know that in LISP we can build linked-list data 
structures easily without worrying about the details of how those
structures are implemented.  And that makes our programming lives
easier.  Those same simple tasks would be a lot more difficult
in Pascal or C, no?  LISP has done some of the abstraction for us.

So this puts us into another, but related, world of abstraction 
that's called data abstraction.  We want to be able to write our
programs in ways that focus on the high-level concepts about the
data while abstracting away the implementation details of the
data structures.

What else do we gain by employing data abstraction?  We get to
work with pretty pictures, and that actually makes our lives 
easier too.


Graphs -- your basic abstraction

There's an abstraction that computer science types use all the time.
It's called a graph, and you've no doubt seen graphs before.
A graph is just a collection of vertices and edges (or nodes and 
links, or nodes and arcs, or...).

Graphs without cycles have nice mathematical properties, but that's
material for a class on graph theory.  If we want to represent 
information being contained in the vertices, we draw circles at
the vertices and write the information in there.  That's another
abstraction, because that's not how it really looks in memory, but
it makes it easier for us to think about and play with:

And if we put orientations on the edges, we get something called
a directed graph:

This directed graph notation gives us an abstract representation
of a linked list in LISP, like '(a b c), which as you recall is a
previously-defined abstract data type in LISP.
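
To make the connection concrete, here is a minimal sketch that builds
the same list one cons cell at a time.  Each cell plays the role of a
vertex, and its cdr is the directed edge to the next vertex.  The
variable name *abc* is made up for this illustration.

```lisp
;; Build the list (a b c) by hand.  Each CONS is one node of the
;; directed graph; NIL marks the end of the chain.
(defparameter *abc* (cons 'a (cons 'b (cons 'c nil))))

(equal *abc* '(a b c))   ; true: same structure as the quoted list
(car *abc*)              ; => A, the data stored in the first node
(cdr *abc*)              ; => (B C), follow the edge to the rest
```

Notice that nothing here mentions memory addresses or pointers; LISP
keeps those details out of sight.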


Association lists

There's another data abstraction that's used frequently in LISP.  
It's called the association list, or a-list for short.  It's a 
list of two- (or more) element sublists, typically used as a sort
of lookup table.  Here's a LISP-level abstraction of an a-list:

((eiselt 1.9) (shackelford 2.2) (greenlee 2.0))

And here's a lower-level, box-and-pointer version of the same a-list:


     _______                 _______                 _______ 
    |   |   |               |   |   |               |   |  /|
    | | | --+-------------->| | | --+-------------->| | | / |
    |_|_|___|               |_|_|___|               |_|_|/__|
      |                       |                       |
      |                       |                       |
     \|/                     \|/                     \|/
     _______     _______     _______     _______     _______     _______
    |   |   |   |   |  /|   |   |   |   |   |  /|   |   |   |   |   |  /|
    | | | --+-->| | | / |   | | | --+-->| | | / |   | | | --+-->| | | / |
    |_|_|___|   |_|_|/__|   |_|_|___|   |_|_|/__|   |_|_|___|   |_|_|/__|
      |           |           |           |           |           |
   eiselt        1.9     shackelford     2.2       greenlee      2.0


What helps to make the a-list so predominant in LISP is the existence
of a predefined a-list operation called assoc.  (It's one of the
functions that you defined for yourself in the second homework
assignment but probably forgot to use on your midterm exam.)  The
assoc function takes two arguments, a key and an a-list, and searches
down the a-list for the sublist whose first element matches
the key:

? (assoc 'shackelford '((eiselt 1.9)(shackelford 2.2)(greenlee 2.0)))
(SHACKELFORD 2.2)
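
For a feel of what assoc is doing under the hood, here is a rough
sketch of a simplified version, plus a small helper that pulls out
just the value.  The names *ratings*, my-assoc, and lookup are
invented for this example, and the real assoc is more general than
this (it accepts optional arguments that my-assoc ignores).

```lisp
;; A hypothetical lookup table holding the a-list from above.
(defparameter *ratings*
  '((eiselt 1.9) (shackelford 2.2) (greenlee 2.0)))

;; Simplified stand-in for the built-in ASSOC: walk down the a-list
;; and return the first sublist whose first element matches KEY.
(defun my-assoc (key a-list)
  (cond ((null a-list) nil)
        ((eql key (first (first a-list))) (first a-list))
        (t (my-assoc key (rest a-list)))))

;; Hypothetical helper: return only the value stored under KEY,
;; or NIL if the key is not in the table.
(defun lookup (key a-list)
  (second (assoc key a-list)))

(my-assoc 'shackelford *ratings*)   ; => (SHACKELFORD 2.2)
(lookup 'greenlee *ratings*)        ; => 2.0
(lookup 'nobody *ratings*)          ; => NIL
```

Code that calls lookup never has to know that the table is a list of
two-element sublists, which is the whole point of the abstraction.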


Trees

Suppose we wanted to encode the following special kind of graph
called a tree: 

We might use the following LISP representation if we wanted to encode
only the bottommost (leaf) nodes of the tree:

(((a b)(c d))((e f)(g h)))
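
As a quick illustration of working with this leaf-only encoding, here
is a sketch of a function that collects the leaves from left to right.
The names *leaf-tree* and leaves are made up for this example.

```lisp
(defparameter *leaf-tree* '(((a b) (c d)) ((e f) (g h))))

;; Collect the leaves from left to right.  An atom is a leaf; for a
;; list, gather the leaves of its first subtree and then the leaves
;; of the remaining subtrees.
(defun leaves (tree)
  (cond ((null tree) nil)
        ((atom tree) (list tree))
        (t (append (leaves (first tree))
                   (leaves (rest tree))))))

(leaves *leaf-tree*)   ; => (A B C D E F G H)
```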

If we had a tree and we wanted to encode all the nodes including the
interior (non-leaf) nodes, we might use this LISP representation:

(a (b (d e))(c (f g)))

to encode this tree: 

Another equally good representation for this tree might be:

(a (b (d (nil nil) e (nil nil))) (c (f (nil nil) g (nil nil))))

Can you see that a correctly written program might use functions like
"get-left-subtree" and "get-right-subtree" which could be used to do 
various manipulations or traversals of this tree? These functions 
to get the left and right children could abstract away the details 
of the implementation (which might be one or the other of the above) 
and allow you to write your tree manipulation without concern for 
the implementation. 
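
Here is one way such accessors might look.  This is only a sketch,
and it assumes a regularized encoding of our own choosing rather than
either of the exact encodings shown above: an interior node is written
(label left right) and a leaf is a bare symbol.  The names get-label,
preorder, and *tree* are also assumptions made for the example.

```lisp
;; Assumed encoding (ours): interior node = (label left right),
;; leaf = a bare symbol.
(defparameter *tree* '(a (b d e) (c f g)))

;; The only three functions that know how a node is laid out.
(defun get-label (tree)
  (if (atom tree) tree (first tree)))

(defun get-left-subtree (tree)
  (if (atom tree) nil (second tree)))

(defun get-right-subtree (tree)
  (if (atom tree) nil (third tree)))

;; A preorder traversal written purely in terms of the accessors.
(defun preorder (tree)
  (if (null tree)
      nil
      (cons (get-label tree)
            (append (preorder (get-left-subtree tree))
                    (preorder (get-right-subtree tree))))))

(preorder *tree*)   ; => (A B D E C F G)
```

The preorder function never calls first, second, or third itself.  If
the encoding changes, only the three accessors need to be rewritten.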

Another possible encoding of a tree is an a-list. Here is the above
tree encoded with an a-list:

((a (b c))
 (b (d e))
 (c (f g)))

Should we add (d NIL) to this list? The answer is that it doesn't 
matter as long as we abstract the problem sufficiently.  As long 
as there is some code which implements "get-left-subtree" and
"get-right-subtree" it is not important to the higher layers of
the software whether we put the (d NIL) in the list or not. 
We just deal with the get child functions and depend on them
to do their job. 
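
To see why it doesn't matter, here is a sketch of the two accessors
for the a-list encoding.  It assumes that a subtree is named by the
label of its root and that the a-list is passed along as a second
argument; that calling convention, and the names *without-d* and
*with-d*, are our own choices for the example.

```lisp
(defparameter *without-d* '((a (b c)) (b (d e)) (c (f g))))
(defparameter *with-d*    '((a (b c)) (b (d e)) (c (f g)) (d nil)))

;; Look up NODE in the a-list and return the label of its left
;; (or right) child, or NIL if it has none.
(defun get-left-subtree (node tree)
  (first (second (assoc node tree))))

(defun get-right-subtree (node tree)
  (second (second (assoc node tree))))

(get-left-subtree 'a *without-d*)    ; => B
(get-right-subtree 'b *without-d*)   ; => E
(get-left-subtree 'd *without-d*)    ; => NIL, no entry for d at all
(get-left-subtree 'd *with-d*)       ; => NIL, entry says no children
```

Both versions of the table give the same answers, so the code that
calls the accessors can't tell them apart.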

So now what happens if you write some big LISP program, one that shows
tons of procedural abstraction, but you've written everything that
accesses the tree data structure in such a way that it's dependent on
the nested list implementation of a tree ... and then some bonehead 
(say, your boss) comes along and says, "That tree is too hard to read, I
want you to use the a-list version!"   You get to rewrite *lots*
of code. 

But what if you had written the code so that it treated the tree as a
very high-level, abstract object and postponed the details of the
implementation to a few low-level accessor functions?  You'd win
big time.

You can get some measure of the goodness of your program by evaluating
what proportion of the code that deals with the data structures actually
depends on implementation details: 

This represents the difference between a program that has used
abstraction to effectively postpone the details and one that
hasn't.



Copyright 1996 by Kurt Eiselt.  All rights reserved.  The figures and
some parts of text are stolen directly from Ian Smith's notes for a
previous offering of this same course.  Some of those notes were in
turn taken from my notes for an even earlier offering.  Evolution
at work, no?

Last revised: May 6, 1996