CS 2360 - April 7, 1998

Lecture 3 -- Data Structures and Program Control


LISP acts like an interpreted language.  If you know good old-fashioned
BASIC, for example, then you're familiar with interpreted languages.
BASIC offers the user the possibility of typing in a well-formed
expression without a line number, and the BASIC interpreter will
execute that expression immediately.  There's no separate compile-and-load
cycle as with Pascal or C.  LISP provides the same kind of easy interaction
that BASIC did (one of BASIC's few redeeming qualities, if not its only
one).  That is, you can fire up the LISP system
(you'll learn how to do that in lab), wait for the prompt,
type in a well-formed LISP expression, and LISP will immediately
evaluate that expression and return the result, which is in turn
displayed on your monitor.

Don't be misled, however; most LISP systems today are not interpreters,
but are in fact incremental compilers which produce much speedier code
than an equivalent interpreter could.  But as you look at it from the
outside, the behavior of a LISP incremental compiler doesn't look much
different from that of a LISP interpreter.  And either way, once you fire
it up, that LISP evaluator is just sitting there, patiently waiting
for you to type something in so that the LISP system can evaluate
it and return the result.  How does it evaluate what you put in
front of it?  Well, it just depends on the nature of the thing you
put in front of it...that is, it depends on the data type.


LISP's simple data types

LISP comes with its own "abstract data types" or ADTs.  As you undoubtedly 
remember from previous courses, an ADT is (1) the logical data structure 
itself (an abstraction, not the detailed implementation), combined with (2) a 
set of operations which work on the data structure.

The two basic data types in LISP are "atoms" and "lists".  An atom is a 
non-divisible thing, like symbols (foo, bar, +) and numbers (42, 3.15).  
They're not exactly interesting, nor are the operations associated with them.

The more interesting data structure is the list.  A list is an ordered 
sequence of atoms or lists (a recursive definition, no?).  That sequence must 
begin with a "(" and end with a ")", and it may contain any number of atoms or 
lists, including none at all, in any order; the empty list is therefore a 
list too.  The following are all examples of valid lists:

()
(a)
(a b c)
(a (b) c)
(a (b (c)))
(emily ate the short green crayon)
(defun pos-root (a b c) (/ (pos-numerator a b c)(denominator a b c)))

(Both atoms and lists are often called symbolic expressions, or even just 
expressions, which is why we talked about evaluating expressions occasionally 
in the discussion a few paragraphs above.)

Notice that the very last list up there looks a whole lot like a
function definition.  That's because it IS a function definition.
All function definitions in LISP are lists.  All function invocations
in LISP are lists (and defining a function is nothing more than a
special case of invoking a function, no?).  Which leads us to talk
briefly about one of LISP's special attributes that is shared
with very few other languages: in LISP, there is no distinction
between "program" and "data".  In LISP, you can manipulate the list
that describes how a function is defined with exactly the same
operations that you'd use to manipulate a list containing the names
of all the people you owe money to.  In fact, it's not hard to
write LISP code that generates more LISP code...you can write
programs that write other programs on the fly.  The result isn't
especially easy to debug, but such programs can be useful.

So how does LISP know whether a given list is meant to be viewed
as "data" or as a "program"?  It's all context-dependent.  That is,
it all depends on the context that particular list is embedded in
when the LISP evaluator sees that list.  Here are some simple rules
to help you understand what the LISP evaluator is doing when you type
something at the prompt:

   If you pass a number to the LISP evaluator, the number will
   evaluate to itself.  The evaluator will return that number
   (which will then be displayed on your monitor).

   If you pass a symbol to the LISP evaluator, the symbol will
   evaluate to whatever value the symbol is bound to.  The evaluator 
   will return that value (which will then be displayed on your
   monitor).  If the symbol is unbound, the evaluator will break
   and you'll see an error message.

   If you pass a list to the LISP evaluator, LISP assumes that
   you intend to invoke a function whose name is the first element
   of that list, and whose arguments are the remaining elements in
   that list.  LISP evaluates each of those arguments and retains
   the results.  Then LISP finds the function definition associated
   with the function name given as the first element of the list.  Then
   LISP performs the actions given in the function definition on 
   the evaluated arguments and returns the result.  And of course,
   if the first thing in that list really isn't the name of a 
   function, or if the evaluation of the arguments causes a problem,
   then the evaluator will break.
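
To make those three rules concrete, here is a small sketch of what you might 
type at the prompt.  It assumes a standard Common LISP system.  The symbol 
"pi" is used only because Common LISP binds it to a value ahead of time, and 
the comments show what the evaluator would return for each expression.

```lisp
;; Rule 1: a number evaluates to itself.
42                 ; => 42

;; Rule 2: a symbol evaluates to whatever it is bound to.
;; PI is predefined in Common LISP, so it is already bound.
pi                 ; => a floating-point approximation of pi

;; Rule 3: a list is treated as a function call.  The arguments are
;; evaluated first, then the function named by the first element is applied.
(+ 1 2)            ; => 3
(+ 1 (* 2 3))      ; => 7, because (* 2 3) is evaluated to 6 first
```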

It's really a bit more complex than that, but not too much so.  That
little exposure to the LISP evaluator should suffice for now.  So now
that you have knowledge of the evaluator and LISP's favorite data
structure, the list, what kinds of things can you make the evaluator
do to lists?


Simple operations on lists

There are three fundamental operations on lists: two of them are used for 
decomposing lists and getting at their components, while the other operation 
is used for composing lists.

The first list operator is called "first".  Given an argument which is a list, 
"first" returns the first element of that list.  For example, if the symbol A 
has been bound to the list (X Y Z 4), then (first A) will return X.  Note 
that the original list will not be altered in any way; A is still bound to 
the list (X Y Z 4).

The second list operator is called "rest".  Given an argument which is a list, 
"rest" will return the list consisting of all the original elements of the 
list except the first element.  Thus, assuming A is still bound to (X Y Z 4), 
(rest A) will return (Y Z 4).

The third operator is called "cons", which you can think of as being short for 
"construct".  Given two arguments, where the first is anything and the second 
is a list, "cons" returns the list that you get by inserting the first 
argument as the first element of the list that's the second argument.  
Got it?  So, (cons (first A)(rest A)) returns (X Y Z 4).

Let's go over it again:

What does (first (X Y Z 4)) return?

Time's up.  If you said it returns X, then you're dead wrong.  Why?  
Because you forgot the evaluation rules:

(first A) where A is bound to the list (X Y Z 4) is not the same as 
(first (X Y Z 4))!  

In the first case, the argument A evaluates to the list (X Y Z 4), and then 
when the definition of "first" is applied to that, the first element of that 
list, X, is returned.  Great.  In the second case, however, LISP looks at the 
argument (X Y Z 4) and tries to evaluate that as a call to the function named 
X with the arguments Y, Z, and 4.  If you don't have a previously-defined 
function named X, or if Y or Z aren't bound to something, you'll see LISP 
grind to a screeching halt, as we mentioned above.

In short, anything you throw at LISP, LISP will try to evaluate.  Give it a 
symbol and LISP will try to find what the symbol is bound to.  Give it a list, 
and LISP will try to evaluate it as a function call...


Preventing evaluation

...unless you explicitly tell LISP not to evaluate what you're giving it!  
How do you do that?  With a special operator called "quote", which takes an 
argument, doesn't evaluate it, and returns that argument.  (Since it leaves 
its argument unevaluated, "quote" isn't an ordinary function.)  For example, 
(quote (X Y Z 4)) returns (X Y Z 4).  So 
(first (quote (X Y Z 4))) returns X, not an error.
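
Quoting is also what lets you treat a piece of program as plain data.  The 
sketch below takes a quoted function call apart with "first" and "rest", 
builds a different call out of the pieces, and then hands the result back to 
the evaluator.  It relies on "eval", a standard Common LISP function that 
these notes don't otherwise cover, so treat that part as an aside rather than 
something you need right now.

```lisp
;; (quote ...) hands back its argument unevaluated, so the function call
;; below is just a three-element list as far as FIRST and REST are concerned.
(first (quote (+ 1 2)))                          ; => +
(rest  (quote (+ 1 2)))                          ; => (1 2)

;; Build a new "program" out of pieces of data...
(cons (quote *) (rest (quote (+ 3 4))))          ; => (* 3 4)

;; ...and ask the evaluator to run it.
(eval (cons (quote *) (rest (quote (+ 3 4)))))   ; => 12
```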

The "quote" operator is used so often that it gets an abbreviation: the 
apostrophe or single quote mark.  So (quote (X Y Z 4)) is the same thing 
as '(X Y Z 4), and

? (first '(X Y Z 4))
X
? (rest '(X Y Z 4))
(Y Z 4)
? (cons 'X '(Y Z 4))
(X Y Z 4)
?
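
The three operators get more interesting once you start nesting them.  Here 
is a minimal sketch of two small functions built from nothing but "first", 
"rest", and "cons".  The names "second-element" and "swap-first-two" are made 
up for this illustration and are not part of LISP.

```lisp
;; SECOND-ELEMENT and SWAP-FIRST-TWO are hypothetical names used only
;; for this illustration.

;; The second element is the first element of the rest of the list.
(defun second-element (lst)
  (first (rest lst)))

;; Rebuild the list with its first two elements exchanged, using CONS
;; to put the pieces back together.
(defun swap-first-two (lst)
  (cons (first (rest lst))
        (cons (first lst)
              (rest (rest lst)))))

(second-element '(X Y Z 4))    ; => Y
(swap-first-two '(X Y Z 4))    ; => (Y X Z 4)
```

Neither function alters the list it is given; "swap-first-two" returns a 
newly built list.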


Box-and-pointer notation

Now you're comfortable, I hope, with the notion that (X Y Z 4) is a list, but 
unlike Pascal and other languages, LISP doesn't require you to worry about 
the pointer details.  But sometimes, LISP folks like to see the pointers to 
help understand what's going on.  So here's a more tangible (i.e., less 
abstract) picture of what our sample list looks like:

A  
|
|
|    _______       _______       _______       _______
+-->|   |   |     |   |   |     |   |   |     |   |  /|     
    | | | --+---->| | | --+---->| | | --+---->| | | / |     
    |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|     
      |             |             |             |
      X             Y             Z             4

In this lovely figure, each element of the list is represented as a pair of 
words in memory.  The first or left word contains a pointer to the symbol 
that "is" the first element, and the second or right word contains a pointer 
to the next element.  It's now easy to see that, given a pointer to this list, 
the function "first" (e.g., (first A)) looks at the pair of words at the end 
of that pointer, follows the pointer stored in the left word, and returns 
what's at the end of that pointer.  In this case, it's the symbol X.  
The "rest" function (e.g., (rest A)) looks at the pair of words pointed at by 
A, follows the pointer stored in the right word, and returns what's at the 
end of that pointer, which in this case is the list (Y Z 4).  Note again that 
these operations didn't change the structure of the list, they merely followed 
pointers and returned what they pointed to.

Each of these two-word pairs is called a "cons cell", because it's what you 
get when you "cons" two things together.  (This is where the dynamic memory 
allocation mechanism comes into play.)

Earlier, I said that the second argument to the "cons" function should be a 
list, but I lied for the sake of simplicity.  You can in fact have something 
that's not a list as the second argument to a "cons" function, but the result 
is something slightly weird, called a "dotted pair".  For example:

? (cons 'A 'B)
(A . B)
? 

The box-and-pointer notation for this dotted pair looks like this:

     _______      
    |   |   |     
    | | | --+---->B
    |_|_|___|     
      |           
      A           

You used to see dotted pairs used a lot in LISP programming, but they're not 
used as frequently any more (except in some circumstances, which we might get 
to in this course).  I personally interpret these as meaning that I "cons'ed" 
two things together that I didn't mean to.  And that happens more often
than I like to admit.
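
If you want to see the difference for yourself, the following sketch puts an 
ordinary "cons" next to an accidental one.  The comments show what the 
evaluator would return for each expression.

```lisp
;; Second argument is a list: the result is an ordinary list.
(cons 'A '(B))            ; => (A B)

;; Second argument is an atom: the result is a dotted pair.
(cons 'A 'B)              ; => (A . B)

;; FIRST and REST still just follow the two pointers in the cons cell.
(first (cons 'A 'B))      ; => A
(rest  (cons 'A 'B))      ; => B, an atom rather than a list
(rest  (cons 'A '(B)))    ; => (B)
```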


A dangerous piece of knowledge

Lots of LISP programming errors result from using "quote" when it's not 
necessary, or not using it when you should, so you might want to sit down 
in front of your favorite LISP system and play with the stuff from today's
lecture for a while.  In fact, in general, it's a good idea to go to your
favorite LISP system as soon as you can after lecture and work through
any examples and expand on them.  The practice will be good for you.
And in doing this kind of practice, it might be helpful to know how to bind 
symbols to values using an assignment operator.  So now I'm going to show you 
how to do that, but for now you can only use assignment to help you get a feel 
for quoting and evaluation and stuff like that.  DO NOT USE THIS ASSIGNMENT 
OPERATOR IN ANYTHING YOU SUBMIT FOR GRADING UNTIL WE TELL YOU TO AS IT 
VIOLATES FUNCTIONAL PROGRAMMING CONSTRAINTS.  Someday in the weeks ahead, 
we'll let you use assignment in a responsible fashion, but for now just use 
it in practice.  Otherwise, you'll find yourself getting amazingly low 
grades while doing extra work.  Remember: friends don't let friends use
assignment.

The generic assignment operator is "setq", and takes two arguments.  The first 
argument is a symbol (e.g., a variable name), and the second argument is the 
thing you want the symbol bound to.  The first argument is not evaluated, but 
the second argument is, which should tell you that "setq" is not an ordinary 
LISP function.  Instead, "setq" evaluates the second argument, binds the 
symbol given as the first argument to that value, and returns the value:

? (setq A '(X Y Z 4))
(X Y Z 4)
? (first A)
X
?

? (setq A (X Y Z 4))
> Error: Unbound variable: Y
> While executing: SYMBOL-VALUE
> Type Command-/ to continue, Command-. to abort.
> If continued: Retry getting the value of Y.
See the Restarts... menu item for further choices.
1 > 

A variation of "setq" is "setf".  They're different, but it's not important 
for you to know how they're different just yet.  We'll deal with that later.  
For now, stay entirely away from setf and trust that we'll explain why
in the weeks to come.


Something for nothing

What's the technical difference between the list and the dotted pair?  It's 
how they are terminated.  A "well-formed" list always ends with something 
called "nil", which we represented in the box-and-pointer diagram above 
with a big slash through the right word of the last element.  A dotted pair, 
on the other hand, ends with something other than "nil".  
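
A quick way to see the two kinds of ending is to build each chain of cons 
cells by hand, as in this small sketch:

```lisp
;; Every right-hand pointer leads to another cons cell until the last
;; one, which holds NIL: a well-formed list.
(cons 'X (cons 'Y (cons 'Z nil)))    ; => (X Y Z)

;; Same idea, but the last right-hand pointer holds the atom Z instead
;; of NIL.  LISP prints the odd ending with a dot.
(cons 'X (cons 'Y 'Z))               ; => (X Y . Z)
```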

So what's a nil?  A nil is a very special thing in LISP.  First, it has the 
unusual property of being both an atom and a list at the same time.  What 
kind of list?  It's the empty list, represented by "()".  Consequently, "nil" 
and "()" are the same thing.  It's also the symbol meaning Boolean "false" in 
LISP (Boolean "true" is the predefined "T" or anything non-nil).  And, as we 
noted just above, it's always the end of a "well-formed" list, which allows 
LISP to maintain the following sorts of consistencies:  

The first element of the empty list is, naturally, nothing, which is nil, 
which is the empty list:

? (first nil)
NIL
?

The rest of the empty list, which is just the empty list with the first 
element (i.e., nothing) removed, is also the empty list:

? (rest nil)
NIL
? 

But if we try to put (first nil) and (rest nil) back together using "cons", 
we should get the original thing we started with, which was nil, no?  No:

? (cons (first nil) (rest nil))
(NIL)
?

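What happened?  Both (first nil) and (rest nil) evaluate to nil, so we 
really asked for (cons nil nil), and "cons" dutifully built a one-element 
list whose only element is nil.  That list, (NIL), is not the empty list.  
Think of it as one of LISP's endearing idiosyncrasies, a result of the 
unique nature of nil.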
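
The properties of nil described in this section can be checked directly.  
The sketch below uses a few standard Common LISP functions that these notes 
haven't introduced: "eq" (are these the very same object?), "atom", "listp", 
and "length".

```lisp
;; NIL and () are the same object.
(eq nil '())               ; => T

;; NIL is both an atom and a list.
(atom nil)                 ; => T
(listp nil)                ; => T

;; Consing NIL onto NIL builds a list with one element, and that
;; element happens to be NIL.
(cons nil nil)             ; => (NIL)
(length (cons nil nil))    ; => 1
(first (cons nil nil))     ; => NIL
```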


The conditional

A programming language really isn't worth much if there's no 
way to change the flow of program control.  Being able to 
take different branches depending on the results of some test 
is what makes computer programs useful.  Without conditionals,
computing in general would be really boring.  The basic mechanism 
for doing this is called the "conditional", and in LISP the 
fundamental conditional is called "cond".  Here's the syntax:

(cond (*test1* *action1*)
      (*test2* *action2*)
          :
          :
      (*testN* *actionN*))

If the expression *test1* evaluates to non-nil, then the 
"cond" function returns what the expression *action1* 
evaluates to.  (Since *test1* is an expression, we'd expect 
to see a function call there, or maybe a symbol bound to some 
value...that sort of thing.)  If *test1* evaluates to nil, 
the "cond" skips to *test2*, which is evaluated as above, 
and so on.  Each test-action pair is called a "cond clause".

If all the tests are evaluated in sequence, and all tests 
evaluate to nil, then the "cond" returns nil.  While you can 
count on this to happen, it may not be immediately obvious to 
other folks who read the "cond" expression exactly what the 
original programmer intended to occur in this case.  Good 
programming style in general demands that you make your 
intentions explicit in your code.  Here, that means you 
should always end your "cond" with a cond clause which makes 
it obvious what you expect to happen when all the previous 
tests evaluate to nil.  You do it like this:

(cond (*test1* *action1*)
      (*test2* *action2*)
          :
          :
      (*testN* *actionN*)
      (T *what you want to happen if all else fails*))

Also, you can have more than one action in each cond clause.  
If the test is non-nil, the associated actions will be 
evaluated left-to-right, and the last expression evaluated 
will be the one returned by the "cond" function.  (Note, 
though, that since we're not letting you create any side-
effects by assigning values to variables yet, this feature 
won't be all that useful to you just now.)  What kinds of 
tests already exist for you to use?  Here's a quick lesson:


Predicates

Common LISP provides a set of functions which are designed to 
execute useful tests and return Boolean (true/false) values 
depending on the outcome of the test.  These are called 
predicates, and we use them all the time as the tests in our 
"cond" functions.  Here are some commonly-used predicates:

  (null *expr*)             returns non-nil if *expr* is the empty
                            list, nil if *expr* is not empty

  (equal *expr1* *expr2*)   returns non-nil if *expr1* and *expr2*
                            evaluate to equivalent data structures
                            (i.e., they look the same), nil otherwise
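
Here is a minimal sketch of both predicates at work inside a "cond", 
including the explicit T clause recommended earlier.  The function name 
"describe-list" and the symbols it returns are made up for this illustration.

```lisp
;; DESCRIBE-LIST is a hypothetical function used only for illustration.
(defun describe-list (input-list)
  (cond ((null input-list) 'empty)                      ; the empty list
        ((equal input-list '(a b c)) 'the-abc-list)     ; looks like (A B C)
        (T 'some-other-list)))                          ; everything else

(describe-list '())         ; => EMPTY
(describe-list '(a b c))    ; => THE-ABC-LIST
(describe-list '(a b))      ; => SOME-OTHER-LIST
```

The tests are tried in order, so the T clause is reached only when both 
earlier tests have evaluated to nil.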

With these two predicates in hand and knowledge of the almighty
conditional, you have the potential to write functions that do 
substantially more than multiply three numbers together.  For example....


Using "cond" -- an example

Let's say we want to define a function which tells us if a 
given item is an element of a given list.  This turns out to 
be a very useful function, and it already exists in Common 
LISP.  It's called "member".  But even though it already 
exists, we want the practice, so we're going to construct our 
own version.  And to make sure we don't inadvertently replace 
LISP's version with our own possibly buggy version, we'll 
give ours a distinctive name.  Following a tradition handed
down through generations of programming courses, we'll use
the convention of creating these distinctive names by taking
the name of the LISP function we're trying to mimic and adding
the prefix "my-" to it.  Thus we generate the name "my-member"
for our own version of "member".

What will the design look like?  We can sketch it out with a 
combination of the LISP syntax we already know, and some 
English where we're not sure about the LISP yet.  Here's the 
first cut:

  (defun my-member (input-item input-list)
         if done then return "no"
    else if input-item = first element of input-list
         then return "yes"
    else what?  see if input-item = next thing on input-list?
         how? )

OK, so how are we going to turn all that "if-then-else" stuff into a "cond"?

  (defun my-member (input-item input-list)
    (cond (done then return "no")
          (input-item = first element of input-list
           then return "yes")
          (what?  see if input-item = next thing on 
           input-list? how? ) ) )

Hmmm.  That looks a little more like LISP, but it sure won't 
run on my Macintosh.  What looks like something that's going 
to be real easy to turn into LISP?  How about that test to 
see if input-item is the same as the first element of 
input-list?  That should be easy.  Just remember the "cond" syntax:

  (defun my-member (input-item input-list)
    (cond (done then return "no")
          ((equal input-item (first input-list))
           then return "yes")
          (what?  see if input-item = next thing on 
           input-list? how? ) ) )

And how do we return "yes" in that case?

  (defun my-member (input-item input-list)
    (cond (done then return "no")
          ((equal input-item (first input-list)) T)
          (what?  see if input-item = next thing on 
           input-list? how? ) ) )

Nothing to it.  How are we going to test if we're done?  
Well, if we just sort of walk along input-list, testing the 
individual elements to see if they match input-item, what 
would be the termination point?  When we run out of 
input-list, or, in other words, when input-list is nil.  So now we 
can translate more English into LISP:

  (defun my-member (input-item input-list)
    (cond ((null input-list) nil)
          ((equal input-item (first input-list)) T)
          (what?  see if input-item = next thing on 
           input-list? how? ) ) )

Wow.  Now I have more LISP than English.  But there's still 
one missing chunk.  How do I get this thing to repeat for 
every element of input-list (or at least until I match 
input-item)?  If we were piddling around with Pascal or C, we'd want 
to create some sort of loop structure, and maybe create a 
variable or two, and throw in an assignment operation here 
and there...make it really complicated, and in the process 
make ourselves feel good about how much mastery we have over 
our computer.  Grrrrr.

Well, that's not gonna happen here.  Not today at least.  
We're going to use a very elegant and computationally pure 
form of iteration which LISP supports very nicely.  It's 
called recursion...you may have heard of it before.

  (defun my-member (input-item input-list)
    (cond ((null input-list) nil)
          ((equal input-item (first input-list)) T)
          (T (my-member input-item (rest input-list)))))

It's done.  It doesn't work exactly like the official version of
"member" that's already defined in LISP; we'll talk about that on
Thursday.  But the function above does what we set out to do, and
you have to admit it was pretty darn easy to make it work.  In fact
it's so easy that many of you were telling me how to write this code
in class, and along the way you introduced the concept of recursion
in LISP without me having to prompt you (much).  We'll talk about
recursion a lot in the next couple of lectures, and you'll use it
a lot in the code you write.  But if you ever get weirded out by
recursion, stop and think about this example, and remember that 
it's such a simple concept that you introduced it in class before
I did.  Really.
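
If you'd like to watch the recursion work, here is the finished function 
again with a few sample calls.  The comments trace the chain of recursive 
calls by hand.  The last line calls the built-in "member" to show one way in 
which it behaves differently from our version.

```lisp
(defun my-member (input-item input-list)
  (cond ((null input-list) nil)
        ((equal input-item (first input-list)) T)
        (T (my-member input-item (rest input-list)))))

;; Each line below is one call; the list shrinks every time.
;;   (my-member 'Z '(X Y Z 4))
;;   (my-member 'Z '(Y Z 4))
;;   (my-member 'Z '(Z 4))       first element matches
(my-member 'Z '(X Y Z 4))        ; => T

;; No match: the recursion runs until the list is empty.
(my-member 'Q '(X Y Z 4))        ; => NIL

;; Because the test is EQUAL, an element that is itself a list can be found.
(my-member '(B) '(A (B) C))      ; => T

;; The built-in version returns the rest of the list starting at the match.
(member 'Z '(X Y Z 4))           ; => (Z 4)
```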



Copyright 1998 by Kurt Eiselt.  All rights reserved.

Last revised: April 7, 1998