LISP's favorite data structure
LISP comes with its own "abstract data types" or ADTs. As you undoubtedly
remember from previous courses, an ADT is (1) the logical data structure
itself (an abstraction, not the detailed implementation), combined with (2) a
set of operations which work on the data structure.
The two basic data types in LISP are "atoms" and "lists". An atom is a
non-divisible thing, like symbols (foo, bar, +) and numbers (42, 3.15).
They're not exactly interesting, nor are the operations associated with them.
The more interesting data structure is the list. A list is an ordered
sequence of atoms or lists (a recursive definition, no?). When written out,
a list must begin with a "(" and end with a ")", and it may contain any
number of atoms or lists in any order. "Any number" includes zero, so the
empty list is a list too. The following are all examples of valid lists:
()
(a)
(a b c)
(a (b) c)
(a (b (c)))
(emily ate the short green crayon)
(defun pos-root (a b c) (/ (pos-numerator a b c)(denominator a b c)))
(Both atoms and lists are often called symbolic expressions, or even just
expressions, which is why we talked about evaluating expressions occasionally
in the discussion a few paragraphs above.)
Simple operations on lists
There are three fundamental operations on lists: two of them are used for
decomposing lists and getting at their components, while the other operation
is used for composing lists.
The first list operator is called "first". Given an argument which is a list,
"first" returns the first element of that list. For example, if the symbol A
has been bound to the list (X Y Z 4), then (first A) will return X. Note
that the original list will not be altered in any way; A is still bound to
the list (X Y Z 4).
The second list operator is called "rest". Given an argument which is a list,
"rest" will return the list consisting of all the original elements of the
list except the first element. Thus, assuming A is still bound to (X Y Z 4),
(rest A) will return (Y Z 4).
The third operator is called "cons", which you can think of as being short for
"construct". Given two arguments, where the first is anything and the second
is a list, "cons" returns the list that you get by inserting the first
argument as the first element of the list that's the second argument.
Got it? So, (cons (first A)(rest A)) returns (X Y Z 4).
Let's go over it again:
What does (first (X Y Z 4)) return?
Time's up. If you said it returns X, then you're dead wrong. Why?
Because you forgot the evaluation rules:
(first A) where A is bound to the list (X Y Z 4) is not the same as
(first (X Y Z 4))!
In the first case, the argument A evaluates to the list (X Y Z 4), and then
when the definition of "first" is applied to that, the first element of that
list, X, is returned. Great. In the second case, however, LISP looks at the
argument (X Y Z 4) and tries to evaluate that as a call to the function named
X with the arguments Y, Z, and 4. If you don't have a previously-defined
function named X, or if Y or Z aren't bound to something, you'll see LISP
grind to a screeching halt.
In short, anything you throw at LISP, LISP will try to evaluate. Give it a
symbol and LISP will try to find what the symbol is bound to. Give it a list,
and LISP will try to evaluate it as a function call...
Preventing evaluation
...unless you explicitly tell LISP not to evaluate what you're giving it!
How do you do that? With "quote", which takes one argument, doesn't evaluate
it, and returns that argument as is. (Because it leaves its argument
unevaluated, "quote" isn't an ordinary function; it's what LISP calls a
special form.) For example, (quote (X Y Z 4)) returns (X Y Z 4). So
(first (quote (X Y Z 4))) returns X, not an error.
"quote" is used so often that it gets an abbreviation: the apostrophe or
single quote mark. So (quote (X Y Z 4)) is the same thing
as '(X Y Z 4), and
? (first '(X Y Z 4))
X
? (rest '(X Y Z 4))
(Y Z 4)
? (cons 'X '(Y Z 4))
(X Y Z 4)
?
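To see the difference between evaluating something and quoting it, here's a
small sketch you could type in. It uses only "quote", "first", and "rest"
from above, plus the standard functions "+" and "equal" (which these notes
haven't covered, so treat them as assumptions for the sake of the example).

```lisp
;; Each expression is shown with the value it should return.
(+ 1 2)                                ; evaluated as a function call => 3
'(+ 1 2)                               ; quoted, so it comes back as a list => (+ 1 2)
(first '(+ 1 2))                       ; the first element of that list => +
(equal (quote (X Y Z 4)) '(X Y Z 4))   ; both spellings give the same list => T
(first (rest '(X Y Z 4)))              ; the second element => Y
```

Notice that the same three characters, (+ 1 2), are a computation in the
first line and a piece of data in the second. The only difference is the quote.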
Box-and-pointer notation
By now you're comfortable, I hope, with the notion that (X Y Z 4) is a list.
Unlike Pascal and other languages, LISP doesn't require you to worry about
the pointer details. But sometimes LISP folks like to see the pointers to
help understand what's going on. So here's a more tangible (i.e., less
abstract) picture of what our sample list looks like:
A
|
|
|    _______       _______       _______       _______
+-->| | |   |     | | |   |     | | |   |     | | |  /|
    | | | --+---->| | | --+---->| | | --+---->| | | / |
    |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
     |             |             |             |
     X             Y             Z             4
In this lovely figure, each element of the list is represented as a pair of
words in memory. The first or left word contains a pointer to the symbol
that "is" the first element, and the second or right word contains a pointer
to the next element. It's now easy to see that, given a pointer to this list,
the function "first" (e.g., (first A)) looks at the pair of words at the end
of that pointer, follows the pointer stored in the left word, and returns
what's at the end of that pointer. In this case, it's the symbol X.
The "rest" function (e.g., (rest A)) looks at the pair of words pointed at by
A, follows the pointer stored in the right word, and returns what's at the
end of that pointer, which in this case is the list (Y Z 4). Note again that
these operations didn't change the structure of the list, they merely followed
pointers and returned what they pointed to.
Each of these two-word pairs is called a "cons cell", because it's what you
get when you "cons" two things together. (This is where the dynamic memory
allocation mechanism comes into play.)
Earlier, I said that the second argument to the "cons" function should be a
list, but I lied for the sake of simplicity. You can in fact have something
that's not a list as the second argument to a "cons" function, but the result
is something slightly weird, called a "dotted pair". For example:
? (cons 'A 'B)
(A . B)
?
The box-and-pointer notation for this dotted pair looks like this:
 _______
| | |   |
| | | --+---->B
|_|_|___|
 |
 A
You used to see dotted pairs used a lot in LISP programming, but they're not
used as frequently any more (except in some circumstances, which we might get
to in this course). I personally interpret these as meaning that I "cons'ed"
two things together that I didn't mean to. And that happens more often
than I like to admit.
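If you want to poke at a dotted pair yourself, here's a minimal sketch. It
assumes only the three operators covered so far; "first" and "rest" simply
follow the left and right pointers of a cons cell, so they work on a dotted
pair just as they do on a list.

```lisp
;; A dotted pair: the right word points at the atom B, not at another cell.
(cons 'A 'B)            ; => (A . B)
(first (cons 'A 'B))    ; follow the left pointer  => A
(rest (cons 'A 'B))     ; follow the right pointer => B, a bare atom

;; Compare with consing onto a real list.
(cons 'A '(B))          ; => (A B)
(rest (cons 'A '(B)))   ; => (B), a list this time
```

If you ever see a dot in your output where you didn't expect one, check
whether the second argument to "cons" was really a list.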
A dangerous piece of knowledge
Lots of LISP programming errors result from using "quote" when it isn't
necessary, or not using it when you should, so you might want to sit down in
front of your favorite LISP system and play with the stuff from today's
lecture for a while. In fact, in general, it's a good idea to go to your
favorite LISP system as soon as you can after lecture, work through
any examples, and expand on them. The practice will be good for you.
And in doing this kind of practice, it might be helpful to know how to bind
symbols to values using an assignment operator. So now I'm going to show you
how to do that, but for now you can only use assignment to help you get a feel
for quoting and evaluation and stuff like that. DO NOT USE THIS ASSIGNMENT
OPERATOR IN ANYTHING YOU SUBMIT FOR GRADING UNTIL WE TELL YOU TO AS IT
VIOLATES FUNCTIONAL PROGRAMMING CONSTRAINTS. Someday in the weeks ahead,
we'll let you use assignment in a responsible fashion, but for now just use
it in practice. Otherwise, you'll find yourself getting amazingly low
grades while doing extra work. Remember: friends don't let friends use
assignment.
The generic assignment operator is "setq", and takes two arguments. The first
argument is a symbol (e.g., a variable name), and the second argument is the
thing you want the symbol bound to. The first argument is not evaluated, but
the second argument is, which should tell you that "setq" is not an ordinary
LISP function. "setq" evaluates the second argument, binds the first argument
to the resulting value, and returns that value:
? (setq A '(X Y Z 4))
(X Y Z 4)
? (first A)
X
?
Leave off the quote, though, and LISP tries to evaluate (X Y Z 4) as a
function call before it ever gets to the assignment:
? (setf A (X Y Z 4))
> Error: Unbound variable: Y
> While executing: SYMBOL-VALUE
> Type Command-/ to continue, Command-. to abort.
> If continued: Retry getting the value of Y.
See the Restarts... menu item for further choices.
1 >
A variation of "setq" is "setf". They're different, but it's not important
for you to know how they're different just yet. We'll deal with that later.
For now, assume that they're interchangeable.
Something for nothing
What's the technical difference between the list and the dotted pair? It's
how they are terminated. A "well-formed" list always ends with something
called "nil", which we represented in the box-and-pointer diagram last time
with a big slash through the right word of the last element. A dotted pair,
on the other hand, ends with something other than "nil".
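Here's a small sketch of that difference, built by hand with "cons". The
only assumption is the standard function "equal", used to compare two lists
element by element.

```lisp
;; A well-formed list: the innermost cons has nil as its second argument.
(cons 'X (cons 'Y (cons 'Z (cons 4 nil))))   ; => (X Y Z 4)
(equal (cons 'X (cons 'Y (cons 'Z (cons 4 nil))))
       '(X Y Z 4))                           ; => T

;; The same idea, but the chain ends in the atom Z instead of nil.
(cons 'X (cons 'Y 'Z))                       ; => (X Y . Z)

;; Walking off the end of a well-formed list lands on nil.
(rest (rest '(X Y)))                         ; => NIL
```

So writing '(X Y Z 4) is just a convenient way of asking for four cons cells
chained together, with nil in the right word of the last one.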
So what's a nil? A nil is a very special thing in LISP. First, it has the
unusual property of being both an atom and a list at the same time. What
kind of list? It's the empty list, represented by "()". Consequently, "nil"
and "()" are the same thing. It's also the symbol meaning Boolean "false" in
LISP (Boolean "true" is the predefined "T" or anything non-nil). And, as we
noted just above, it's always the end of a "well-formed" list, which allows
LISP to maintain the following sorts of consistencies:
The first element of the empty list is, naturally, nothing, which is nil,
which is the empty list:
? (first nil)
NIL
?
The rest of the empty list, which is just the empty list with the first
element (i.e., nothing) removed, is also the empty list:
? (rest nil)
NIL
?
But if we try to put (first nil) and (rest nil) back together using "cons",
we should get the original thing we started with, which was nil, no? No:
? (cons (first nil) (rest nil))
(NIL)
?
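Why? Since (first nil) and (rest nil) are both nil, the call is really
(cons nil nil), which builds a one-element list whose only element is nil.
That list prints as (NIL), and it is not the same thing as the empty list.
Think of it as one of LISP's endearing idiosyncrasies.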
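You can check nil's split personality for yourself with a sketch like the
one below. It leans on a few standard functions these notes haven't
introduced at this point ("eq", "atom", "listp", "null", and "length"), so
take them on faith for the moment.

```lisp
(eq nil '())              ; nil and () are the very same object => T
(atom nil)                ; nil is an atom ...                  => T
(listp nil)               ; ... and a list at the same time     => T
(length nil)              ; the empty list has no elements      => 0

(cons nil nil)            ; one cons cell holding nil           => (NIL)
(length (cons nil nil))   ; so this list has one element        => 1
(null (cons nil nil))     ; and it is not the empty list        => NIL
```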
The conditional
A programming language really isn't worth much if there's no
way to change the flow of program control. Being able to
take different branches depending on the results of some test
is what makes computer programs useful. Without conditionals,
computing in general would be really boring. The basic mechanism
for doing this is called the "conditional", and in LISP the
fundamental conditional is called "cond". Here's the syntax:
(cond (*test1* *action1*)
      (*test2* *action2*)
         :
         :
      (*testN* *actionN*))
If the expression *test1* evaluates to non-nil, then the
"cond" expression returns what the expression *action1*
evaluates to. (Since *test1* is an expression, we'd expect
to see a function call there, or maybe a symbol bound to some
value...that sort of thing.) If *test1* evaluates to nil,
the "cond" skips to *test2*, which is evaluated as above,
and so on. Each test-action pair is called a "cond clause".
If all the tests are evaluated in sequence, and all tests
evaluate to nil, then the "cond" returns nil. While you can
count on this to happen, it may not be immediately obvious to
other folks who read the "cond" expression exactly what the
original programmer intended to occur in this case. Good
programming style in general demands that you make your
intentions explicit in your code. Here, that means you
should always end your "cond" with a cond clause which makes
it obvious what you expect to happen when all the previous
tests evaluate to nil. You do it like this:
(cond (*test1* *action1*)
      (*test2* *action2*)
         :
         :
      (*testN* *actionN*)
      (T *what you want to happen if all else fails*))
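To make that concrete, here's a minimal sketch of a "cond" with a final T
clause. The function name "sign-of" is made up for this example, and the
tests use the standard numeric comparisons "<" and ">", which we haven't
talked about yet.

```lisp
;; SIGN-OF is a hypothetical example function, not part of LISP itself.
;; It assumes N is a number.
(defun sign-of (n)
  (cond ((< n 0) 'negative)    ; first test: is n below zero?
        ((> n 0) 'positive)    ; second test: is n above zero?
        (t       'zero)))      ; all else fails: n must be zero

(sign-of -5)   ; => NEGATIVE
(sign-of 12)   ; => POSITIVE
(sign-of 0)    ; => ZERO
```

Without the T clause, (sign-of 0) would quietly return nil, and a reader
would have to guess whether that was what you meant.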
Also, you can have more than one action in each cond clause.
If the test is non-nil, the associated actions will be
evaluated left-to-right, and the value of the last one
will be the value returned by the "cond". (Note,
though, that since we're not letting you create any side
effects by assigning values to variables yet, this feature
won't be all that useful to you just now.)
Predicates
Common LISP provides a set of functions which are designed to
execute useful tests and return Boolean or true/false values
depending on the outcome of the test. These are called
predicates, and we use them all the time as the tests in our
"cond" clauses. Here are some commonly-used predicates:
(null *expr*)       returns non-nil if *expr* is the empty
                    list, nil if *expr* is not empty
(atom *expr*)       returns non-nil if *expr* is an atom,
                    nil if *expr* is not an atom
(numberp *expr*)    returns non-nil if *expr* is a number,
                    nil if *expr* is not a number
(listp *expr*)      returns non-nil if *expr* is a list,
                    nil if *expr* is not a list
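Here's a sketch of how these predicates might be used as the tests in a
"cond". The function name "kind-of" and the symbols it returns are invented
for this example. Note that the order of the clauses matters: nil is both an
atom and a list, so the "null" test has to come first if you want the empty
list treated as its own case.

```lisp
;; KIND-OF is a hypothetical example function, not part of LISP itself.
(defun kind-of (x)
  (cond ((null x)    'empty-list)       ; must come before the LISTP test
        ((listp x)   'non-empty-list)   ; any other list
        ((numberp x) 'number)           ; numbers are atoms too
        (t           'other-atom)))     ; whatever is left, e.g. a symbol

(kind-of '())        ; => EMPTY-LIST
(kind-of '(a b c))   ; => NON-EMPTY-LIST
(kind-of 42)         ; => NUMBER
(kind-of 'foo)       ; => OTHER-ATOM
(atom 'foo)          ; => T
```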
Historically, many functions designed to work as predicates
(i.e., returning true/false values) have had the letter "p"
appended to their names, hence "numberp" and "listp".
Obviously, folks haven't been too consistent in this, since
"atom" is not "atomp". It's quaint idiosyncrasies like this
that give any language some personality, no? Sometimes, this
sort of stuff filters into everyday language use. For
example, one LISP hacker might ask if another is interested
in going to lunch by saying simply "lunchp?"... I guess you
had to be there.
Copyright 1998 by Kurt Eiselt. All rights reserved.
Last revised: January 13, 1998