I. The list, revisited
On Tuesday, we offered an informal way of defining Scheme's basic
data structure, called a list. We said that a list is an ordered set
of any number of things that must begin with a "(" and end with a ")". Those
things in the list might be lists themselves, or there might not be any
things in the list at all, which describes something called the empty list.
A more formal and precise definition of a list is as follows: A list
can be defined recursively as either the empty list or a [dotted] pair
whose cdr is a list.
So now that we know what lists are, let's start doing stuff with them.
II. Simple operations on lists
There are three fundamental operations on lists: two of them are used for
decomposing lists and accessing their components, while the other
operation is used for composing or constructing lists.
The first list operator is called "car". Given an argument which is a list,
"car" returns the first element of that list. For example, if the symbol A
has been bound to the list (X Y Z 4), then (car A) will return X. Note that
the original list will not be altered in any way; A is still bound to the
list (X Y Z 4). So even though this isn't a mathematical function in the
sense that we're used to, it still behaves like a function in that it
doesn't change anything that's passed to it through a parameter.
"Hey, wait! Isn't "car" one of the operations that work with pairs?" you
might ask. Well of course. If you go back and look at the definition for
a pair, you'll see that all lists (at least all proper lists) are pairs,
except for the empty list, which is special, so it's not a pair, but it
is a list. More about that later.
The second list operator is "cdr". Given an argument which is a list,
"cdr" will return the list consisting of all the original elements of the
list except the first element. Thus, assuming A is still bound to (X Y Z 4),
(cdr A) will return (Y Z 4).
Before we move to the next operator, let's take a moment to remember that
lists are pairs. They just happen to be pairs where the cdr is a list. But
if they're pairs, where are the dots? Good question. The dots are messy and
distracting, so Scheme doesn't print them except when it's necessary to show
you what's going on, like when you have a dotted pair that isn't a list.
When you see this:
(X Y Z 4)
you're really looking at this:
(X . (Y . (Z . (4 . ()))))
Perhaps you don't believe me. You will soon, just be patient. It's kind of
confusing now. It'll make more sense over time. For now, just concentrate
on lists and don't worry much about dotted pairs.
The third operator, obviously, is called "cons". Given two arguments, where
the first is anything and the second is a list, "cons" returns the list that
you get by inserting the first argument as the first element of the list
that's the second argument. Got it? So,
(cons (car A)(cdr A))
returns
(X Y Z 4)
(If you cons something onto the front of a list, you get a new list. If
you cons something onto something that isn't a list, you get a dotted
pair, but you don't get a list.)
Let's go over it again:
What does (car (X Y Z 4)) return?
Time's up. If you said it returns X, then you're dead wrong. Why? Because
you forgot the evaluation rules:
(car A) where A is bound to the list (X Y Z 4) is not the same as
(car (X Y Z 4))!
In the first case, the argument A evaluates to the list (X Y Z 4), and
then when the definition of "car" is applied to that, the first element of
that list, X, is returned. Great. In the second case, however, Scheme
looks at the argument (X Y Z 4) and tries to evaluate that as a call
to the function named X with the arguments Y, Z, and 4. If you don't have
a previously-defined function named X, or if Y or Z aren't bound to
something, you'll see Scheme grind to a screeching halt, as we've mentioned
before.
In short, as we've said before, anything you throw at Scheme, Scheme will
try to evaluate according to those rules of evaluation we told you about.
Give it a number and Scheme will return that number. Give Scheme a symbol
and Scheme will try to find what the symbol is bound to. Give it a list,
and Scheme will try to evaluate it as a function call...
III. Preventing evaluation
...unless you explicitly tell Scheme not to evaluate what you're giving it!
How do you do that? With "quote", which looks like a function call but is
really a special form: it takes an argument, doesn't evaluate it, and returns
that argument. For example, (quote (X Y Z 4)) returns (X Y Z 4). So
(car (quote (X Y Z 4))) returns X, not an error.
The explanation above is misleading. At a surface level, quote doesn't
appear to do anything to its argument. That is, it doesn't try to
evaluate a list as if it were a function call. But when a list is passed
to quote, Scheme does cons together or construct the corresponding
data structure in memory and return the pointer to that data structure.
And when that pointer is returned to the evaluator window, Dr. Scheme
knows to print the list representation of that structure on the screen
instead of the address in memory where it's located.
The "quote" form is used so often that it gets an abbreviation: the
apostrophe or single quote mark. So (quote (X Y Z 4)) is the same
thing as '(X Y Z 4), and
> (car '(X Y Z 4))
X
> (cdr '(X Y Z 4))
(Y Z 4)
> (cons 'X '(Y Z 4))
(X Y Z 4)
>
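Here's a small sketch that pulls the three operators together. It assumes A gets bound with define and a quoted list, and the names first-of-A, rest-of-A, and rebuilt are made up just for this example.

```scheme
;; Bind A to the list used throughout this section.
(define A '(X Y Z 4))

;; car gives back the first element; cdr gives back everything after it.
(define first-of-A (car A))   ; X
(define rest-of-A (cdr A))    ; (Y Z 4)

;; cons puts the two pieces back together as a new list.
(define rebuilt (cons first-of-A rest-of-A))   ; (X Y Z 4)
```

After all of that, A is still bound to (X Y Z 4). None of the three operators changed it.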
Remember when I said that
(X . (Y . (Z . (4 . ()))))
was the same thing as
(X Y Z 4)
hmmm? Well, try it and see for yourself. Type
'(X . (Y . (Z . (4 . ()))))
in the evaluation window and see what happens. And don't forget the quote
at the very beginning, else Dr. Scheme will be looking for a function
named "X".
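If you'd like to poke at this a little more, here's a sketch to try. The names built-by-hand, dotted, plain, and not-a-list are invented for this example, and it assumes you bind them with define.

```scheme
;; Build the list one pair at a time, working outward from the empty list.
(define built-by-hand (cons 'X (cons 'Y (cons 'Z (cons 4 '())))))

;; The same structure, written out with every dot showing.
(define dotted '(X . (Y . (Z . (4 . ())))))

;; The same structure again, in ordinary list notation.
(define plain '(X Y Z 4))

;; Consing onto something that isn't a list gives a dotted pair, not a list.
(define not-a-list (cons 'X 'Y))
```

Type built-by-hand, dotted, and plain at the evaluator and all three should print as the same four-element list. Type not-a-list and the dot should show up, because there's no other way to print a pair whose cdr isn't a list.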
This quote thing can be very confusing. Students often find themselves
peppering their code with apostrophes in desperate but failed attempts
to make things work. If you get to that point, stop, take a break,
and start over. Or go find some help.
IV. Recursion on lists - an example
Earlier, we used recursion to do things repeatedly to numbers. Now that we
have this new data structure, the list, of course we're going to want to
do things repeatedly to lists.
Let's say we want to define a function which tells us if a given item is an
element of a given list. This turns out to be a very useful function, and it
already exists in Scheme. It's called "member". But even though it already
exists, we want the practice, so we're going to construct our own version.
And to make sure we don't inadvertently replace Scheme's version with our
own possibly buggy version, we'll give ours a distinctive name. Following a
tradition handed down through generations of programming courses, we'll use
the convention of creating these distinctive names by taking the name of
the Scheme function we're trying to mimic and adding the prefix "my-" to it.
Thus we generate the name "my-member" for our own version of "member".
What will the design look like? We can sketch it out with a combination of
the Scheme syntax we already know, and some English where we're not sure
about the Scheme yet. Here's the first cut:
(define (my-member input-item input-list)
  if done then return "no"
  else if input-item = first element of input-list
    then return "yes"
  else what? see if input-item = next thing on input-list?
    how? )
As you might guess, we'll need "if" or something like it to choose between
different things we might do depending on what the list looks like. The "if"
conditional provides only a two-way branch: the "then" branch and the "else"
branch. That's fine some of the time, but other times we'd like a three- or
four- or more-way branch. If we used only "if", we'd nest "if" within "if"
within "if" and so on, and with appropriate indentation we'd see our code
walk across the screen from left to right when we read it.
(define (foo x)
  (if ...
      (if ...
          (if ...
              (if ...)))))
Lots of nested if's make things hard to read; it's hard to know which tests
have returned #t or #f when you're a few levels deep in if's. And don't
argue that nested if's look more "natural"--if the resulting look were
natural, you'd read your daily newspaper by holding it at a 45 degree angle.
So when we need three-or-more-way conditionals in Scheme, good programming
style dictates that we use "cond" instead of "if". Here's how "cond" works:
(cond (*test1* *action1*)
      (*test2* *action2*)
          :
          :
      (*testN* *actionN*))
If the expression *test1* evaluates to non-#f, then the "cond" expression
returns what the expression *action1* evaluates to. Since *test1* is an
expression, we'd expect to see a function call there, or maybe a symbol
bound to some value...that sort of thing. If *test1* evaluates to #f, the
"cond" skips to *test2*, which is evaluated as above, and so on. Each
test-action pair is called a "cond clause".
If all the tests are evaluated in sequence, and all tests evaluate to #f,
then the "cond" returns nothing (which you now know is really something...that
is, it's not #f). While you can count on this to happen, it may not be
immediately obvious to other folks who read the "cond" expression exactly
what the original programmer intended to occur in this case. Good programming
style in general demands that you make your intentions explicit in your code.
Here, that means you should always end your "cond" with a cond clause which
makes it obvious what you expect to happen when all the previous tests
evaluate to #f. You do it like this:
(cond (*test1* *action1*)
      (*test2* *action2*)
          :
          :
      (*testN* *actionN*)
      (else *what should happen if all else fails*))
Which is essentially the same as
(cond (*test1* *action1*)
      (*test2* *action2*)
          :
          :
      (*testN* *actionN*)
      (#t *what should happen if all else fails*))
even though "else" does not evaluate to #t when you type it at the evaluator.
Also, you can have more than one action in each cond clause. If the test is
non-#f, the associated actions will be evaluated left-to-right, and the last
expression evaluated will be the one returned by the "cond" expression. (Note,
though, that since we're not letting you create any side-effects by assigning
values to variables yet, this feature won't be all that useful to you just
now. In other words, since nothing about what a function does persists after
the function is no longer being evaluated, there's really no value to having
multiple actions in a cond clause just yet. It'll be useful later, but not
now.)
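To see the difference on something small, here's a sketch of one three-way decision written both ways. The function names sign-with-if and sign-with-cond are made up for this example; the tests use the numeric predicates < and = that you already know.

```scheme
;; Three-way branch using nested "if": the second test hides inside
;; the else branch of the first.
(define (sign-with-if x)
  (if (< x 0)
      -1
      (if (= x 0)
          0
          1)))

;; The same decision using "cond": one clause per case, with an
;; explicit "else" clause saying what happens when every test fails.
(define (sign-with-cond x)
  (cond ((< x 0) -1)
        ((= x 0) 0)
        (else 1)))
```

Both versions return -1 for a negative number, 0 for zero, and 1 for a positive number. With only three cases the nesting isn't too bad, but every additional case pushes the "if" version one more level to the right, while the "cond" version just gains one more clause.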
So now that we have this useful new tool, and since you know that nested
if's make things harder to follow, we need to know how we are going to turn
all that "if-then-else" stuff into a "cond"... here's how:
(define (my-member input-item input-list)
  (cond (done then return "no")
        (input-item = first element of input-list
           then return "yes")
        (what? see if input-item = next thing on
           input-list? how? ) ) )
Hmmm. That looks a little more like Scheme, but it sure won't run on
Dr. Scheme. What looks like something that's going to be real easy to turn
into Scheme? How about that test to see if input-item is the same as the
first element of input-list? We don't know how to do that just yet, so
we'd better stop and talk about that....
V. More predicates
We've talked about tests that you can perform on numbers...things like >
and < and =. But what do you do when you're working with much more
interesting pieces of data, like pairs or lists?
Fortunately, there are all kinds of useful predicates to use with these data
types. For example, you can tell if something is a pair by using the
pair? function:
> (pair? '(a . b))
#t
> (pair? '(a b))
#t
> (pair? ())
#f
Similarly, you can find out if some piece of data is a list by invoking the
list? function:
> (list? '(a . b))
#f
> (list? '(a b))
#t
> (list? ())
#t
What can you infer from these examples? Not all pairs are lists, but all
lists are pairs, except for the empty list. The empty list is a list, but
not a pair. The empty list has no car or cdr:
> (car ())
car: expects argument of type <pair>; given ()
> (cdr ())
cdr: expects argument of type <pair>; given ()
>
Two things to notice at this point...First, in Dr. Scheme as we use it in
this class, () and '() are generally interchangeable. I tend to use the
former, while other folks tend to use the latter. Like numbers, the empty
list evaluates to itself here, so the quote doesn't make much difference.
(Standard Scheme requires the quote, so '() is the safer habit if your code
might ever run somewhere else.) Whatever feels better for you
is fine with me. Second, most Scheme predicate names tend to end with
a question mark. Why? To remind you that they're asking a question of
sorts, like "Is this thing a list?" or "Is this thing a pair?"
There's another handy list-based predicate: null? This predicate tells you
if the thing you're looking at is the empty list.
> (null? ())
#t
> (null? '())
#t
> (null? '(a . b))
#f
> (null? '(a b))
#f
The reason this predicate is so handy is that testing for an empty list is
very often a termination condition used when recursing on a list. You'll
see this when we resume our look at my-member.
The empty list just happens to have a name...it's "null". So if you type this
> null
you should see this
()
and if you type this
> (null? null)
you should see this
#t
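With pair? and null? in hand, the recursive definition of a list from section I can be sketched directly in Scheme. The name my-list? is made up for this example (Scheme's own list? is the one you should actually use), and the sketch leans on "cond" as described earlier.

```scheme
;; A list is either the empty list, or a pair whose cdr is a list.
(define (my-list? thing)
  (cond ((null? thing) #t)                     ; the empty list is a list
        ((pair? thing) (my-list? (cdr thing))) ; a pair: check its cdr
        (else #f)))                            ; anything else is not a list
```

Given '(a b) this returns #t, because following the cdrs eventually reaches the empty list. Given '(a . b) it returns #f, because the cdr is the symbol b, which is neither the empty list nor a pair.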
Sometimes you want to see if two things are equal. If they're numbers, then
you can use = and you've seen that already. But there are other equality
tests that test for different attributes. Yes, you read that right, there are
multiple versions of equality. The = operator works only on numbers. But what
if you wanted to look at two data objects, say two lists, and see if they
look the same (i.e., they have the same structure and content)? You could type
> (equal? '(a (b) c) '(a (b) c))
#t
to see if those two things have the same value...that is, one is essentially
an exact copy of the other. But that doesn't necessarily mean that the two
objects occupy the same place in memory, which means that they are one and
the same, not just copies. To test to see if two things are actually just
one thing (i.e., two pointers to the same place in memory), you could type
> (eq? '(a (b) c) '(a (b) c))
#f
and find out that Scheme actually built two lists that look exactly alike
but put them in different places in memory in response to the two uses of
quote.
There's yet another equality predicate, called eqv?, which is less
discriminating than eq? and more discriminating than equal?, but is much
closer to eq? than equal? The eqv? predicate takes into account that
the representation of numbers may differ from one Scheme implementation
to another, but that goes way beyond the scope of this course. All these
equality predicates are discussed in your textbook, but we'll mostly use
equal? in this class. Why do the others exist at all? They can be
really important when you have the ability to destructively change
data structures, which we aren't letting you do yet. Knowing whether
changing the list structure at the end of one pointer will or will
not change the list structure at the end of another pointer can
be pretty important. Also, if you have occasion to compare really
big data structures, note that equal? compares the structures component
by component, and that could conceivably eat up lots of time. On
the other hand, eq? just compares two memory addresses. For speed
freaks, that's a very important difference. We're not speed freaks
in this class.
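Here's a sketch of the difference between "looks the same" and "is the same thing". All of the names below are made up for this example. The lists are built with cons rather than quote, since each call to cons is guaranteed to build a brand new pair.

```scheme
;; Two lists built separately: same contents, different places in memory.
(define first-copy (cons 'a (cons 'b '())))
(define second-copy (cons 'a (cons 'b '())))

;; A second name for the very same list that first-copy is bound to.
(define same-as-first first-copy)

(define copies-look-alike (equal? first-copy second-copy))   ; #t
(define copies-are-one-object (eq? first-copy second-copy))  ; #f
(define names-share-one-object (eq? first-copy same-as-first)) ; #t
```

So equal? answers "do these have the same structure and content?" while eq? answers "are these one and the same object?"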
VI. And now back to our regularly scheduled program
So now that you know about all these useful predicates, how do they plug
into our definition of my-member? Simple, I hope. First we put the "cond"
structure in place so we know where to plug those predicates in, and then
let's tackle the equality test that's called for. We're not necessarily
testing two numbers to see if they're the same, so = is not right. Our
choice is eq? or equal? Do we care that the two things we're comparing
are actually in the same location in memory? No. So eq? isn't right
either. All we're left with is equal?, which is a very general equality
test that we'll use most of the time anyway. Let's plug it in, using the
correct syntax for that part of the cond clause:
(define (my-member input-item input-list)
  (cond (done then return "no")
        ((equal? input-item (car input-list))
           then return "yes")
        (what? see if input-item = next thing on
           input-list? how? ) ) )
And how do we return "yes" in that case? That should be pretty obvious
to you by now, I hope:
(define (my-member input-item input-list)
  (cond (done then return "no")
        ((equal? input-item (car input-list)) #t)
        (what? see if input-item = next thing on
           input-list? how? ) ) )
Nothing to it. How are we going to test if we're done? Well, if we just sort
of walk along input-list, testing the individual elements to see if they
match input-item, what would be the termination point? When we run out of
input-list, or, in other words, when input-list is empty or () or null
(which are all the same thing). So now we can translate more English into
Scheme:
(define (my-member input-item input-list)
  (cond ((null? input-list) #f)
        ((equal? input-item (car input-list)) #t)
        (what? see if input-item = next thing on
           input-list? how? ) ) )
Wow. Now I have more Scheme than English. But there's still one missing
chunk. How do I get this thing to repeat for every element of input-list
(or at least until I match input-item)?
So far, we've already coded two different termination conditions: stopping
when we get to the end of input-list without finding a match, and stopping
when we find a match with input-item. And the test to see if we find a
match between input-item and the first element of input-list is effectively
the operation that we perform on one element of the list at each step.
But if neither of those conditions is true, what do we want to do?
We want to call "my-member" on the remainder of input-list, since that will
get our matching operation performed on the next element of the list, while
at the same time reducing the size of input-list and thereby getting us
closer to a termination condition. The end result looks like this:
(define (my-member input-item input-list)
  (cond ((null? input-list) #f)
        ((equal? input-item (car input-list)) #t)
        (else (my-member input-item (cdr input-list)))))
Oh, one other thing. When "my-member" finds a match, it returns #t. But when
Scheme's "member" function finds a match, it returns that part of input-list
which begins with input-item. That's also a non-#f result, so it has the same
Boolean value, but it gives us more information than just #t. You'll find
that Scheme tries to do that a lot, and you should think about doing it too
when you can. To make "my-member" work that way, it would be changed to this:
(define (my-member input-item input-list)
  (cond ((null? input-list) #f)
        ((equal? input-item (car input-list)) input-list)
        (else (my-member input-item (cdr input-list)))))
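To see what that buys us, here's a sketch that repeats the definition so it can be run on its own and then tries it on a few inputs. The names found-symbol, found-sublist, and not-found are made up for this example.

```scheme
(define (my-member input-item input-list)
  (cond ((null? input-list) #f)
        ((equal? input-item (car input-list)) input-list)
        (else (my-member input-item (cdr input-list)))))

;; A match returns the part of the list that starts with the match.
(define found-symbol (my-member 'c '(a b c d)))     ; (c d)

;; Because the test is equal?, a list can be matched as an element too.
(define found-sublist (my-member '(b) '(a (b) c)))  ; ((b) c)

;; No match: we run off the end of the list and return #f.
(define not-found (my-member 'z '(a b c d)))        ; #f
```

Either kind of match is a non-#f result, so my-member still works fine as the test in an "if" or a "cond" clause.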
Here's a quick question for you...can you now create the tail-recursive
version of my-member? Here's a hint...you just did. Remember, the fact that
something is tail-recursive doesn't imply that there's one function calling
another. All that's required is that there are no postponed computations
pushed on the stack when the function calls itself...that is, all the "work"
is done in the tail of the recursive call.
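For contrast, here's a sketch of a list function that is not tail-recursive, next to a version that is. Both names are made up for this example (Scheme's own version is called "length"), and the extra count-so-far parameter is just one way of carrying the running total along.

```scheme
;; Not tail-recursive: the (+ 1 ...) can't be done until the recursive
;; call comes back, so an addition is postponed at every step.
(define (my-length input-list)
  (cond ((null? input-list) 0)
        (else (+ 1 (my-length (cdr input-list))))))

;; Tail-recursive: the addition happens before the recursive call, and
;; the result of that call is returned as-is with nothing left to do.
(define (my-length-tail input-list count-so-far)
  (cond ((null? input-list) count-so-far)
        (else (my-length-tail (cdr input-list) (+ 1 count-so-far)))))
```

Calling (my-length '(a b c)) and (my-length-tail '(a b c) 0) both give 3. Like my-length-tail, my-member does nothing with the result of its recursive call except hand it back, which is why it was tail-recursive from the start.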
Ah, one other thing. We can use [ and ] instead of ( and ) when we want
to. The square brackets are really useful for lumping all the parts of a
cond clause together and distinguishing it from the other cond clauses, like this:
(define (my-member input-item input-list)
  (cond [(null? input-list) #f]
        [(equal? input-item (car input-list)) input-list]
        [else (my-member input-item (cdr input-list))]))
Oh, and yet one more thing. You actually could use { and } but it's
not commonly done. If you do it in this class, you'll likely invoke
the wrath of your grader. Don't use { and }.
Copyright (c) 2003 by Kurt Eiselt. All rights reserved, with
the exception of stuff that belongs to somebody else.
Last revised: September 15, 2003