CS 1321X - Lecture 6 - September 4, 2003

Simple Recursion on Lists



I. The list, revisited 

On Tuesday, we offered an informal way of defining Scheme's basic
data structure, called a list.  We said that a list is an ordered collection
of any number of things, written starting with a "(" and ending with a ")".
Those things in the list might be lists themselves, or there might not be any
things in the list at all, which describes something called the empty list.

A more formal and precise definition of a list is as follows:  A list 
can be defined recursively as either the empty list or a  [dotted] pair 
whose cdr is a list.
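The recursive definition above reads almost like code. Here is a minimal sketch of it as a Scheme predicate. The name my-list? is hypothetical, since Scheme already provides list? (covered in Section V). The sketch also borrows cond, null?, and pair?, which come up later in this lecture.

```scheme
;; my-list? is a hypothetical name; Scheme's built-in list? does this job.
;; It follows the recursive definition: a list is either the empty list,
;; or a pair whose cdr is itself a list.
(define (my-list? x)
  (cond ((null? x) #t)                  ; the empty list is a list
        ((pair? x) (my-list? (cdr x)))  ; a pair is a list if its cdr is a list
        (else #f)))                     ; anything else is not a list

(my-list? '())        ; => #t
(my-list? '(a b c))   ; => #t
(my-list? '(a . b))   ; => #f, because the cdr, b, is not a list
(my-list? 42)         ; => #f
```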

So now that we know what lists are, let's start doing stuff with them.


II. Simple operations on lists 

There are three fundamental operations on lists. Two of them are used for
decomposing lists and accessing their components, and the third is used
for composing, or constructing, lists.

The first list operator is called "car". Given an argument which is a non-empty
list, "car" returns the first element of that list. For example, if the symbol A
has been bound to the list (X Y Z 4), then (car A) will return X. Note that
the original list will not be altered in any way; A is still bound to the
list (X Y Z 4). So even though this isn't a mathematical function in the
sense that we're used to, it still behaves like a function in that it
doesn't change anything that's passed to it through a parameter.

"Hey, wait! Isn't 'car' one of the operations that work with pairs?" you
might ask. Well, of course. If you go back and look at the definition of
a list, you'll see that every list is a pair, except for the empty list.
The empty list is special: it's a list, but it's not a pair. More about
that later.

The second list operator is "cdr". Given an argument which is a non-empty
list, "cdr" returns the list consisting of all the original elements of the
list except the first element. Thus, assuming A is still bound to (X Y Z 4),
(cdr A) will return (Y Z 4).

Before we move to the next operator, let's take a moment to remember that 
lists are pairs. They just happen to be pairs where the cdr is a list. But 
if they're pairs, where are the dots? Good question. The dots are messy and 
distracting, so Scheme doesn't print them except when it's necessary to show 
you what's going on, like when you have a dotted pair that isn't a list. 
When you see this: 

(X Y Z 4) 

you're really looking at this: 

(X . (Y . (Z . (4 . ())))) 

Perhaps you don't believe me. You will soon, just be patient. It's kind of 
confusing now. It'll make more sense over time. For now, just concentrate 
on lists and don't worry much about dotted pairs. 

The third operator, obviously, is called "cons". Given two arguments, where 
the first is anything and the second is a list, "cons" returns the list that 
you get by inserting the first argument as the first element of the list 
that's the second argument. Got it? So, 

(cons (car A) (cdr A))

returns 

(X Y Z 4) 

(If you cons something onto the front of a list, you get a new list. If 
you cons something onto something that isn't a list, you get a dotted
pair, but you don't get a list.) 
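Here is a small sketch of the examples above as you might type them into Dr. Scheme. It uses define to bind A and the apostrophe (quote), which Section III explains. Without the apostrophe, Scheme would try to call a function named X.

```scheme
;; Bind the symbol A to a list. The apostrophe keeps Scheme from
;; treating (X Y Z 4) as a call to a function named X.
(define A '(X Y Z 4))

(car A)                 ; => X, the first element
(cdr A)                 ; => (Y Z 4), everything but the first element
(cons (car A) (cdr A))  ; => (X Y Z 4), a list rebuilt from the two pieces
A                       ; => (X Y Z 4), A itself is unchanged
```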

Let's go over it again: 

What does (car (X Y Z 4)) return? 

Time's up. If you said it returns X, then you're dead wrong. Why? Because 
you forgot the evaluation rules: 

(car A) where A is bound to the list (X Y Z 4) is not the same as 
(car (X Y Z 4))! 

In the first case, the argument A evaluates to the list (X Y Z 4), and 
then when the definition of "car" is applied to that, the first element of 
that list, X, is returned. Great. In the second case, however, Scheme 
looks at the argument (X Y Z 4) and tries to evaluate that as a call 
to the function named X with the arguments Y, Z, and 4. If you don't have 
a previously-defined function named X, or if Y or Z aren't bound to 
something, you'll see Scheme grind to a screeching halt, as we've mentioned 
before. 

In short, as we've said before, anything you throw at Scheme, Scheme will 
try to evaluate according to those rules of evaluation we told you about. 
Give it a number and Scheme will return that number. Give Scheme a symbol 
and Scheme will try to find what the symbol is bound to. Give it a list, 
and Scheme will try to evaluate it as a function call... 


III. Preventing evaluation 

...unless you explicitly tell Scheme not to evaluate what you're giving it!
How do you do that? With "quote". Strictly speaking, quote is a special form
rather than an ordinary function, because an ordinary function would have its
argument evaluated first. Quote takes one argument, doesn't evaluate it, and
returns that argument. For example, (quote (X Y Z 4)) returns (X Y Z 4).  So
(car (quote (X Y Z 4))) returns X, not an error.

The explanation above is a little misleading.  At a surface level, quote
doesn't appear to do anything to its argument.  That is, it doesn't try to
evaluate a list as if it were a function call.  But behind the scenes, when
Scheme reads a quoted list, it conses together (constructs) the
corresponding data structure in memory, and quote returns the pointer to
that data structure. When that pointer is returned to the evaluation
window, Dr. Scheme knows to print the list representation of that structure
on the screen instead of the address in memory where it's located.

The "quote" function is used so often that it gets an abbreviation: the 
apostrophe or single quote mark. So (quote (X Y Z 4)) is the same
thing as '(X Y Z 4), and 

>  (car '(X Y Z 4))
X
>  (cdr '(X Y Z 4))
(Y Z 4)
>  (cons 'X '(Y Z 4))
(X Y Z 4)
>

Remember when I said that 

(X . (Y . (Z . (4 . ())))) 

was the same thing as 

(X Y Z 4) 

hmmm? Well, try it and see for yourself. Type 

'(X . (Y . (Z . (4 . ())))) 

in the evaluation window and see what happens. And don't forget the quote 
at the very beginning, else Dr. Scheme will be looking for a function 
named "X". 
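Here is a sketch of the same experiment with a few more pieces. It assumes the equal?, pair?, and list? predicates described in Section V. It writes one list three ways (ordinary notation, explicit dots, and nested calls to cons) and checks that all three have the same structure.

```scheme
;; Three ways of writing down the same list structure.
(define plain  '(X Y Z 4))                                 ; ordinary list notation
(define dotted '(X . (Y . (Z . (4 . ())))))                ; the same pairs, dots shown
(define built  (cons 'X (cons 'Y (cons 'Z (cons 4 '()))))) ; built with cons

(equal? plain dotted)  ; => #t
(equal? plain built)   ; => #t

;; Consing onto something that is not a list gives a pair, but not a list.
(pair? (cons 'X 'Y))   ; => #t
(list? (cons 'X 'Y))   ; => #f
```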

This quote thing can be very confusing.  Students often find themselves 
peppering their code with apostrophes in desperate but failed attempts 
to make things work.  If you get to that point, stop, take a break, 
and start over.  Or go find some help.


IV. Recursion on lists - an example 

Before, we used recursion to do things repeatedly to numbers. Now that we
have this new data structure, the list, of course we're going to want to
do things repeatedly to lists.

Let's say we want to define a function which tells us if a given item is an 
element of a given list.  This turns out to be a very useful function, and it 
already exists in Scheme. It's called "member". But even though it already 
exists, we want the practice, so we're going to construct our own version. 
And to make sure we don't inadvertently replace Scheme's version with our 
own possibly buggy version, we'll give ours a distinctive name. Following a 
tradition handed down through generations of programming courses, we'll use 
the convention of creating these distinctive names by taking the name of 
the Scheme function we're trying to mimic  and adding the prefix "my-" to it. 
Thus we generate the name "my-member" for our own version of "member". 

What will the design look like? We can sketch it out with a combination of 
the Scheme syntax we already know, and some English where we're not sure 
about the Scheme yet. Here's the first cut: 

    (define (my-member input-item input-list)
           if done then return "no"
      else if input-item = first element of input-list
           then return "yes"
      else what?  see if input-item = next thing on input-list?
           how? )

As you might guess, we'll need "if" or something like it to choose between 
different things we might do depending on what the list looks like. The "if" 
conditional provides only a two-way branch: the "then" branch and the "else" 
branch. That's fine some of the time, but other times we'd like a three- or 
four- or more-way branch. If we used only "if", we'd nest "if" within "if" 
within "if" and so on, and with appropriate indentation we'd see our code 
walk across the screen from left to right when we read it. 

(define (foo x)
    (if ...

        (if ...

            (if ...

                (if ...)))))


Lots of nested if's make things hard to read; it's hard to know which tests 
have returned #t or #f when you're a few levels deep in if's.   And don't 
argue that nested if's look more "natural"--if the resulting look were 
natural, you'd read your daily newspaper by holding it at a 45 degree angle.

So when we need three-or-more-way conditionals in Scheme, good programming 
style dictates that we use "cond" instead of "if". Here's how "cond" works: 

(cond (*test1* *action1*)
      (*test2* *action2*)
           :
           :
      (*testN* *actionN*))

If the expression *test1* evaluates to non-#f, then the "cond" function 
returns what the expression *action1* evaluates to.  Since *test1* is an 
expression, we'd expect to see a function call there, or maybe a symbol 
bound to some value...that sort of thing.  If *test1* evaluates to #f, the 
"cond" skips to *test2*, which is evaluated as above, and so on. Each 
test-action pair is called a "cond clause". 

If all the tests are evaluated in sequence, and all tests evaluate to #f, 
then the "cond" returns nothing (which you now know is really something...that
is, it's not #f). While you can count on this to happen, it may not be 
immediately obvious to other folks who read the "cond" expression exactly 
what the original programmer intended to occur in this case. Good programming 
style in general demands that you make your intentions explicit in your code. 
Here, that means you should always end your "cond" with a cond clause which 
makes it obvious what you expect to happen when all the previous tests 
evaluate to #f. You do it like this: 

(cond (*test1* *action1*)
      (*test2* *action2*)
           :
           :
      (*testN* *actionN*)
      (else *what should happen if all else fails*))

Which is essentially the same as 

(cond (*test1* *action1*)
      (*test2* *action2*)
           :
           :
      (*testN* *actionN*)
      (#t *what should happen if all else fails*))

even though "else" does not evaluate to #t when you type it at the evaluator. 
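Here is a hypothetical function, describe-number, that makes the comparison concrete by sorting a number into one of four categories. It is written once with nested ifs and once as a cond ending in an else clause. The two versions behave the same, but the cond is much easier to read.

```scheme
;; describe-number is a hypothetical example, written two ways.

;; Version 1: nested ifs walk across the page.
(define (describe-number-if n)
  (if (< n 0)
      'negative
      (if (= n 0)
          'zero
          (if (< n 10)
              'small
              'large))))

;; Version 2: the same logic as a cond. Clauses are tried top to bottom,
;; and the else clause says what happens when every other test is #f.
(define (describe-number n)
  (cond ((< n 0) 'negative)
        ((= n 0) 'zero)
        ((< n 10) 'small)
        (else 'large)))

(describe-number -5)   ; => negative
(describe-number 0)    ; => zero
(describe-number 7)    ; => small
(describe-number 250)  ; => large
```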

Also, you can have more than one action in each cond clause. If the test is 
non-#f, the associated actions will be evaluated left-to-right, and the last 
expression evaluated will be the one returned by the "cond" function. (Note, 
though, that since we're not letting you create any side-effects by assigning 
values to variables yet, this feature won't be all that useful to you just 
now. In other words, since nothing about what a function does persists after 
the function is no longer being evaluated, there's really no value to having 
multiple actions in a cond clause just yet. It'll be useful later, but not 
now.) 

So now that we have this useful new tool, and since you know that nested 
if's make things harder to follow, we need to know how we are going to turn 
all that "if-then-else" stuff into a "cond"... here's how: 

    (define (my-member input-item input-list)
      (cond (done then return "no")
            (input-item = first element of input-list
             then return "yes")
            (what?  see if input-item = next thing on
             input-list? how? ) ) )

Hmmm. That looks a little more like Scheme, but it sure won't run on 
Dr. Scheme. What looks like something that's going to be really easy to turn
into Scheme? How about that test to see if input-item is the same as the 
first element of input-list? We don't know how to do that just yet, so 
we'd better stop and talk about that.... 


V.   More predicates 

We've talked about tests that you can perform on numbers...things like > 
and < and =. But what do you do when you're working with much more 
interesting pieces of data, like pairs or lists? 

Fortunately, there are all kinds of useful predicates to use with these data 
types. For example, you can tell if something is a pair by using the 
pair? function: 

> (pair? '(a . b))
#t
> (pair? '(a b))
#t
> (pair? ())
#f

Similarly, you can find out if some piece of data is a list by invoking the 
list? function: 

> (list? '(a . b))
#f
> (list? '(a b))
#t
> (list? ())
#t

What can you infer from these examples? Not all pairs are lists, but all 
lists are pairs, except for the empty list. The empty list is a list, but 
not a pair. The empty list has no car or cdr: 

>  (car ())
car: expects argument of type <pair>; given ()
>  (cdr ())
cdr: expects argument of type <pair>; given ()
>

Two things to notice at this point...First, in the Dr. Scheme setup we're
using, () and '() are generally interchangeable. I tend to use the former,
while other folks tend to use the latter.  There, like a number, the empty
list evaluates to itself, so the quote doesn't make much difference.  (Standard
Scheme does require the quote, though, so '() is the more portable choice.)
Whatever feels better for you is fine with me. Second, most Scheme predicate names tend to end with
a question mark. Why? To remind you that they're asking a question of 
sorts, like "Is this thing a list?" or "Is this thing a pair?" 

There's another handy list-based predicate: null? This predicate tells you 
if the thing you're looking at is the empty list. 

>  (null? ())
#t
>  (null? '())
#t
>  (null? '(a . b))
#f
>  (null? '(a b))
#f

The reason this predicate is so handy is that testing for an empty list is 
very often a termination condition used when recursing on a list.  You'll 
see this when we resume our look at my-member. 
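Here is a minimal sketch of that pattern: a hypothetical function, sum-list, that adds up a list of numbers and assumes every element is a number. The null? test is the termination condition, and each recursive call works on a shorter list (the cdr).

```scheme
;; sum-list is a hypothetical example; it assumes every element is a number.
(define (sum-list numbers)
  (cond ((null? numbers) 0)                    ; termination: empty list sums to 0
        (else (+ (car numbers)                 ; the first element, plus...
                 (sum-list (cdr numbers))))))  ; ...the sum of the rest

(sum-list '())         ; => 0
(sum-list '(1 2 3 4))  ; => 10
```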

The empty list just happens to have a name...it's "null". So if you type this 

> null 

you should see this 

() 

and if you type this 

> (null? null) 

you should see this 

#t 

Sometimes you want to see if two things are equal. If they're numbers, then 
you can use = and you've seen that already. But there are other equality 
tests that test for different attributes. Yes, you read that right, there are 
multiple versions of equality. The = operator works only on numbers. But what 
if you wanted to look at two data objects, say two lists, and see if they 
look the same (i.e., they have the same structure and content). You could type

>  (equal? '(a (b) c) '(a (b) c))
#t

to see if those two things have the same value...that is, one is essentially 
an exact copy of the other. But that doesn't necessarily mean that the two 
objects occupy the same place in memory, which means that they are one and 
the same, not just copies. To test to see if two things are actually just 
one thing (i.e., two pointers to the same place in memory), you could type 

>  (eq? '(a (b) c) '(a (b) c))
#f

and (in Dr. Scheme, at least) find out that Scheme built two lists that look
exactly alike but put them in different places in memory, one for each
quoted expression.

There's yet another equality predicate, called eqv?, which is less
discriminating than eq? and more discriminating than equal?, but it behaves
much more like eq? than like equal?.  The eqv? predicate takes into account
that the representation of numbers may differ from one Scheme implementation
to another, but that goes way beyond the scope of this course.  All these
equality predicates are discussed in your textbook, but we'll mostly use 
equal? in this class.  Why do the others exist at all?  They can be
really important when you have the ability to destructively change
data structures, which we aren't letting you do yet.  Knowing whether
changing the list structure at the end of one pointer will or will
not change the list structure at the end of another pointer can
be pretty important.  Also, if you have occasion to compare really
big data structures, note that equal? compares the structures component
by component, and that could conceivably eat up lots of time.  On
the other hand, eq? just compares two memory addresses.  For speed
freaks, that's a very important difference.  We're not speed freaks
in this class.
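Here is a small sketch of the three equality predicates side by side. It uses the list function, which these notes haven't covered yet, to build two fresh lists with the same contents. Lists built this way are guaranteed to live in separate places in memory, which isn't always true of quoted constants.

```scheme
;; list builds a brand-new list each time it is called.
(define x (list 'a (list 'b) 'c))
(define y (list 'a (list 'b) 'c))  ; same contents, separate copy
(define z x)                       ; z is bound to the very same list as x

(equal? x y)    ; => #t, same structure and content
(eq? x y)       ; => #f, two different lists in memory
(eq? x z)       ; => #t, one list with two names
(eqv? x y)      ; => #f, for pairs eqv? behaves like eq?
(eqv? 2.5 2.5)  ; => #t, eqv? compares numbers by value
```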


VI. And now back to our regularly scheduled program 

So now that you know about all these useful predicates, how do they plug 
into our definition of my-member? Simple, I hope. First we put the "cond" 
structure in place so we know where to plug those predicates in, and then 
let's tackle the equality test that's called for. We're not necessarily 
testing two numbers to see if they're the same, so = is not right. Our 
choice is eq? or equal? Do we care that the two things we're comparing 
are actually in the same location in memory? No. So eq? isn't right 
either.  All we're left with is equal?, which is a very general equality 
test that we'll use most of the time anyway. Let's plug it in, using the 
correct syntax for that part of the cond clause: 

    (define (my-member input-item input-list)
      (cond (done then return "no")
            ((equal? input-item (car input-list))
             then return "yes")
            (what?  see if input-item = next thing on
             input-list? how? ) ) )

And how do we return "yes" in that case? That should be pretty obvious 
to you by now, I hope: 

    (define (my-member input-item input-list)
      (cond (done then return "no")
            ((equal? input-item (car input-list)) #t)
            (what?  see if input-item = next thing on
             input-list? how? ) ) )


Nothing to it. How are we going to test if we're done? Well, if we just sort 
of walk along input-list, testing the individual elements to see if they 
match input-item, what would be the termination point? When we run out of 
input-list, or, in other words, when input-list is empty or () or null 
(which are all the same thing). So now we can translate more English into 
Scheme: 

    (define (my-member input-item input-list)
      (cond ((null? input-list) #f)
            ((equal? input-item (car input-list)) #t)
            (what?  see if input-item = next thing on
             input-list? how? ) ) )


Wow. Now I have more Scheme than English. But there's still one missing 
chunk. How do I get this thing to repeat for every element of input-list 
(or at least until I match input-item)? 

So far, we've coded two different termination conditions: stopping when
we get to the end of input-list without finding a match, and stopping
when we find a match with input-item. And the test to see whether
input-item matches the first element of input-list is effectively the step
that does our work on one element of the list. But if neither of those
conditions is true, what do we want to do? We want to call "my-member" on
the remainder of input-list, since that will get our matching operation
performed on the next element of the list, while at the same time reducing
the size of input-list and thereby getting us closer to a termination
condition. The end result looks like this:

   (define (my-member input-item input-list)
     (cond ((null? input-list) #f)
           ((equal? input-item (car input-list)) #t)
           (else (my-member input-item (cdr input-list)))))
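Here is the finished my-member with a few sample calls. In the first call, the function checks a, then calls itself on (b c), then on (c), where the equal? test finally succeeds. Because the comparison uses equal?, the item being searched for can itself be a list.

```scheme
(define (my-member input-item input-list)
  (cond ((null? input-list) #f)
        ((equal? input-item (car input-list)) #t)
        (else (my-member input-item (cdr input-list)))))

(my-member 'c '(a b c))      ; => #t, found on the third call
(my-member 'd '(a b c))      ; => #f, ran out of list
(my-member '(b) '(a (b) c))  ; => #t, equal? matches a whole sublist
(my-member 'a '())           ; => #f
```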


Oh, one other thing. When "my-member" finds a match, it returns #t. But when 
Scheme's "member" function finds a match, it returns that part of the list
which begins with input-item. That's also a non-#f result, so it has the same 
Boolean value, but it gives us more information than just #t. You'll find 
that Scheme tries to do that a lot, and you should think about doing it too 
when you can. To make "my-member" work that way, it would be changed to this: 

  (define (my-member input-item input-list)
    (cond ((null? input-list) #f)
          ((equal? input-item (car input-list)) input-list)
          (else (my-member input-item (cdr input-list)))))
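Here is a sketch comparing the revised my-member with Scheme's built-in member, which also compares elements with equal?. Both return the tail of the list starting at the match, or #f if there is no match. That result still works as the test in an if.

```scheme
;; The revised my-member returns the rest of the list starting at the match.
(define (my-member input-item input-list)
  (cond ((null? input-list) #f)
        ((equal? input-item (car input-list)) input-list)
        (else (my-member input-item (cdr input-list)))))

(my-member 'b '(a b c))  ; => (b c)
(member 'b '(a b c))     ; => (b c), Scheme's own member agrees
(my-member 'z '(a b c))  ; => #f
(member 'z '(a b c))     ; => #f

;; Any non-#f value counts as true, so the result still works as a test.
(if (my-member 'b '(a b c)) 'found 'missing)  ; => found
```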

Here's a quick question for you...can you now create the tail-recursive
version of my-member? Here's a hint...you just did. Remember, being
tail-recursive doesn't require a separate helper function. All that's
required is that no postponed computations are pushed on the stack when the
function calls itself...that is, once the recursive call returns, there's no
work left to do. In my-member, the recursive call is the entire else action,
so its answer is simply passed straight back.
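To see the difference, compare my-member with a function that isn't tail-recursive. The sketch below uses a hypothetical function, count-items, that counts the elements of a list. The first version leaves an addition waiting after each recursive call. The second version (also hypothetical) carries a running total along as an extra argument, so nothing is postponed.

```scheme
;; count-items is hypothetical. The recursive call is NOT in tail position:
;; after it returns, (+ 1 ...) still has to be done, so work piles up.
(define (count-items lst)
  (if (null? lst)
      0
      (+ 1 (count-items (cdr lst)))))

;; count-items-acc is a hypothetical tail-recursive rewrite. The running
;; total travels along in so-far, so nothing is left to do after the
;; recursive call. my-member's else clause is in tail position the same way.
(define (count-items-acc lst so-far)
  (if (null? lst)
      so-far
      (count-items-acc (cdr lst) (+ so-far 1))))

(count-items '(a b c))        ; => 3
(count-items-acc '(a b c) 0)  ; => 3
```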

Ah, one other thing.  In Dr. Scheme, we can use [ and ] instead of ( and )
when we want to.  The square brackets are really useful for lumping all the
parts of a cond clause together and distinguishing each clause from the
others, like this:

(define (my-member input-item input-list)
    (cond [(null? input-list) #f]
          [(equal? input-item (car input-list)) input-list]
          [else (my-member input-item (cdr input-list))]))

Oh, and yet one more thing.  You actually could use { and } but it's 
not commonly done.  If you do it in this class, you'll likely invoke 
the wrath of your grader.  Don't use { and }.




Copyright (c) 2003 by Kurt Eiselt.  All rights reserved, with 
the exception of stuff that belongs to somebody else.

Last revised: September 15, 2003