I. Box-and-pointer notation
Now you're comfortable, I hope, with the notion that '(X Y Z 4), for example,
is a list. More specifically, it's something called a linked list. That is,
there are hidden but very real links between the elements of the list. Those
links are often called pointers, and playing with pointers in some
programming languages is fraught with peril. But unlike those other languages,
such as Pascal and C, Scheme doesn't require you to worry about the pointer
details. Nevertheless, sometimes it's the case that Scheme folks like to
see the pointers to help them understand what's going on when they're
writing functions that work on lists. So here's a more tangible (i.e., less
abstract) picture of what our sample list looks like:
A
|
|
|    _______       _______       _______       _______
+-->|   |   |     |   |   |     |   |   |     |   |  /|
    | | | --+---->| | | --+---->| | | --+---->| | | / |  or -->()
    |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
      |             |             |             |
      X             Y             Z             4
In this lovely figure, each element of the list is represented as a pair of
"slots" (which could be words or bytes or whatever) in memory. The first or
left slot contains a pointer (i.e., the address in memory) to the symbol
that "is" the first element, and the second or right slot contains a pointer
to the next element. It's now easy to see that, given a pointer to this list,
the function "car" (e.g., (car A)) looks at the pair of slots at the end of
that pointer, follows the pointer stored in the left slot, and returns what's
at the end of that pointer. In this case, it's the symbol X. The "cdr"
function (e.g., (cdr A)) looks at the pair of slots pointed at by A, follows
the pointer stored in the right slot, and returns what's at the end of that
pointer, which in this case is the list (Y Z 4). Note again that these
operations didn't change the structure of the list, they merely followed
pointers and returned what they pointed to. Now when you call my-member
like this
> (my-member 'z '(x y z 4))
or like this
> (my-member 'z a) {where a is bound to '(x y z 4)}
and see this result
(z 4)
what happens is that my-member follows the pointers that connect the pairs
of slots from front to back, looking for a pair whose car is the same as the
input item, 'z. When and if it finds that match, my-member returns the list
that begins with the symbol z. (This pointer stuff just isn't that hard.)
Each of these two-slot pairs is called a "cons cell", because it's what you
get when you "cons" two things together. Every time
you call "cons", Scheme automatically grabs a two-part cons cell for you from
a place in memory called the "heap", and attaches that cons cell onto the
front of whatever you wanted to cons something to. (What's a heap? Don't
worry about it.) While it's prettier to cons things onto proper lists, it's
not necessary, as you already know. You can in fact have something that's
not a list as the second argument to a "cons" function, but the result is
that dotted pair thing, which isn't a true or proper list.
> (cons 'A 'B)
(A . B)
The box-and-pointer notation for this dotted pair looks like this:
 _______
|   |   |
| | | --+---->B
|_|_|___|
  |
  A
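If you'd like to follow these pointers yourself, here's a minimal sketch you
can type in. Treat it as an illustration rather than the official version:
the names a and p are made up for this example, and this definition of
my-member is only our guess at how such a function might be written. In
particular, returning #f when nothing matches is an assumption borrowed from
the built-in member.

```scheme
; A sample list and a dotted pair, as in the diagrams above.
(define a '(x y z 4))
(define p (cons 'a 'b))

(car a)   ; follows the left-slot pointer of the first cons cell: x
(cdr a)   ; follows the right-slot pointer: (y z 4)
(car p)   ; a
(cdr p)   ; b, a bare symbol rather than a list

; Hypothetical my-member: walk the cdr pointers until the car of a
; cell matches the item, then return the list starting at that cell.
; Returning #f on failure is an assumption (it mirrors the built-in member).
(define (my-member item lst)
  (cond ((null? lst) #f)
        ((equal? item (car lst)) lst)
        (else (my-member item (cdr lst)))))

(my-member 'z a)                     ; (z 4)
(eq? (my-member 'z a) (cdr (cdr a))) ; #t
```

The last line is the interesting one: because this my-member never calls
cons, the (z 4) it returns is the tail end of the original list, not a copy.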
Here's a more complicated list and its box-and-pointer representation, with
a list inside a list:
(X (Q R) Z 4)
|
|
|    _______       _______       _______       _______
+-->|   |   |     |   |   |     |   |   |     |   |  /|
    | | | --+---->| | | --+---->| | | --+---->| | | / |
    |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
      |             |             |             |
      X             |             Z             4
                    |
                   \|/
                 _______       _______
                |   |   |     |   |  /|
                | | | --+---->| | | / |
                |_|_|___|     |_|_|/__|
                  |             |
                  Q             R
And here's yet another one, with a list inside a list inside a list:
(X (Q (R)) Z 4)
|
|
|    _______       _______       _______       _______
+-->|   |   |     |   |   |     |   |   |     |   |  /|
    | | | --+---->| | | --+---->| | | --+---->| | | / |
    |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
      |             |             |             |
      X             |             Z             4
                    |
                   \|/
                 _______       _______
                |   |   |     |   |  /|
                | | | --+---->| | | / |
                |_|_|___|     |_|_|/__|
                  |             |
                  Q             |
                                |
                               \|/
                             _______
                            |   |  /|
                            | | | / |
                            |_|_|/__|
                              |
                              R
And still one more:
(X ((Q) R) Z 4)
|
|
|    _______       _______       _______       _______
+-->|   |   |     |   |   |     |   |   |     |   |  /|
    | | | --+---->| | | --+---->| | | --+---->| | | / |
    |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
      |             |             |             |
      X             |             Z             4
                    |
                   \|/
                 _______       _______
                |   |   |     |   |  /|
                | | | --+---->| | | / |
                |_|_|___|     |_|_|/__|
                  |             |
                  |             R
                  |
                 \|/
               _______
              |   |  /|
              | | | / |
              |_|_|/__|
                |
                Q
Now that you've seen these examples, and assuming you understand them, there
should be no list so complicated that you can't figure out the box-and-pointer
notation for it.
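One way to check a box-and-pointer diagram you've drawn is to follow its
pointers with car and cdr and see whether you land where the diagram says you
should. Here's a small sketch using the three nested lists above. The names
n1, n2, n3, and n1-by-hand are just labels invented for this example.

```scheme
(define n1 '(x (q r) z 4))
(define n2 '(x (q (r)) z 4))
(define n3 '(x ((q) r) z 4))

; Second element: one cdr along the top row, then down via the car.
(car (cdr n1))                  ; (q r)

; In n2, the inner list (r) hangs off the car of the sublist's second cell.
(car (cdr (car (cdr n2))))      ; (r)

; In n3, the inner list (q) hangs off the car of the sublist's first cell.
(car (car (cdr n3)))            ; (q)

; The same structure as n1, built one cons cell at a time.
(define n1-by-hand
  (cons 'x
        (cons (cons 'q (cons 'r '()))
              (cons 'z (cons 4 '())))))

(equal? n1 n1-by-hand)          ; #t
```

Notice that n1-by-hand takes six calls to cons, one for each of the six
boxes in the diagram of (X (Q R) Z 4).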
II. Something for nothing
What's the technical difference between the list and the dotted pair? It's
how they are terminated. A proper list always ends with something called "()",
which we represented in the box-and-pointer diagrams above with a big slash
through the right half of the last element. A dotted pair, on the other hand,
ends with something other than "()". In other words, a proper list is formed
by consing something onto the empty list, or onto something else that's a
proper list. A dotted pair is formed by consing something onto something
that's not a proper list. And remember, the special word for the empty list
is "null". Oh, and one other thing. Your textbook (see pp. 62-63) says that
you have to quote the empty list to make things work right. That is, '()
works, but just plain old () doesn't work if the evaluator sees it.
Apparently, DrScheme didn't read the book (though standard Scheme sides with
the textbook, so quoting the empty list is the safer habit):
> ()
()
> '()
()
> (eq? () '())
#t
>
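You don't have to eyeball the printed form to tell the two apart. Standard
Scheme provides the predicates pair?, list?, and null?, and the sketch below
(with variable names invented for this example) shows how they answer for a
proper list, a dotted pair, and the empty list:

```scheme
(define proper   (cons 'a (cons 'b '())))  ; (a b)
(define dotted   (cons 'a 'b))             ; (a . b)
(define improper (cons 'a (cons 'b 'c)))   ; (a b . c)

(list? proper)     ; #t
(list? dotted)     ; #f
(list? improper)   ; #f, it doesn't end with ()
(pair? dotted)     ; #t, it's still a cons cell
(pair? '())        ; #f, the empty list is not a cons cell
(null? '())        ; #t
(list? '())        ; #t, the empty list is a proper list
```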
III. How does "cons" work?
We sometimes ask you in homework assignments to implement your own versions
of pre-existing Scheme functions. This helps you to learn what functions are
available to you, as well as to better understand what those functions do and
how they do them. And that in turn will allow you to borrow techniques from
those functions and apply them to new problems. One function we won't ask you
to implement, however, is "cons", or more appropriately, "my-cons". Why?
Because you can't. It's a fundamental, primitive operation that's built into
the Scheme interpreter. If you wanted to see how it really works, you'd have
to go into the guts of the system. But at a higher level, you can think of
"cons" as working like this: Each time the Scheme system comes across a call
to "cons", Scheme allocates a couple of words of free memory--something
called a "cons cell". This cons cell is then filled with the appropriate
pointers to whatever structures are being consed, and then a pointer to that
cons cell is returned. Thus the cons cell becomes the new first element in
the structure that was consed to, and the pointer that is returned is
pointing to that new first element. That was a high-level description of what
"cons" does. For a much more detailed explanation, let's walk through an
example. Here's a list:
(A B C D)
And here's a lower-level view of the same list, using our box-and-pointer
notation:
 _______       _______       _______       _______
|   |   |     |   |   |     |   |   |     |   |  /|
| | | --+---->| | | --+---->| | | --+---->| | | / |
|_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
  |             |             |             |
  A             B             C             D
Now let's say that we've created the following function:
(define (foo x)
  (print x)
  (print (cons 'q x))
  (print x))
We haven't talked about input/output issues at all, but the print function is
pretty useful. It just prints to your computer monitor exactly what you tell
it to. Using it technically takes us outside the functional programming
paradigm, because a print operation is a way of having something about the
function persist after the function is no longer being evaluated. But let's
not worry about that for now. Using print at appropriate places can be a
useful debugging tool. Just don't leave any print functions in your homework
or test solutions. Your TAs won't be happy about it at this time. So now
let's say we call the function foo above on the list '(A B C D):
> (foo '(A B C D))
Passing the list through the parameter X binds the list to that symbol. We'll
show that in our diagram with a pointer:
X
|
|
|    _______       _______       _______       _______
+-->|   |   |     |   |   |     |   |   |     |   |  /|
    | | | --+---->| | | --+---->| | | --+---->| | | / |
    |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
      |             |             |             |
      A             B             C             D
See, now there's a pointer called X, and that pointer points to the front of
the list. There are three print operations in the foo function. The first one
merely says to print the value that is bound to the parameter X. So the
result of the first print is just this:
(A B C D)
The next print depends upon the result of the cons operation. Consing the
symbol Q onto the list bound to X requires a cons cell to be taken from free
storage in memory. That cons cell is then made to point to the head of
the input list using the cdr pointer slot (the second part of a cons cell).
The car pointer slot (the first part of a cons cell) is made to point to the
symbol Q. Cons returns, to the print function in this case, a pointer to the
new cons cell, which in turn points to the list already bound to X (which, if
you think about it, turns out to be the cdr of the new list):
              X
|             |
|             |
|    _______  |    _______       _______       _______       _______
+-->|   |   | +-->|   |   |     |   |   |     |   |   |     |   |  /|
    | | | --+---->| | | --+---->| | | --+---->| | | --+---->| | | / |
    |_|_|___|     |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
      |             |             |             |             |
      Q             A             B             C             D
The print function now prints the list at the end of the pointer it's been
given:
(Q A B C D)
Now comes the big finish. The third print function again is asked to print
the list that's bound to the parameter X (or, more technically, the
list that's pointed at by X). What does print print? The diagram makes it
pretty clear: print will print
(A B C D)
because we haven't done anything that changed the list that's associated
with X. We've added a cons cell to the front of that list, and that new list
had its own pointer (but we didn't save that pointer anywhere so now we can't
even get at that new list if we wanted to), but we didn't alter anything that
already existed. In fact, we don't know how to do that yet, and we won't be
doing those sorts of alterations unless we move out of the functional
programming paradigm. That inability to destructively modify data structures
is one of the big features of functional programming that leads to programs
with fewer problems. But as some of you are finding, or soon will be, this
style of programming is also going to require you to do some things that will
seem weird if you have experience with procedural languages like C or Pascal,
for example, because being able to alter stuff is a fundamental assumption in
that paradigm. But remember, no matter how awkward it might seem at first,
it'll get better with practice, and there will always be a functional
solution if there's a procedural solution.
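If you'd rather see this for yourself than take the diagram's word for it,
here's a small sketch. Unlike foo, it saves the pointer that cons returns (in
a variable we've named y, just for this example) so that we can compare the
two lists afterward:

```scheme
(define x '(a b c d))
(define y (cons 'q x))   ; this time we keep the pointer cons returns

y                  ; (q a b c d)
x                  ; (a b c d), unchanged
(eq? (cdr y) x)    ; #t
```

That final #t says the cdr slot of the new cons cell points at the very same
cons cells that x points at. Nothing was copied and nothing was altered.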
You can think of a cons as being a fundamental unit of expense in Scheme
programming. A cons has a measurable cost both in terms of time to execute
and amount of memory used. An extra cons here or there is no big deal, but
we'd like to make sure that we don't end up using lots and lots of conses
when we don't need to. We don't care about efficiency in the small, but we
definitely care about efficiency in the large.
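To give a feel for what "lots and lots of conses" can mean, here's a sketch
comparing two ways to reverse a list. Both functions are hypothetical
examples written for this comparison, not anything we've assigned, and the
cons counts in the comments assume that the built-in append copies its first
argument, which is how it's usually implemented.

```scheme
; slow-reverse re-copies the growing result with append at every step.
(define (slow-reverse lst)
  (cond ((null? lst) '())
        (else (append (slow-reverse (cdr lst))
                      (list (car lst))))))

; fast-reverse conses each element exactly once onto an accumulator.
(define (fast-reverse lst)
  (define (loop rest so-far)
    (cond ((null? rest) so-far)
          (else (loop (cdr rest) (cons (car rest) so-far)))))
  (loop lst '()))

(slow-reverse '(a b c d))   ; (d c b a), about 10 conses by our count
(fast-reverse '(a b c d))   ; (d c b a), 4 conses
```

For four elements nobody cares about the difference. For a list of ten
thousand elements, the first version's cost grows roughly with the square of
the length, and that's efficiency in the large.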
IV. Building lists with recursion
We can use the cons operation to build new lists while recursing (or "cdr"ing)
along a different list. We call this list-consing recursion for lack of a
better name. Here's an example of a list-consing function that merely makes
a copy of an existing list (assuming there's no nesting...don't worry
about that detail for now):
(define (makecopy inlist)
  (cond [(null? inlist) '()]
        [else (cons (car inlist)
                    (makecopy (cdr inlist)))]))
> (define a '(x y z))
> (eq? a a)
#t
> (makecopy a)
(x y z)
> (eq? a (makecopy a))
#f
> (equal? a (makecopy a))
#t
Tracing this behavior by hand looks like this:
(makecopy '(a b c))
(cons 'a (makecopy '(b c)))
(cons 'a (cons 'b (makecopy '(c))))
(cons 'a (cons 'b (cons 'c (makecopy '()))))
(cons 'a (cons 'b (cons 'c '())))
(cons 'a (cons 'b '(c)))
(cons 'a '(b c))
'(a b c)
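About that "no nesting" caveat: the sketch below shows what happens if you
hand makecopy a nested list anyway. The definition is the one above, written
with a quoted empty list, and the names nested and copy are invented for this
example.

```scheme
(define (makecopy inlist)
  (cond ((null? inlist) '())
        (else (cons (car inlist)
                    (makecopy (cdr inlist))))))

(define nested '((q r) z))
(define copy (makecopy nested))

(eq? nested copy)               ; #f, the top-level cons cells are new
(equal? nested copy)            ; #t
(eq? (car nested) (car copy))   ; #t, the sublist (q r) is shared
```

So makecopy builds new cons cells only for the top row of boxes. Any sublist
hanging below that row is pointed to by both the original and the copy.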
Here's another example of list-consing recursion in a function that returns a
new list consisting of the first n elements of an existing list:
(define (first-n n my-list)
  (cond [(= n 0) '()]
        [else (cons (car my-list)
                    (first-n (- n 1) (cdr my-list)))]))
or
(define (first-n n my-list)
  (if (= n 0)
      '()
      (cons (car my-list)
            (first-n (- n 1) (cdr my-list)))))
What's happening here? Let's say we type in
> (first-n 2 '(a b c d))
Using an abbreviated substitution model of evaluation, we'd see the following
behavior:
(first-n 2 '(a b c d))
(cons 'a (first-n 1 '(b c d)))
(cons 'a (cons 'b (first-n 0 '(c d))))
(cons 'a (cons 'b '()))
(cons 'a '(b))
(a b)
Thus (first-n 2 '(a b c d)) returns a list consisting of the first two
elements of the list '(a b c d).
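Neither version of first-n checks whether the list runs out before n reaches
0, so asking for more elements than the list has will end in an error from
car. If you'd rather get back as much of the list as there is, one possible
variation looks like this. The name first-n-safe and the decision to quietly
return a shorter list are our assumptions for this sketch, not requirements:

```scheme
; first-n-safe is a hypothetical variant of first-n.
(define (first-n-safe n my-list)
  (cond ((= n 0) '())
        ((null? my-list) '())   ; ran out of list: stop early
        (else (cons (car my-list)
                    (first-n-safe (- n 1) (cdr my-list))))))

(first-n-safe 2 '(a b c d))   ; (a b)
(first-n-safe 9 '(a b c d))   ; (a b c d)
```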
Here's a function that takes two arguments, an integer and a list. This
function, which we'll call "nthcdr", then counts down the number of list
elements indicated by the integer (by taking successive "cdr"s) and returns
the list without those first elements. For example:
> (nthcdr 0 '(a b c))
(a b c)
> (nthcdr 2 '(a b c))
(c)
Here's our version of "nthcdr":
(define (nthcdr integer my-list)
  (cond ((null? my-list) '())
        ((= integer 0) my-list)
        (else (nthcdr (- integer 1) (cdr my-list)))))
There are perhaps four interesting things to note here. First, we used tail
recursion again. In fact, it just seemed like the obvious thing to do.
Second, we use more than one test for termination. This is fairly common, and
occurs with augmenting recursion as well as tail recursion. It's possible to
make this work with only one test, as the test for an empty list is not
necessary as long as the integer is no bigger than the length of the list.
But if you think about and play with the function for a while, you'll see
that when the integer is too big, this test lets the function stop and
return () as soon as it runs out of list, rather than trying to take the cdr
of the empty list, which is an error in Scheme. Third, once again
there's no helping function in this example of tail recursion. Because we can
use the argument "integer" as a counter, we don't need to introduce any new
arguments to be used as variables to keep track of intermediate results.
Finally, it may look like nthcdr returns a newly-built list, but it's actually
just returning a pointer to the middle of an existing list. How do we know
that nthcdr isn't making a copy of part of the list it started with? There
are no conses in our definition of nthcdr.
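You can confirm that claim with eq?, which asks whether two pointers lead to
the very same cons cell. Here's a short sketch using the definition above
(with a quoted empty list) and a test list we've named a:

```scheme
(define (nthcdr integer my-list)
  (cond ((null? my-list) '())
        ((= integer 0) my-list)
        (else (nthcdr (- integer 1) (cdr my-list)))))

(define a '(a b c))

(eq? (nthcdr 2 a) (cdr (cdr a)))   ; #t, the same cons cell, not a copy
(eq? (nthcdr 0 a) a)               ; #t
(nthcdr 5 a)                       ; (), thanks to the null? test
```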
On the other hand, if we want to insert something into the middle of a list,
we'd actually cons together a copy of the original list with the thing to be
inserted consed into the copy in the appropriate location. Huh? Allow me
to explain in more detail.
First, let's establish the convention that every place that you could
possibly insert something into a list can be identified by a number
indicating the relative position of that place in the list. The first place
will be numbered 0, the next place will be numbered 1, and so on.
Furthermore, position 0 is the place just before the already-existing first
element of the list, and position 1 is the place between the first element
and the second element, and so on. So the positions in a sample list are
numbered like this:
( A B C )
 ^ ^ ^ ^
 | | | |
 0 1 2 3
Now let's invent a function name. Call it "insert" (and not "my-insert",
since it doesn't already exist in Scheme). To insert something, we'll need to
tell the function what we want to insert in the list, where we want to insert
it, and what list we want to have something inserted in. So calling this new
"insert" function might look like this:
> (insert 'x 0 '(a b c))
(x a b c)
> (insert 'x 1 '(a b c))
(a x b c)
> (insert 'x 2 '(a b c))
(a b x c)
> (insert 'x 3 '(a b c))
(a b c x)
> (insert 'x 4 '(a b c))
car: expects argument of type <pair>; given ()
>
Why does it blow up on position 4? Because there is no position 4 defined for
this list. We could opt for a different result, but that will do for now.
Now writing the first line of our new function is easy:
(define (insert item position input-list)
What do we do if we want to insert something in position 0? That's the same
as consing something onto the front of the list:
  (cond ((= position 0) (cons item input-list))
If I don't want to put something in position 0, then the function needs to
traverse the list until it finds the correct position. So for example, if I
want to insert in position 1, the function should skip past the first element
of the list and cons the to-be-inserted item to the front of the rest (or the
cdr) of the list, right? And while that position might have been numbered 1
for the original list, it's position 0 with respect to the cdr of the
original list. In other words, to insert in position 1 in the original list,
the function just conses the to-be-inserted item in position 0 of the cdr of
the original list, and then conses the car of the original list onto the
front of all that. Oh, and how do we turn position 1 into position 0? We
just subtract 1 from the position number. So what does all that look like
in Scheme? Here it is:
        (else (cons (car input-list)
                    (insert item (- position 1)
                            (cdr input-list))))))
Is there anything else to be done? No. In fact, by solving the problem for
the case of position 0 and position 1, we've solved the problem for all
positions. Here's the whole thing in one place:
(define (insert item position input-list)
  (cond ((= position 0) (cons item input-list))
        (else (cons (car input-list)
                    (insert item (- position 1)
                            (cdr input-list))))))
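It's worth looking at exactly how much of the original list insert copies.
The sketch below repeats the definition and then pokes at the result with
eq?. The names original and result are invented for this example.

```scheme
(define (insert item position input-list)
  (cond ((= position 0) (cons item input-list))
        (else (cons (car input-list)
                    (insert item (- position 1)
                            (cdr input-list))))))

(define original '(a b c))
(define result (insert 'x 1 original))

result                                    ; (a x b c)
original                                  ; (a b c), untouched
(eq? (cdr (cdr result)) (cdr original))   ; #t, the tail (b c) is shared
(eq? result original)                     ; #f, the cells before x are new
```

So insert only makes new cons cells for the elements in front of the
insertion point, plus one for the new item. Everything after the insertion
point is the original list's own cons cells, reached through a new pointer.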
Lots of situations that call for recursion on a list will just require
variations on the themes we've seen so far. Analyze the structure of the
list you're starting with, give some consideration to what you're being
asked to do with that list, find a solution you've already created or have
seen created by somebody else, and modify it accordingly. You don't need to
write every solution from scratch.
V. Mutual recursion
Let's say you were building a program to play a two-person card game. You'd
need a representation for a deck of cards. Here's a short version of one:
'(10S AH 9D 4C 2H KS 3D 8C 6D 5H)
10S means ten of spades, AH means ace of hearts, and so on. You'd also need a
function to deal the cards out to the two players. So you'd want to have your
function alternate between hands, putting the first card in one hand, the
next card in the other, and so on until there are no more cards. A common way
of doing this is to employ what some folks call mutual recursion, where two
(or more) functions call each other in a recursive fashion:
(define (deal deck)
  (deal-2 deck '() '()))
(define (deal-2 deck hand1 hand2)
  (cond ((null? deck) (list hand1 hand2))
        (else (deal-3 (cdr deck) (cons (car deck) hand1) hand2))))
(define (deal-3 deck hand1 hand2)
  (cond ((null? deck) (list hand1 hand2))
        (else (deal-2 (cdr deck) hand1 (cons (car deck) hand2)))))
Functions deal-2 and deal-3 do almost exactly the same thing...the only
difference is they each put a card into a different hand. So in this case,
the hand that gets the card is determined by the specific function being
evaluated. Here's the result of evaluating deal on our test data (and note
that in this case, order of cards in the hands doesn't matter):
> (deal '(10S AH 9D 4C 2H KS 3D 8C 6D 5H))
((6d 3d 2h 9d 10s) (5h 8c ks 4c ah))
>
This isn't exactly a great example of when to use mutual recursion; it's just
a good example of how it works when you do use it. You could also consolidate
the functions deal-2 and deal-3 into a single function that determines which
hand gets the next card by looking at another parameter that is switched back
and forth from one value to another. Something like that is often called a
"flag" in computer science terminology. In this case, we've named that flag
"whichhand", and we pass either a 1 or a 2 through this parameter to
indicate which hand gets the next card:
(define (deal deck)
  (deal-2 deck '() '() 1))
(define (deal-2 deck hand1 hand2 whichhand)
  (cond ((null? deck) (list hand1 hand2))
        ((= whichhand 1)
         (deal-2 (cdr deck) (cons (car deck) hand1) hand2 2))
        (else
         (deal-2 (cdr deck) hand1 (cons (car deck) hand2) 1))))
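As a sanity check that the two approaches really are interchangeable, here's
a sketch that puts both side by side. To let them coexist in one file we've
had to rename things: deal-mutual, deal-a, deal-b, deal-flag, and deal-loop
are names made up for this example, and so is the short deck of card symbols.

```scheme
; Mutually recursive version (deal-a and deal-b stand in for
; deal-2 and deal-3; the names are hypothetical).
(define (deal-mutual deck)
  (deal-a deck '() '()))

(define (deal-a deck hand1 hand2)
  (cond ((null? deck) (list hand1 hand2))
        (else (deal-b (cdr deck) (cons (car deck) hand1) hand2))))

(define (deal-b deck hand1 hand2)
  (cond ((null? deck) (list hand1 hand2))
        (else (deal-a (cdr deck) hand1 (cons (car deck) hand2)))))

; Flag version (deal-loop stands in for the four-argument deal-2).
(define (deal-flag deck)
  (deal-loop deck '() '() 1))

(define (deal-loop deck hand1 hand2 whichhand)
  (cond ((null? deck) (list hand1 hand2))
        ((= whichhand 1)
         (deal-loop (cdr deck) (cons (car deck) hand1) hand2 2))
        (else
         (deal-loop (cdr deck) hand1 (cons (car deck) hand2) 1))))

; A made-up five-card deck, so the two hands end up different sizes.
(define deck '(ace-h king-s queen-d jack-c ten-h))

(deal-mutual deck)   ; ((ten-h queen-d ace-h) (jack-c king-s))
(equal? (deal-mutual deck) (deal-flag deck))   ; #t
```

In the mutual version, which hand gets the next card is remembered by which
function happens to be running. In the flag version, it's remembered in an
argument. Either way, each card costs exactly one cons.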
Copyright (c) 2003 by Kurt Eiselt. All rights reserved, with
the exception of stuff that belongs to somebody else.
Last revised: September 14, 2002