CS 1321X - Lecture 7 - September 9, 2003

Becoming One with the List



I. Box-and-pointer notation

Now you're comfortable, I hope, with the notion that '(X Y Z 4), for example, 
is a list. More specifically, it's something called a linked list.  That is, 
there are hidden but very real links between the elements of the list. Those 
links are often called pointers, and playing with pointers in some 
programming languages is fraught with peril. But unlike those other languages,
such as Pascal and C, Scheme doesn't require you to worry about the pointer 
details.  Nevertheless, sometimes it's the case that Scheme folks like to 
see the pointers to help them understand what's going on when they're 
writing functions that work on lists. So here's a more tangible (i.e., less 
abstract) picture of what our sample list looks like: 

  A
  |
  |
  |    _______       _______       _______       _______
  +-->|   |   |     |   |   |     |   |   |     |   |  /|
      | | | --+---->| | | --+---->| | | --+---->| | | / | or -->()
      |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
        |             |             |             |
        X             Y             Z             4

In this lovely figure, each element of the list is represented as a pair of 
"slots" (which could be words or bytes or whatever) in memory. The first or 
left slot contains a pointer (i.e., the address in memory) to the value 
that "is" that element, and the second or right slot contains a pointer 
to the next element. It's now easy to see that, given a pointer to this list, 
the function "car" (e.g., (car A)) looks at the pair of cells at the end of 
that pointer, follows the pointer stored in the left slot, and returns what's 
at the end of that pointer. In this case, it's the symbol X. The "cdr" 
function (e.g., (cdr A)) looks at the pair of slots pointed at by A, follows 
the pointer stored in the right slot, and returns what's at the end of that 
pointer, which in this case is the list (Y Z 4). Note again that these 
operations didn't change the structure of the list; they merely followed 
pointers and returned what they pointed to. Now when you call my-member 
like this 

>  (my-member 'z '(x y z 4))

or like this 

>  (my-member 'z a)  {where a is bound to '(x y z 4)}

and see this result 

(z 4)

what happens is that my-member follows the pointers that connect the pairs 
of slots from front to back, looking for a pair whose car is the same as the 
input item, 'z. When and if it finds that match, my-member returns the list 
that begins with the symbol z. (This pointer stuff just isn't that hard.) 
Each of these two-slot pairs is called a "cons cell", because it's what you 
get when you "cons" two things together. Every time 
you call "cons", Scheme automatically grabs a two-part cons cell for you from 
a place in memory called the "heap", and attaches that cons cell onto the 
front of whatever you wanted to cons something to. (What's a heap? Don't 
worry about it.) While it's prettier to cons things onto proper lists, it's 
not necessary, as you already know. You can in fact have something that's 
not a list as the second argument to a "cons" function, but the result is 
that dotted pair thing, which isn't a true or proper list. 

>  (cons 'A 'B)

(A . B)

The box-and-pointer notation for this dotted pair looks like this: 

       _______
      |   |   |
      | | | --+---->B
      |_|_|___|
        |
        A
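
Going back to the proper list for a moment, the pointer-following described 
above can be sketched in a few lines. The definition of my-member here is an 
assumption (the real one comes from an earlier lecture and may differ, 
especially in what it returns when nothing matches), and quoted '() is used 
so the code runs in Schemes that insist on the quote:

```scheme
;; Hypothetical sketch of my-member: walk the cdr pointers until the car
;; of the current cons cell matches item.  Returning #f when nothing
;; matches is an assumption, borrowed from the built-in member.
(define (my-member item my-list)
  (cond ((null? my-list) #f)
        ((equal? item (car my-list)) my-list)
        (else (my-member item (cdr my-list)))))

(define a '(x y z 4))

(display (car a)) (newline)                 ; x
(display (cdr a)) (newline)                 ; (y z 4)
(display (my-member 'z a)) (newline)        ; (z 4)

;; The result is not a new list; it is the same cons cell you reach by
;; following two cdr pointers from the front of a.
(display (eq? (my-member 'z a) (cdr (cdr a)))) (newline)   ; #t

;; For the dotted pair, the right slot points at b itself, not at a list.
(display (cdr (cons 'a 'b))) (newline)      ; b
```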

Here's a more complicated list and its box-and-pointer representation, with 
a list inside a list: 

                 (X (Q R) Z 4)


  |
  |
  |    _______       _______       _______       _______
  +-->|   |   |     |   |   |     |   |   |     |   |  /|
      | | | --+---->| | | --+---->| | | --+---->| | | / |
      |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
        |             |             |             |
        X             |             Z             4
                      |
                     \|/
                     _______       _______
                    |   |   |     |   |  /|
                    | | | --+---->| | | / |
                    |_|_|___|     |_|_|/__|
                      |             |
                      Q             R


And here's yet another one, with a list inside a list inside a list: 

                 (X (Q (R)) Z 4)


  |
  |
  |    _______       _______       _______       _______
  +-->|   |   |     |   |   |     |   |   |     |   |  /|
      | | | --+---->| | | --+---->| | | --+---->| | | / |
      |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
        |             |             |             |
        X             |             Z             4
                      |
                     \|/
                     _______       _______
                    |   |   |     |   |  /|
                    | | | --+---->| | | / |
                    |_|_|___|     |_|_|/__|
                      |             |
                      Q             |
                                    |
                                   \|/
                                   _______
                                  |   |  /|
                                  | | | / |
                                  |_|_|/__|
                                    |
                                    R


And still one more: 

                 (X ((Q) R) Z 4)


  |
  |
  |    _______       _______       _______       _______
  +-->|   |   |     |   |   |     |   |   |     |   |  /|
      | | | --+---->| | | --+---->| | | --+---->| | | / |
      |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
        |             |             |             |
        X             |             Z             4
                      |
                     \|/
                     _______       _______
                    |   |   |     |   |  /|
                    | | | --+---->| | | / |
                    |_|_|___|     |_|_|/__|
                      |             |
                      |             R
                      |
                     \|/
                     _______
                    |   |  /|
                    | | | / |
                    |_|_|/__|
                      |
                      Q


Now that you've seen these examples, and assuming you understand them, there 
should be no list so complicated that you can't figure out the box-and-pointer
notation for it.
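
One way to check a diagram is to build the list one cons cell at a time and 
count the calls to cons: each call corresponds to one box in the picture. 
Here's a sketch for (X (Q R) Z 4); the names inner and nested are made up 
for this example:

```scheme
;; Two cons cells for the inner list (q r) ...
(define inner (cons 'q (cons 'r '())))

;; ... and four more for the outer list, whose second car slot points
;; at the inner list.  Six calls to cons, six boxes in the diagram.
(define nested (cons 'x (cons inner (cons 'z (cons 4 '())))))

(display nested) (newline)                           ; (x (q r) z 4)
(display (car (cdr nested))) (newline)               ; (q r)
(display (car (cdr (car (cdr nested))))) (newline)   ; r
```

Reading the last line from the inside out is the same as tracing arrows in 
the figure: right, down, right, down.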
 

II. Something for nothing

What's the technical difference between the list and the dotted pair? It's 
how they are terminated. A proper list always ends with something called "()",
which we represented in the box-and-pointer diagrams above with a big slash 
through the right half of the last element. A dotted pair, on the other hand, 
ends with something other than "()". In other words, a proper list is formed 
by consing something onto the empty list, or onto something else that's a 
proper list. A dotted pair is formed by consing something onto something 
that's not a proper list.  And remember, the special word for the empty list 
is "null".  Oh, and one other thing.  Your textbook (see pp. 62-63) says that 
you have to quote the empty list to make things work right. That is, '() 
works, but just plain old () doesn't work if the evaluator sees it. 
Apparently, Dr. Scheme didn't read the book: 

>  ()
()
>  '()
()
>  (eq? () '())
#t
>
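
The difference in termination can be checked directly with the standard 
predicates list?, pair?, and null?. This is just a sketch; the names proper 
and dotted are invented for the example, and the empty list is quoted 
because standard Scheme, unlike the session above, does not promise that a 
bare () will evaluate:

```scheme
(define proper (cons 'a (cons 'b '())))   ; (a b)
(define dotted (cons 'a 'b))              ; (a . b)

(display (list? proper)) (newline)              ; #t
(display (list? dotted)) (newline)              ; #f
(display (pair? dotted)) (newline)              ; #t  still a cons cell

;; Follow the right-hand pointers to the end and look at what's there.
(display (null? (cdr (cdr proper)))) (newline)  ; #t  ends in ()
(display (null? (cdr dotted))) (newline)        ; #f  ends in the symbol b
```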


III. How does "cons" work?

We sometimes ask you in homework assignments to implement your own versions 
of pre-existing Scheme functions. This helps you to learn what functions are 
available to you, as well as to better understand what those functions do and 
how they do it. And that in turn will allow you to borrow techniques from 
those functions and apply them to new problems. One function we won't ask you 
to implement, however, is "cons", or more appropriately, "my-cons". Why? 
Because you can't. It's a fundamental, primitive operation that's built into 
the Scheme interpreter. If you wanted to see how it really works, you'd have 
to go into the guts of the system. But at a higher level, you can think of 
"cons" as working like this: Each time the Scheme system comes across a call 
to "cons", Scheme allocates a couple of words of free memory--something 
called a "cons cell". This cons cell is then filled with the appropriate 
pointers to whatever structures are being consed, and then a pointer to that 
cons cell is returned.  Thus the cons cell becomes the new first element in 
the structure that was consed to, and the pointer that is returned is 
pointing to that new first element. That was a high-level description of what 
"cons" does; here's a more detailed explanation, with an example. Start with 
this list: 

(A B C D)

And here's a lower-level view of the same list, using our box-and-pointer 
notation: 

       _______       _______       _______       _______
      |   |   |     |   |   |     |   |   |     |   |  /|
      | | | --+---->| | | --+---->| | | --+---->| | | / |
      |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
        |             |             |             |
        A             B             C             D

Now let's say that we've created the following function: 

(define (foo x)
   (print x)
   (print (cons 'q x))
   (print x))

We haven't talked about input/output issues at all, but the print function is 
pretty useful. It just prints to your computer monitor exactly what you tell 
it to. Using it technically takes us outside the functional programming 
paradigm, because a print operation is a way of having something about the 
function persist after the function is no longer being evaluated. But let's 
not worry about that for now. Using print at appropriate places can be a 
useful debugging tool. Just don't leave any print functions in your homework 
or test solutions. Your TAs won't be happy about it at this time. So now 
let's say we call the function foo above on the list '(A B C D): 

>  (foo '(A B C D))

Passing the list in as the argument binds the parameter X to that list. We'll 
show that in our diagram with a pointer: 

  X
  |
  |
  |    _______       _______       _______       _______
  +-->|   |   |     |   |   |     |   |   |     |   |  /|
      | | | --+---->| | | --+---->| | | --+---->| | | / |
      |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
        |             |             |             |
        A             B             C             D

See, now there's a pointer called X, and that pointer points to the front of 
the list. There are three print operations in the foo function. The first one 
merely says to print the value that is bound to the parameter X. So the 
result of the first print is just this: 

(A B C D)

The next print depends upon the result of the cons operation. Consing the 
symbol Q onto the list bound to X requires a cons cell to be taken from free 
storage in memory. The cdr pointer slot of that cons cell (its second part) 
is made to point to the head of the input list, and the car pointer slot 
(its first part) is made to point to the symbol Q. Cons returns, to the 
print function in this case, a pointer to the new cons cell, which in turn 
points to the list already bound to X (which, if you think about it, turns 
out to be the cdr of the new list): 

                X
  |             |
  |             |
  |    _______  |    _______       _______       _______       _______
  +-->|   |   | +-->|   |   |     |   |   |     |   |   |     |   |  /|
      | | | --+---->| | | --+---->| | | --+---->| | | --+---->| | | / |
      |_|_|___|     |_|_|___|     |_|_|___|     |_|_|___|     |_|_|/__|
        |             |             |             |             |
        Q             A             B             C             D

The print function now prints the list at the end of the pointer it's been 
given: 

(Q A B C D)

Now comes the big finish. The third print function again is asked to print 
the list that's bound to the parameter X (or, to be more technically correct, 
the list that's pointed at by X). What does print print? The diagram makes it 
pretty clear: print will print 

(A B C D)

because we haven't done anything that changed the list that's associated 
with X. We've added a cons cell to the front of that list, and that new list 
had its own pointer (but we didn't save that pointer anywhere so now we can't 
even get at that new list if we wanted to), but we didn't alter anything that 
already existed. In fact, we don't know how to do that yet, and we won't be 
doing those sorts of alterations unless we move out of the functional 
programming paradigm. That inability to destructively modify data structures 
is one of the big features of functional programming that leads to programs 
with fewer problems. But as some of you are finding, or soon will be, this 
style of programming is also going to require you to do some things that will 
seem weird if you have experience with procedural languages like C or Pascal, 
for example, because being able to alter stuff is a fundamental assumption in 
that paradigm. But remember, no matter how awkward it might seem at first, 
it'll get better with practice, and there will always be a functional 
solution if there's a procedural solution.  
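
The same experiment can be run with the pointer to the new list saved, so 
that we can poke at it afterward. This is a sketch, not a replacement for 
foo: the name y is made up, and display stands in for print so that the code 
runs in Schemes that don't have print:

```scheme
(define x '(a b c d))
(define y (cons 'q x))      ; this time we keep the pointer cons returns

(display x) (newline)       ; (a b c d)   x is untouched
(display y) (newline)       ; (q a b c d)

;; The cdr slot of the new cell points at the very same cells x points
;; at: nothing was copied, and nothing that already existed was changed.
(display (eq? (cdr y) x)) (newline)   ; #t
```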

You can think of a cons as being a fundamental unit of expense in Scheme 
programming. A cons has a measurable cost both in terms of time to execute 
and amount of memory used. An extra cons here or there is no big deal, but 
we'd like to make sure that we don't end up using lots and lots of conses 
when we don't need to. We don't care about efficiency in the small, but we 
definitely care about efficiency in the large. 


IV.  Building lists with recursion

We can use the cons operation to build new lists while recursing (or "cdr"ing)
along a different list.  We call this list-consing recursion for lack of a 
better name.  Here's an example of a list-consing function that merely makes 
a copy of an existing list (assuming there's no nesting...don't worry 
about that detail for now):

(define (makecopy inlist)
  (cond [(null? inlist) ()]
        [else (cons (car inlist) 
                    (makecopy (cdr inlist)))]))

> (define a '(x y z))
> (eq? a a)
#t
> (makecopy a)
(x y z)
> (eq? a (makecopy a))
#f
> (equal? a (makecopy a))
#t

Tracing this behavior by hand looks like this:

(makecopy '(a b c))
(cons 'a (makecopy '(b c)))
(cons 'a (cons 'b (makecopy '(c))))
(cons 'a (cons 'b (cons 'c (makecopy ()))))
(cons 'a (cons 'b (cons 'c ())))
(cons 'a (cons 'b '(c)))
(cons 'a '(b c))
'(a b c)
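
About that "no nesting" detail: the sketch below suggests what makecopy does 
when there is a list inside the list. Only the top-level cons cells are 
copied, and the inner list ends up shared between the original and the copy. 
The names nested and copy are made up for this example, and the definition 
is repeated (with '() quoted for portability) so the block stands alone:

```scheme
(define (makecopy inlist)
  (cond ((null? inlist) '())
        (else (cons (car inlist)
                    (makecopy (cdr inlist))))))

(define nested '(x (q r) z))
(define copy (makecopy nested))

(display (equal? copy nested)) (newline)   ; #t  same contents
(display (eq? copy nested)) (newline)      ; #f  new top-level cells

;; The car slots of the copy point at the same things as the original,
;; so the inner list (q r) is one and the same object in both.
(display (eq? (car (cdr copy)) (car (cdr nested)))) (newline)   ; #t
```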

Here's another example of list-consing recursion in a function that returns a 
new list consisting of the first n elements of an existing list: 

(define (first-n n my-list)
   (cond [(= n 0) ()]
         [else (cons (car my-list)
                     (first-n (- n 1) (cdr my-list)))]))

or 

(define (first-n n my-list)
   (if (= n 0)
       ()
       (cons (car my-list)
             (first-n (- n 1) (cdr my-list)))))

What's happening here? Let's say we type in 

>  (first-n 2 '(a b c d))

Using an abbreviated substitution model of evaluation, we'd see the following 
behavior: 

(first-n 2 '(a b c d))
(cons 'a (first-n 1 '(b c d)))
(cons 'a (cons 'b (first-n 0 '(c d))))
(cons 'a (cons 'b ()))
(cons 'a '(b))
(a b)

Thus (first-n 2 '(a b c d)) returns a list consisting of the first two 
elements of the list '(a b c d). 
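
One thing the trace doesn't show is what happens when n is bigger than the 
length of the list: first-n as written would end up taking the car of an 
empty list. If you'd rather get back as many elements as there are, one 
possible variation adds a second termination test. The name first-n-safe is 
made up, and stopping quietly is a design choice, not the only reasonable 
one:

```scheme
;; Hypothetical variant of first-n that stops when either n reaches 0
;; or the list runs out, whichever comes first.
(define (first-n-safe n my-list)
  (cond ((= n 0) '())
        ((null? my-list) '())
        (else (cons (car my-list)
                    (first-n-safe (- n 1) (cdr my-list))))))

(display (first-n-safe 2 '(a b c d))) (newline)   ; (a b)
(display (first-n-safe 9 '(a b c d))) (newline)   ; (a b c d)
```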

Here's a function that takes two arguments, an integer and a list.  This 
function, which we'll call "nthcdr", counts down the number of list elements 
indicated by the integer (by taking successive "cdr"s) and returns the list 
without those first elements. For example: 

>  (nthcdr 0 '(a b c))
(a b c)
>  (nthcdr 2 '(a b c))
(c)

Here's our version of "nthcdr": 

(define (nthcdr integer my-list)
   (cond ((null? my-list) ())
         ((= integer 0) my-list)
         (else (nthcdr (- integer 1) (cdr my-list)))))

There are perhaps four interesting things to note here. First, we used tail 
recursion again. In fact, it just seemed like the obvious thing to do.  
Second, we used more than one test for termination. This is fairly common, and 
occurs with augmenting recursion as well as tail recursion.  It's possible to 
make this work with only one test, as the test for an empty list is not 
needed whenever the integer is no bigger than the length of the list. But if 
you think about and play with the function for a while, you'll see what the 
extra test buys us: when the integer is too big, the function stops and 
returns the empty list as soon as the list runs out, instead of trying to 
take the cdr of an empty list.  Third, once again 
there's no helping function in this example of tail recursion. Because we can 
use the argument "integer" as a counter, we don't need to introduce any new 
arguments to be used as variables to keep track of intermediate results.  
Finally, it may look like nthcdr returns a newly-built list, but it's actually
just returning a pointer to the middle of an existing list.  How do we know 
that nthcdr isn't making a copy of part of the list it started with?  There 
are no conses in our definition of nthcdr.
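
If counting conses in the definition isn't convincing enough, eq? can settle 
it, since eq? asks whether two pointers lead to the very same cons cell. 
Here's a sketch; the definition is repeated (with '() quoted for 
portability) so the block stands alone:

```scheme
(define (nthcdr integer my-list)
  (cond ((null? my-list) '())
        ((= integer 0) my-list)
        (else (nthcdr (- integer 1) (cdr my-list)))))

(define a '(a b c))

(display (nthcdr 2 a)) (newline)                       ; (c)

;; Same cell you'd reach by following two cdr pointers by hand.
(display (eq? (nthcdr 2 a) (cdr (cdr a)))) (newline)   ; #t

;; Counting down zero elements hands back the original pointer.
(display (eq? (nthcdr 0 a) a)) (newline)               ; #t
```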

On the other hand, if we want to insert something into the middle of a list, 
we'd actually cons together a copy of the original list with the thing to be 
inserted consed into the copy in the appropriate location.  Huh?  Allow me 
to explain in more detail.

First, let's establish the convention that every place that you could 
possibly insert something into a list can be identified by a number
indicating the relative position of that place in the list. The first place 
will be numbered 0, the next place will be numbered 1, and so on.
Furthermore, position 0 is the place just before the already-existing first 
element of the list, and position 1 is the place between the first element 
and the second element, and so on. So the positions in a sample list are 
numbered like this: 

 ( A B C )
  ^ ^ ^ ^
  | | | |
  0 1 2 3

Now let's invent a function name. Call it "insert" (and not "my-insert", 
since it doesn't already exist in Scheme). To insert something, we'll need to 
tell the function what we want to insert in the list, where we want to insert 
it, and what list we want to have something inserted in. So calling this new 
"insert" function might look like this: 

>  (insert 'x 0 '(a b c))
(x a b c)
>  (insert 'x 1 '(a b c))
(a x b c)
>  (insert 'x 2 '(a b c))
(a b x c)
>  (insert 'x 3 '(a b c))
(a b c x)
>  (insert 'x 4 '(a b c))
car: expects argument of type <pair>; given ()
>

Why does it blow up on position 4? Because there is no position 4 defined for 
this list. We could opt for a different result, but that will do for now. 

Now writing the first line of our new function is easy: 

(define (insert item position input-list)

What do we do if we want to insert something in position 0? That's the same 
as consing something onto the front of the list: 

   (cond ((= position 0) (cons item input-list))

If I don't want to put something in position 0, then the function needs to 
traverse the list until it finds the correct position. So for example, if I 
want to insert in position 1, the function should skip past the first element 
of the list and cons the to-be-inserted item to the front of the rest (or the 
cdr) of the list, right? And while that position might have been numbered 1 
for the original list, it's position 0 with respect to the cdr of the 
original list. In other words, to insert in position 1 in the original list, 
the function just conses the to-be-inserted item in position 0 of the cdr of 
the original list, and then conses the car of the original list onto the 
front of all that. Oh, and how do we turn position 1 into position 0? We 
just subtract 1 from the position number. So what does all that look like 
in Scheme? Here it is: 

         (else (cons (car input-list)
                     (insert item (- position 1)
                                  (cdr input-list))))))

Is there anything else to be done? No. In fact, by solving the problem for 
the case of position 0 and position 1, we've solved the problem for all 
positions. Here's the whole thing in one place: 

(define (insert item position input-list)
   (cond ((= position 0) (cons item input-list))
         (else (cons (car input-list)
                     (insert item (- position 1)
                                  (cdr input-list))))))
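
It's worth looking at how much of the original list actually gets copied. 
In the sketch below (the names original and result are made up), only the 
cells in front of the insertion point are rebuilt; everything after the new 
item is the original list's own tail, and the original list is unchanged:

```scheme
(define (insert item position input-list)
  (cond ((= position 0) (cons item input-list))
        (else (cons (car input-list)
                    (insert item (- position 1)
                                 (cdr input-list))))))

(define original '(a b c))
(define result (insert 'x 1 original))

(display result) (newline)      ; (a x b c)
(display original) (newline)    ; (a b c)

;; A new cell was consed for a and another for x ...
(display (eq? result original)) (newline)                    ; #f
;; ... but the (b c) part is shared with the original list.
(display (eq? (cdr (cdr result)) (cdr original))) (newline)  ; #t
```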

Lots of situations that call for recursion on a list will just require 
variations on the themes we've seen so far.  Analyze the structure of the 
list you're starting with, give some consideration to what you're being 
asked to do with that list, find a solution you've already created or have 
seen created by somebody else, and modify it accordingly.  You don't need to 
write every solution from scratch.


V.  Mutual recursion

Let's say you were building a program to play a two-person card game. You'd 
need a representation for a deck of cards. Here's a short version of one: 

'(10S AH 9D 4C 2H KS 3D 8C 6D 5H)

10S means ten of spades, AH means ace of hearts, and so on. You'd also need a 
function to deal the cards out to the two players. So you'd want to have your 
function alternate between hands, putting the first card in one hand, the 
next card in the other, and so on until there are no more cards. A common way 
of doing this is to employ what some folks call mutual recursion, where two 
(or more) functions call each other in a recursive fashion: 

(define (deal deck)
   (deal-2 deck () ()))

(define (deal-2 deck hand1 hand2)
   (cond ((null? deck) (list hand1 hand2))
         (else (deal-3 (cdr deck) (cons (car deck) hand1) hand2))))

(define (deal-3 deck hand1 hand2)
   (cond ((null? deck) (list hand1 hand2))
         (else (deal-2 (cdr deck) hand1 (cons (car deck) hand2)))))

Functions deal-2 and deal-3 do almost exactly the same thing...the only 
difference is they each put a card into a different hand. So in this case, 
the hand that gets the card is determined by the specific function being 
evaluated. Here's the result of evaluating deal on our test data (and note 
that in this case, order of cards in the hands doesn't matter): 

>  (deal '(10S AH 9D 4C 2H KS 3D 8C 6D 5H))
((6d 3d 2h 9d 10s) (5h 8c ks 4c ah))
>

This isn't exactly a great example of when to use mutual recursion; it's just 
a good example of how it works when you do use it. You could also consolidate 
the functions deal-2 and deal-3 into a single function that determines which 
hand gets the next card by looking at another parameter that is switched back 
and forth from one value to another. Something like that is often called a 
"flag" in computer science terminology. In this case, we've named that flag 
"whichhand", and we pass either a 1 or a 2 through this parameter to 
indicate which hand gets the next card: 

(define (deal deck)
   (deal-2 deck () () 1))

(define (deal-2 deck hand1 hand2 whichhand)
   (cond ((null? deck) (list hand1 hand2))
         ((= whichhand 1)
          (deal-2 (cdr deck) (cons (car deck) hand1) hand2 2))
         (else
          (deal-2 (cdr deck) hand1 (cons (car deck) hand2) 1))))
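
To see that the two approaches really do deal the same hands, here's a 
sketch that puts them side by side. The flag version is renamed deal-flag 
and deal-flag-2 only so that both can be loaded at once; those names, and 
the five-card test deck, are made up for this example:

```scheme
;; Mutual recursion: deal-2 and deal-3 take turns.
(define (deal deck)
  (deal-2 deck '() '()))

(define (deal-2 deck hand1 hand2)
  (cond ((null? deck) (list hand1 hand2))
        (else (deal-3 (cdr deck) (cons (car deck) hand1) hand2))))

(define (deal-3 deck hand1 hand2)
  (cond ((null? deck) (list hand1 hand2))
        (else (deal-2 (cdr deck) hand1 (cons (car deck) hand2)))))

;; One function plus a flag that flips between 1 and 2.
(define (deal-flag deck)
  (deal-flag-2 deck '() '() 1))

(define (deal-flag-2 deck hand1 hand2 whichhand)
  (cond ((null? deck) (list hand1 hand2))
        ((= whichhand 1)
         (deal-flag-2 (cdr deck) (cons (car deck) hand1) hand2 2))
        (else
         (deal-flag-2 (cdr deck) hand1 (cons (car deck) hand2) 1))))

(define test-deck '(ah ks qd jc as))

(display (deal test-deck)) (newline)        ; ((as qd ah) (jc ks))
(display (deal-flag test-deck)) (newline)   ; ((as qd ah) (jc ks))
(display (equal? (deal test-deck) (deal-flag test-deck))) (newline)  ; #t
```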



Copyright (c) 2003 by Kurt Eiselt.  All rights reserved, with 
the exception of stuff that belongs to somebody else.

Last revised: September 14, 2002