I. The lexical closure
Let's take another look at the notion of process and state. Let's say that
I want to construct a function that simulates a Coke (TM) machine. To do
this, I'll need a state variable to tell me how many cans of Coke there
are in this machine. I could do this:
(define number-of-cans 5)
and then my function could be this:
(define (get-coke)
  (cond ((> number-of-cans 0)
         (set! number-of-cans (- number-of-cans 1))
         (write "have a Coke"))
        (else (write "sorry, out of Coke"))))
I can now call "get-coke" five times, and I'll see the same message ("have
a Coke"), but on the sixth call, I'll see a new message ("sorry, out of
Coke"). I now have a function which knows its history -- it has state.
However, "number-of-cans" is a free variable -- that is, it's initialized
outside the lexical scope of the function in which it's referenced, and I
don't like that. In this case, it's a global variable that any function can
modify, unless that function shadows the name with its own local,
lexically-scoped variable, and that's a situation that's ripe for trouble.
I'd really like this variable to be local to just one specific Coke machine.
And I'm sure I don't want to have to create a separate global variable for
each simulated Coke machine, especially if I'm going to simulate the behavior
of some vending machine company that may own hundreds or thousands of
individual Coke machines. Keeping track of all that could be difficult,
to say the least.
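To make the trouble concrete, here's a minimal sketch of an unrelated function
clobbering the global. The helper "audit-inventory" is hypothetical, made up
just for this illustration, and this variant of "get-coke" returns a symbol
instead of writing a message so the results are easy to inspect:

```scheme
(define number-of-cans 5)

;; Same logic as get-coke above, but returning a symbol (an assumption
;; made for this sketch) rather than writing a string.
(define (get-coke)
  (cond ((> number-of-cans 0)
         (set! number-of-cans (- number-of-cans 1))
         'have-a-coke)
        (else 'sorry-out-of-coke)))

;; Hypothetical helper written by somebody else. Nothing stops it from
;; touching the global.
(define (audit-inventory)
  (set! number-of-cans 0)
  'audit-done)

(define first-result (get-coke))    ; have-a-coke, 4 cans left
(audit-inventory)                   ; silently empties the machine
(define second-result (get-coke))   ; sorry-out-of-coke
```

The machine "runs out" after a single sale, and nothing in "get-coke" itself
explains why.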
As you know, to create additional local variables beyond those that are
given in a function's argument list, I can use Scheme's "let" form. So my
revised "get-coke" might look like this:
(define (get-coke)
  (let ((number-of-cans 5))
    (cond ((> number-of-cans 0)
           (set! number-of-cans (- number-of-cans 1))
           (write "have a Coke"))
          (else (write "sorry, out of Coke")))))
Now, "number-of-cans" is lexically-scoped within the confines of the "let"
form, so no other function can clobber it. But it doesn't work the way I
want it to, because each time I call "get-coke", my local state variable
"number-of-cans" is reset to the value 5. Waaahhh!!!
Rethinking this, I decided that what I'd really really like to do is
associate a variable with my Coke machine simulation such that:
1. It's local to the function, so nobody else can mess with it.
2. The state is saved in this variable, even after the function has been
exited, and the next time I call this function, the state variable
contains exactly what was left in it when this function was last exited.
In other words, I want the data that's specific to the function encapsulated
with the function, so that only that function can alter it. How can I do
this? We need to take advantage of lexical scoping: we can combine the
lexically-scoped variable of the "let" form with something we've only
talked about briefly: the "lambda" expression.
II. The lambda expression
Remember the lambda expression? Bryan talked about it a few lectures ago.
But as a reminder, a lambda expression is nothing more than a function body
without a name. You've been using them all along without knowing about
them, because the syntactic dialect of Scheme that we've been using hides
that fact. But when you type a function definition like this:
(define (square x)
  (* x x))
that's being translated into this form:
(define square
  (lambda (x)
    (* x x)))
How is that translation done? With either special forms or macros (which
are like special forms that you can write), but that's stuff you don't
need to worry about this semester.
So what does that lambda in the latter form mean? It says "here's a function
body that takes one parameter, x, and multiplies it by itself and returns
that value. Compile this function body -- that is, make it into something
executable -- and bind that executable function body to the name 'square' so
that when somebody invokes the function name square on an appropriate
argument, this executable chunk of code will be executed." Whew!
In short, the "lambda" tells Scheme to compile the function body that follows,
and return the executable result. That is, a lambda expression can be viewed
as an expression whose value is a function.
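Here's a small sketch of what that buys us: since the value of a lambda
expression is just a function, we can name it, call it on the spot, or hand it
to another function. The names "sixteen" and "squares" are made up for this
illustration:

```scheme
;; Named: bind the function that the lambda expression produces to "square".
(define square
  (lambda (x)
    (* x x)))

;; Unnamed: call the function directly, without ever giving it a name.
(define sixteen ((lambda (x) (* x x)) 4))

;; Passed as an argument: map applies the nameless function to each element.
(define squares (map (lambda (x) (* x x)) '(1 2 3)))
```

Here "sixteen" ends up as 16 and "squares" as the list (1 4 9).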
The name "lambda" itself comes from something called "lambda calculus", the
mathematical precursor to functional programming that was developed by
Alonzo Church. You can read more about Church in a slightly-dense but still
informative history of the mathematical march toward computers entitled
"The Advent of the Algorithm" by David Berlinski.
III. Back to the Coke machine simulation
Now that we know this stuff, what does it buy us? How can we use this
knowledge to solve our problem?
Well, the answer isn't immediately obvious, so don't be discouraged if it
didn't just leap out at you. In fact, it's sort of obscure, yet it's a
really cool idea, which is this: When lambda returns a compiled function
body, it needs to bind real values to any free variables -- variables that
aren't named in the parameter list or defined in a "let" inside the "lambda"
-- that are referenced inside that function body. So if we reference some
variable named foo, for example, inside that function body, and foo isn't in
the parameter list, then Scheme either needs to bind foo to some value that
it gets from the environment (like some let, define, or set! that happens
outside of the lambda expression) when it compiles the function body, or it
barfs. And better yet, when it does bind that variable foo to some value,
it internalizes foo in such a way that no other function can get access to
it. That is, this particular instance of the variable foo is accessible
only to the function that includes the reference to foo! That's exactly
what we want. So let's take that last approximation to "get-coke":
(define (get-coke)
  (let ((number-of-cans 5))
    (cond ((> number-of-cans 0)
           (set! number-of-cans (- number-of-cans 1))
           (write "have a Coke"))
          (else (write "sorry, out of Coke")))))
and take advantage of lambda to encapsulate the variable "number-of-cans"
within the compiled function:
(define (get-coke)
  (let ((number-of-cans 5))
    (lambda ()
      (cond ((> number-of-cans 0)
             (set! number-of-cans (- number-of-cans 1))
             (write "have a Coke"))
            (else (write "sorry, out of Coke"))))))
Then we evaluate it, and things start to behave a little differently than
before. If I invoke (get-coke), I won't see:
> (get-coke)
"have a Coke"
Instead, I'll see this:
> (get-coke)
#<procedure>
Why does that happen? Because we've redefined the "get-coke" function so
that it compiles its function body and returns that compiled function instead
of applying that function body to an argument list and returning that result.
We've turned "get-coke" into a function that returns a function.
So now we have a function body that has no name. How do we use it? There
are a couple of approaches worth looking at here. The first approach is to
use a tool that Scheme provides to us for using compiled but nameless
functions. It's called "apply", and it's the mechanism that Scheme uses to
apply a compiled lambda function to its arguments. In the case of "get-coke",
there are no arguments, so we could apply the result of invoking "get-coke"
to the empty list as follows:
> (apply (get-coke) '())
"have a Coke"
And just out of curiosity, what happens if we don't bind that "number-of-cans"
variable to some value?
(define (get-coke)
  ; (let ((number-of-cans 5))
    (lambda ()
      (cond ((> number-of-cans 0)
             (set! number-of-cans (- number-of-cans 1))
             (write "have a Coke"))
            (else (write "sorry, out of Coke"))))) ;)
> (get-coke)
#<procedure>
> (apply (get-coke) '())
reference to undefined identifier: number-of-cans
>
But I digress. We still have a problem with the working version -- every
time we call "get-coke" here, we get a brand-new function with its own
"number-of-cans" freshly set to 5, so we never run out of Cokes. We'll see
another use for apply later...it's just not what we need right now.
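The "never run out" problem is easier to see with a variant that returns the
number of cans left instead of writing a message. This is only a sketch; the
name "get-coke-counter" and the decision to return a number are my own
assumptions for the illustration:

```scheme
;; Like the lambda version of get-coke, but the inner function returns
;; the number of cans remaining (or a symbol when empty).
(define (get-coke-counter)
  (let ((number-of-cans 5))
    (lambda ()
      (cond ((> number-of-cans 0)
             (set! number-of-cans (- number-of-cans 1))
             number-of-cans)
            (else 'out-of-coke)))))

;; Each call to get-coke-counter builds a new function with a new
;; number-of-cans, so both purchases start from a full machine.
(define first-try (apply (get-coke-counter) '()))
(define second-try (apply (get-coke-counter) '()))
```

Both "first-try" and "second-try" come back as 4, because each purchase was
made from a different, freshly stocked machine.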
In the meantime, a better way would be to give the result of "get-coke" its
own function name. Let's call it "cs-machine":
> (define cs-machine (get-coke))
> (cs-machine)
"have a Coke"
This way, I get only one copy of the function that "get-coke" returns, and if
I keep on invoking "cs-machine", I'll eventually run out of Cokes:
> (cs-machine)
"have a Coke"
> (cs-machine)
"have a Coke"
> (cs-machine)
"have a Coke"
> (cs-machine)
"have a Coke"
> (cs-machine)
"sorry, out of Coke"
>
So the "get-coke" function no longer simulates my getting a Coke out of a
Coke machine. Instead, "get-coke" now creates a simple model of a Coke
machine which in turn I can bind to a unique name. And then when I invoke
the name of that Coke machine, it behaves like a Coke machine -- it gets me
a Coke. So we really ought to change the name of the function from "get-coke"
to "make-coke-machine":
(define (make-coke-machine)
  (let ((number-of-cans 5))
    (lambda ()
      (cond ((> number-of-cans 0)
             (set! number-of-cans (- number-of-cans 1))
             (write "have a Coke"))
            (else (write "sorry, out of Coke"))))))
Once again, after evaluating "make-coke-machine", I can do this:
> (define cs-machine (make-coke-machine))
> cs-machine
#<procedure>
>
and I'll have a simulated Coke machine with its own name, "cs-machine", and
its own internal state variable.
I can simulate the purchase of one icy can of Atlanta's own nectar of the
gods just by calling the function now named "cs-machine".
> (cs-machine)
"have a Coke"
How many cans are left? Well, there had better be four. But note that I can't
really tell, because I can't access the variable that's encapsulated within
the "cs-machine" procedure:
> number-of-cans
reference to undefined identifier: number-of-cans
>
I'm prevented from accessing that variable; think of that variable as
something that can be accessed from within that procedure, but not from
outside...it's private.
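One way to convince yourself of that privacy is the sketch below. It assumes a
variant constructor, "make-counting-machine" (a made-up name), whose machines
return the number of cans left rather than writing a message. Even if I define
a global that happens to share the name, the machine never sees it:

```scheme
;; Hypothetical variant of make-coke-machine that returns the remaining
;; count so the private state can be observed indirectly.
(define (make-counting-machine)
  (let ((number-of-cans 5))
    (lambda ()
      (cond ((> number-of-cans 0)
             (set! number-of-cans (- number-of-cans 1))
             number-of-cans)
            (else 'out-of-coke)))))

(define cs-machine (make-counting-machine))

;; A global with the same name. It is a different variable entirely.
(define number-of-cans 100)

;; The machine still uses its own private number-of-cans (5, then 4).
(define cans-left (cs-machine))
```

After this runs, "cans-left" is 4 and the global "number-of-cans" is still 100;
the two variables just share a spelling.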
But now let's say I want to simulate a whole college campus full of Coke
machines. And I want each of those machines to act independently of each
other...buying a Coke from one shouldn't change the number of Cokes in
another, as might happen with some of the approaches described above. Well,
that's already done. Let me reset my computer science Coke machine:
> (define cs-machine (make-coke-machine))
>
And now I'll create another machine that we'll put in electrical engineering:
> (define ee-machine (make-coke-machine))
>
Then a lot of thirsty EE majors come by and buy all the Cokes out of
ee-machine:
> (ee-machine)
"have a Coke"
> (ee-machine)
"have a Coke"
> (ee-machine)
"have a Coke"
> (ee-machine)
"have a Coke"
> (ee-machine)
"have a Coke"
> (ee-machine)
"sorry, out of Coke"
>
If ee-machine and cs-machine are truly independent, thirsty CS majors should
still have a full load of five Cokes in their machine, even though the
hoggish EE's have emptied their machine:
> (cs-machine)
"have a Coke"
> (cs-machine)
"have a Coke"
> (cs-machine)
"have a Coke"
> (cs-machine)
"have a Coke"
> (cs-machine)
"have a Coke"
> (cs-machine)
"sorry, out of Coke"
>
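For a whole campus, I probably don't want to type one define per machine.
Here's a rough sketch that builds a list of independent machines. The names
"make-quiet-machine", "make-machines", and "campus" are all made up for this
illustration, and the machines return the number of cans left instead of
writing a message:

```scheme
;; Hypothetical machine constructor that returns the remaining count.
(define (make-quiet-machine)
  (let ((number-of-cans 5))
    (lambda ()
      (cond ((> number-of-cans 0)
             (set! number-of-cans (- number-of-cans 1))
             number-of-cans)
            (else 'out-of-coke)))))

;; Build a list of n machines, each with its own private number-of-cans.
(define (make-machines n)
  (if (= n 0)
      '()
      (cons (make-quiet-machine) (make-machines (- n 1)))))

(define campus (make-machines 3))

;; Buy two Cokes from the first machine only.
((car campus))
((car campus))

;; The first machine has sold two cans, so a third sale leaves 2.
;; The second machine is untouched, so its first sale leaves 4.
(define first-machine-left ((car campus)))
(define second-machine-left ((cadr campus)))
```

The double parentheses in ((car campus)) first fetch a machine out of the list
and then call it with no arguments.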
As a bit of a convenience (this isn't a big conceptual detail, just a frill),
I can write a more generic Coke-buying function that allows me to just pass
the name of the Coke-machine I want to buy from:
(define (get-coke machine)
  (apply machine '()))
And that's what the "get-coke" function looks like now!
Then I reset a machine:
> (define cs-machine (make-coke-machine))
And I buy a Coke like this:
> (get-coke cs-machine)
"have a Coke"
>
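With an empty argument list, "apply" does nothing that a plain call wouldn't;
it earns its keep when the arguments arrive as a list. The sketch below shows
both, assuming a made-up constructor "make-quiet-machine" whose machines
return the number of cans left, and a made-up "get-coke-direct" for
comparison:

```scheme
;; Hypothetical machine constructor that returns the remaining count.
(define (make-quiet-machine)
  (let ((number-of-cans 5))
    (lambda ()
      (cond ((> number-of-cans 0)
             (set! number-of-cans (- number-of-cans 1))
             number-of-cans)
            (else 'out-of-coke)))))

;; Buying via apply with an empty argument list...
(define (get-coke machine)
  (apply machine '()))

;; ...is equivalent to simply calling the machine.
(define (get-coke-direct machine)
  (machine))

(define cs-machine (make-quiet-machine))
(define after-apply (get-coke cs-machine))          ; 4 cans left
(define after-direct (get-coke-direct cs-machine))  ; 3 cans left

;; apply with a non-empty list: the same as (+ 1 2 3).
(define total (apply + '(1 2 3)))
```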
This is pretty cool stuff, but my simulation is a little bit awkward. For
example, every time I want to reload my Coke machine, I have to redefine my
Coke machine. It would be nice if I could just send a message to my Coke
machine and tell it that I either want to buy a Coke or reload it with a
fresh supply. Here's how I can do that (and note that there might be other
ways to accomplish the same thing, and they might even be more elegant, but
this will suffice for now):
(define (make-coke-machine-v2)
  (let ((number-of-cans 0))
    (lambda (dowhat)
      (cond ((equal? dowhat 'load)
             (set! number-of-cans 20)
             (display number-of-cans)
             (display " frosty cans waiting for you") (newline))
            ((and (equal? dowhat 'buy) (> number-of-cans 0))
             (set! number-of-cans (- number-of-cans 1))
             (display "have a Coke") (newline)
             (display "cans remaining: ")
             (display number-of-cans)
             (newline))
            ((equal? dowhat 'buy)
             (display "sorry, out of Coke") (newline))
            (else (write "please use correct change") (newline))))))
Now I can create as many instances of Coke machines as I want, and I can
send them each messages telling them how I want them to behave:
> (define cs-machine (make-coke-machine-v2))
> (cs-machine 'buy)
sorry, out of Coke
> (cs-machine 'load)
20 frosty cans waiting for you
> (cs-machine 'buy)
have a Coke
cans remaining: 19
> (cs-machine 'buy)
have a Coke
cans remaining: 18
> (cs-machine 'buy)
have a Coke
cans remaining: 17
> (cs-machine 'foo)
"please use correct change"
>
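As one of those "other ways", here's a sketch of the same idea that dispatches
on the message with Scheme's "case" form and returns values instead of
displaying them. The name "make-coke-machine-v3", the extra 'count message,
and the returned symbols are all my own assumptions, not part of the version
above:

```scheme
(define (make-coke-machine-v3)
  (let ((number-of-cans 0))
    (lambda (dowhat)
      (case dowhat
        ;; Reload with a fresh supply of 20 and report the new count.
        ((load) (set! number-of-cans 20)
                number-of-cans)
        ;; Sell one can if any remain; otherwise report that we're out.
        ((buy) (cond ((> number-of-cans 0)
                      (set! number-of-cans (- number-of-cans 1))
                      number-of-cans)
                     (else 'out-of-coke)))
        ;; Hypothetical extra message: peek at the count without buying.
        ((count) number-of-cans)
        (else 'unknown-message)))))

(define cs-machine (make-coke-machine-v3))
(define r-empty (cs-machine 'buy))     ; out-of-coke
(define r-load (cs-machine 'load))     ; 20
(define r-buy (cs-machine 'buy))       ; 19
(define r-count (cs-machine 'count))   ; 19
(define r-foo (cs-machine 'foo))       ; unknown-message
```

Returning values rather than printing makes the machine easier to use from
other code, at the cost of the friendly messages.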
These functions that I have created with their own internal local state
variables are called "lexical closures". Sometimes these things are called
"generators". They can have multiple local variables, and you can also pass
values to them via parameters at the time they are created, and those values
will be internalized too.
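Here's a sketch of both of those points: a constructor that takes the initial
stock as a parameter and keeps two private variables. The names
"make-stocked-machine", "initial-cans", and "cans-sold" are made up for this
illustration:

```scheme
;; initial-cans is supplied when the machine is created and is captured
;; by the closure along with the two let-bound variables.
(define (make-stocked-machine initial-cans)
  (let ((number-of-cans initial-cans)
        (cans-sold 0))
    (lambda ()
      (cond ((> number-of-cans 0)
             (set! number-of-cans (- number-of-cans 1))
             (set! cans-sold (+ cans-sold 1))
             cans-sold)                 ; return the running sales total
            (else 'out-of-coke)))))

(define small-machine (make-stocked-machine 2))
(define sale-1 (small-machine))   ; 1
(define sale-2 (small-machine))   ; 2
(define sale-3 (small-machine))   ; out-of-coke
```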
A lexical closure is a great tool for simulating independent real-world
objects where supply and demand is an important issue. Examples are robots,
operating systems, grocery store checkout lines, automated teller machines,
pumps at the gas station, and yes, vending machines.
This lexical closure idea is a teeny weeny bit obscure and unwieldy, yet it
allows me to write programs that model the world as a bunch of independent
but interacting objects...and that's the kind of thing that programmers want
to do all the time: build models where the world consists of classes of
things (coke-machines, for example) and instances of classes (like cs-machine
vs. ee-machine...we could call them objects) and messages passed between
objects that tell them what to do (buy and load).
Hey, since building programs like that is pretty desirable, wouldn't it be
nice if there were another way of thinking about programming that made this
sort of stuff easier? And wouldn't it be even nicer if there were programming
languages that made writing these kinds of programs easier?
Patience, grasshoppers...patience.
Copyright (c) 2003 by Kurt Eiselt. All rights reserved, with
the exception of stuff that belongs to somebody else.
Last revised: November 11, 2003