Today we started a discussion about programming languages and
why some have long lifespans while many others have very short
lifespans. But first we talked about something called...
The lexical closure
Let's take another look at the notion of process and state.
Let's say that I want to construct a function that simulates
a Coke (TM) machine. To do this, I'll need a state variable
to tell me how many cans of Coke there are in this machine.
I could do this:
(defvar number-of-cans 5)
and then my function could be this:
(defun get-coke ()
  (cond ((> number-of-cans 0)
         (setf number-of-cans (- number-of-cans 1))
         (print "have a Coke"))
        (T (print "sorry, out of Coke"))))
I can now call "get-coke" five times, and I'll see the same
message ("have a Coke"), but on the sixth call, I'll see a
new message ("sorry, out of Coke"). I now have a function
which knows its history -- it has state.
However, "number-of-cans" is a free variable -- that is, it's
defined outside the lexical scope of the function in
which it's referenced, and I don't like that. In this case,
it's a global variable that any function can read or modify
(and because "defvar" makes the name dynamically scoped, a
"let" binding of that name elsewhere won't protect it from the
functions called inside that "let"), and that's a situation
that's ripe for trouble. I'd really like this variable to be local to just
one specific Coke machine. And I'm sure I don't want to have
to create a separate global variable for each simulated Coke
machine, especially if I'm going to simulate the behavior of
some vending machine company that may own hundreds or
thousands of individual Coke machines. Keeping track of all
that could be difficult, to say the least.
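To make the danger concrete, here is a minimal sketch of a global state variable being clobbered by unrelated code. The names "*demo-cans*", "demo-get-coke", and "careless-reset" are hypothetical, chosen so they don't collide with the definitions in these notes, and the function returns its message as a string instead of printing it so the result is easy to inspect.

```lisp
;; Hypothetical global state, standing in for "number-of-cans".
(defvar *demo-cans* 5)

;; Same logic as "get-coke", but returns the message instead of printing it.
(defun demo-get-coke ()
  (cond ((> *demo-cans* 0)
         (setf *demo-cans* (- *demo-cans* 1))
         "have a Coke")
        (t "sorry, out of Coke")))

;; Unrelated code that happens to touch the same global variable.
(defun careless-reset ()
  (setf *demo-cans* 0))

(demo-get-coke)    ; takes one can
(careless-reset)   ; silently empties the machine
(demo-get-coke)    ; the machine now claims to be out of Coke
```

Nothing in "demo-get-coke" changed, yet its behavior did, because its state lives where anyone can reach it.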
As you know, to create additional local variables beyond
those that are given in a function's argument list, I can use
LISP's "let" form. So my revised "get-coke" might look like
this:
(defun get-coke ()
  (let ((number-of-cans 5))
    (cond ((> number-of-cans 0)
           (setf number-of-cans (- number-of-cans 1))
           (print "have a Coke"))
          (T (print "sorry, out of Coke")))))
Now, "number-of-cans" is lexically-scoped within the confines
of the "let" form, so no other function can clobber it. But
it doesn't work the way I want it to, because each time I
call "get-coke", my local state variable "number-of-cans" is
reset to the value 5. Waaahhh!!!
Rethinking this, I decided that what I'd really really like
to do is associate a variable with my Coke machine simulation
such that:
1. It's local to the function, so nobody else can mess
with it.
2. The state is saved in this variable, even after the
function has been exited, and the next time I call this
function, the state variable contains exactly what was
left in it when this function was last exited.
How can I do this? We need to take advantage of lexical
scoping: we can combine the lexically-scoped variable of the
"let" form with our "lambda" form that we learned about
earlier, and I'll throw in the "function" special form.
I'll use pretty much the same code as above, but
this new function, which I'll call "make-coke-machine"
instead of "get-coke", doesn't have the same purpose as
"get-coke". (I'll show you what "get-coke" looks like soon.)
When I called "get-coke", I expected a simulation of
a Coke machine -- it would either give me a virtual Coke or
it wouldn't. But when I embed that same code inside a
"lambda" form, which is in turn embedded inside a "function"
form, I get a function which when called returns *another*
function, which will then behave like a Coke machine when it
is called:
(defun make-coke-machine ()
  (let ((number-of-cans 5))
    (function (lambda ()
                (cond ((> number-of-cans 0)
                       (setf number-of-cans
                             (- number-of-cans 1))
                       (print "have a Coke"))
                      (T (print "sorry, out of Coke")))))))
Now, calling "make-coke-machine" returns an executable
function object with a built-in lexically-scoped variable,
"number-of-cans", initially bound to the value 5. For all
intents and purposes, it is a private storage place known
only to this particular instantiation of the function.
Why does it work? When I define a lambda function and then
use "function" to return the executable function object, the
LISP system must preserve the bindings of any free
variables within the lambda function that were in effect where
the surrounding "function" form was evaluated -- lexical scoping
demands it. "number-of-cans" is a free variable in the lambda
function (again, it's not defined in the argument list or in a
"let" form within the lambda function itself), so the function
object holds on to the binding that the enclosing "let" created,
initially 5. It keeps the binding itself rather than a copy of
the value, which is why the "setf" inside the lambda function
changes what the next call sees. And since each call to
"make-coke-machine" executes the "let" again, each function
object gets a fresh binding of its own.
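One way to convince yourself that the function object holds on to a live binding is to create two functions inside the same "let" and watch one of them see the other's changes. This is only a sketch: "make-inspectable-machine" is a hypothetical helper that is not part of the lecture code, it returns strings instead of printing, and it assumes "number-of-cans" has not been made a global with the earlier "defvar" in the same LISP session (if it has, the "let" binds it dynamically and nothing gets captured).

```lisp
;; Hypothetical helper: returns two function objects that share one binding.
(defun make-inspectable-machine ()
  (let ((number-of-cans 5))
    (values
     ;; The dispenser updates the captured binding.
     (function (lambda ()
                 (cond ((> number-of-cans 0)
                        (setf number-of-cans (- number-of-cans 1))
                        "have a Coke")
                       (t "sorry, out of Coke"))))
     ;; The reader looks at that very same binding.
     (function (lambda () number-of-cans)))))

(multiple-value-bind (dispense cans-left) (make-inspectable-machine)
  (funcall dispense)
  (funcall dispense)
  (funcall cans-left))   ; => 3
```

If each function object had received its own copy of the value 5, the reader would still report 5 after two Cokes were dispensed.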
After evaluating "make-coke-machine", I can do this:
? (setf cs-machine (make-coke-machine))
#[COMPILED-LEXICAL-CLOSURE #x5B840E]
and I'll have a simulated Coke machine with its own name,
"cs-machine", and its own internal state variable. Now I'll
create a function which will allow me to use my simulated
Coke machine:
? (defun get-coke (machine-name)
    (funcall machine-name))
GET-COKE
and that in turn will allow me to get a simulated Coke from
my simulated Coke machine in the simulated computer science
building:
? (get-coke cs-machine)
"have a Coke"
"have a Coke"
Why do we see "have a Coke" twice? I'll leave that as an
exercise for you.
There should be four more Cokes left in my machine.
? (get-coke cs-machine)
"have a Coke"
"have a Coke"
? (get-coke cs-machine)
"have a Coke"
"have a Coke"
? (get-coke cs-machine)
"have a Coke"
"have a Coke"
? (get-coke cs-machine)
"have a Coke"
"have a Coke"
? (get-coke cs-machine)
"sorry, out of Coke"
"sorry, out of Coke"
Sure enough, I could get five Cokes, but on the sixth try,
the machine told me that it was empty. At any time, I could
have created another simulated Coke machine, which would have
its own independent local state variable:
? (setf ee-machine (make-coke-machine))
#[COMPILED-LEXICAL-CLOSURE #x5BC4F6]
? (get-coke ee-machine)
"have a Coke"
"have a Coke"
I can show that the state variable in the "ee-machine"
simulator is independent of the state variable in the
"cs-machine" simulator, because even though there are Cokes in
the "ee-machine", the "cs-machine" is still empty:
? (get-coke cs-machine)
"sorry, out of Coke"
"sorry, out of Coke"
And I can show that the state variable is untouchable just by
doing this:
? number-of-cans
> Error: Unbound variable: NUMBER-OF-CANS
> While executing: SYMBOL-VALUE
> Type Command-/ to continue, Command-. to abort.
> If continued: Retry getting the value of NUMBER-OF-CANS.
See the Restarts… menu item for further choices.
1 >
These functions that I have created with their own internal
local state variables are called "lexical closures".
When they hand back a new value on each call, these things
are sometimes called "generators". They can
have multiple local variables, and you can also pass values
to them via parameters at the time they are created, and
those values will be internalized too.
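Here is a minimal sketch of a closure with two local state variables whose starting values come from parameters. The name "make-stocked-machine", its parameters "initial-cans" and "price", and the "cash-collected" variable are all hypothetical additions for illustration; the function also returns its message instead of printing it, with the running cash total as a second value.

```lisp
;; Hypothetical variant of "make-coke-machine" that takes parameters.
(defun make-stocked-machine (initial-cans price)
  (let ((number-of-cans initial-cans)   ; first piece of private state
        (cash-collected 0))             ; second piece of private state
    (function (lambda ()
                (cond ((> number-of-cans 0)
                       (setf number-of-cans (- number-of-cans 1))
                       (setf cash-collected (+ cash-collected price))
                       (values "have a Coke" cash-collected))
                      (t (values "sorry, out of Coke" cash-collected)))))))

(defparameter *lobby-machine* (make-stocked-machine 2 75))

(funcall *lobby-machine*)   ; => "have a Coke", 75
(funcall *lobby-machine*)   ; => "have a Coke", 150
(funcall *lobby-machine*)   ; => "sorry, out of Coke", 150
```

Both variables live on between calls, and neither is visible outside the function object.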
A lexical closure is a great tool for simulating independent
real-world objects where supply and demand is an important
issue. Examples are robots, operating systems, grocery store
checkout lines, automated teller machines, pumps at the gas
station, and yes, vending machines.
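As a rough sketch of how this scales to the vending machine company mentioned earlier, a whole fleet of independent machines can be kept in an ordinary list, with no global variable per machine. The names "make-fleet-machine", "make-fleet", and "*fleet*" are hypothetical, and the machines return strings instead of printing.

```lisp
;; Hypothetical machine constructor; each call creates a fresh binding.
(defun make-fleet-machine (initial-cans)
  (let ((number-of-cans initial-cans))
    (function (lambda ()
                (cond ((> number-of-cans 0)
                       (setf number-of-cans (- number-of-cans 1))
                       "have a Coke")
                      (t "sorry, out of Coke"))))))

;; Build a list of SIZE independent machines.
(defun make-fleet (size initial-cans)
  (loop repeat size
        collect (make-fleet-machine initial-cans)))

;; One hundred machines holding one can each.
(defparameter *fleet* (make-fleet 100 1))

(funcall (first *fleet*))    ; => "have a Coke"
(funcall (first *fleet*))    ; => "sorry, out of Coke"
(funcall (second *fleet*))   ; => "have a Coke"
```

Emptying the first machine has no effect on the second, because each one was created by a separate call and so has its own binding.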
Static vs. Dynamic Languages
Certainly one of the things we've stressed in this class so
far is the idea that software development involves making
informed choices, and that those choices carry with them both
costs and benefits. For example, we've discussed more than
once the trade-offs involved with functional versus non-
functional programming styles, even within the confines of a
single programming language. A choice between one type of
programming language and another will also have impacts on
how you approach a given problem, what form your solution
will take, and what you'll be allowed to do in implementing
that solution. The more "mainstream" languages, which can be
classified as "static languages", are based on computing
technologies from the 1950s and 1960s, and they tend to force
programmers into the following paradigm:
- They tend to be batch-compiled languages, meaning you
  wait for the entire program to compile and link after
  every change.
- Development environments are usually based on ASCII
  source files and command-line interfaces (although there
  are some notable exceptions recently).
- Compiled programs have little or no runtime type
  checking, so an error usually results in a complete crash.
- Memory is managed manually -- code has direct memory
  references or pointers. Such code is hard to write and
  understand, and can harbor subtle, hard-to-fix bugs.
- Manual memory management tends to be inefficient,
  error-prone, non-portable, and hard to integrate with other
  code libraries. As a result, memory-related errors account
  for a large share of the bugs in a typical static language
  program.
C, C++, and Pascal are examples of static languages.
Another class of languages, called dynamic languages, has grown
in popularity in recent years. These languages enable a different,
more interactive approach to programming:
- They allow faster development and rapid prototyping.
- They usually have dynamic type information -- errors are
  usually detected before they can corrupt the running
  program, and when they do occur, the errors are usually
  recoverable and do not cause crashes.
- They usually have automatic memory management, which
  hides the details of memory management from the
  programmer. Code is simpler, cleaner, more reliable,
  and more reusable.
- They usually have a shell or "listener" so the user can
  execute code interactively (either via an interpreter or
  an incremental compiler).
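To illustrate what a recoverable error looks like in LISP, here is a small sketch using Common Lisp's "handler-case". The function name "safe-add" and the ":not-a-number" result are hypothetical choices. I'm also assuming an implementation that signals an error when "+" is handed a non-number at its default safety settings, which the common ones do, though the language standard does not strictly require it.

```lisp
;; Hypothetical helper: add two values, recovering if the addition fails.
(defun safe-add (a b)
  (handler-case (+ a b)
    ;; Any error signaled by the addition lands here; the program carries on.
    (error () :not-a-number)))

(safe-add 1 2)       ; => 3
(safe-add 1 "two")   ; => :NOT-A-NUMBER where the implementation signals an error
```

The bad call doesn't take the LISP system down; the error is caught at run time and the function simply returns a value saying so.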
What languages are dynamic? By these criteria, Dylan, Forth,
Haskell, Logo, Java, Prolog, and Smalltalk all qualify to
varying degrees. And of course LISP and its dialects,
including Scheme. A big benefit of dynamic languages is that they
tend to be much more extensible than static languages. That is,
when the needs of the programmer change, the language can be
extended to adapt to those changes. Static languages are less
flexible in that regard. This helps to explain why LISP has
not disappeared after all these years.
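As a small taste of what extending the language can mean, here is a sketch that adds a new looping construct with "defmacro". The name "my-while" is hypothetical, and this is just one simple way to write such a macro, not the example we'll study next week.

```lisp
;; Hypothetical new control construct: repeat BODY for as long as TEST is true.
(defmacro my-while (test &body body)
  `(loop
     (unless ,test (return))
     ,@body))

;; The new form is used just like a built-in one.
(let ((cans 3)
      (served 0))
  (my-while (> cans 0)
    (setf cans (- cans 1))
    (setf served (+ served 1)))
  served)   ; => 3
```

Once the macro is defined, "my-while" is indistinguishable in use from the forms that came with the language.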
Next week we'll see an example of LISP's extensibility when
we take a look at a small Scheme interpreter written in LISP.
Copyright 1998 by Kurt Eiselt. All rights reserved.
Last revised: November 24, 1998