I. State and persistence
As we saw previously in our discussion of state space search, the
notion of state is huge in computer science. In essence, "state" is
what computing is all about...getting from one state to the next.
Where did the process begin, where has it been, where is it now? All
the computational processes that we've generated have some sense of
their own state, and they keep track of that sense of state on the
activation stack. But the functional programming paradigm we've been
enjoying and employing puts some constraints on how we can use our
sense of state. Specifically, storing information about state on the
activation stack means that when the function that caused that
information to be stored is done, the state information must
disappear from the activation stack...that's inherent in how
activation stacks work.
What's missing from this picture is a notion of persistence...that
is, there are times when we'd like a procedure to generate a result
and have that result stick around even after the procedure has
stopped execution. When a procedure does something that persists
after the procedure stops, that's called a side effect. Programming without
side effects is one of the fundamental principles, if not the fundamental
principle, of functional programming. And while I can compute anything I
need to in the functional paradigm, there are just a whole lot of things
that are easier to deal with when we're allowed to step outside that
functional programming paradigm. Some of the things you've had to do on
homework assignments so far might have been a lot easier if we had let you
go "non-functional". Today's lecture should give you some of the tools you
need to go outside the functional box into what's called the "procedural" or
"imperative" programming paradigm, which allows us to write procedures that
store values away in storage areas called "variables" (just like in your
algebra classes) so that those values can be used even after the
procedure that wrote them has gone away.
This procedural style of programming isn't necessarily better or
worse than the functional style you've learned so far...it's just
different. For those of you who have previous programming experience
with procedural languages like C and Pascal, these past several weeks
might have been a bit of a mind-bender for you, and you may have
decided that the functional style just doesn't seem nearly as natural
as the procedural style. You may be right, or you may just be
suffering from a healthy dose of first-language-itis (i.e., whatever
I learned first is what I'm comfortable with, and anything different
seems weird). It's certainly the case that some of you have already
told me that you now have difficulty thinking procedurally after nine
weeks of thinking functionally.
II. Some tools for turning functional into procedural
As you know, things in the real world have state. For example, if we
want some game-playing program to learn so as to play better next
time, we'd like to be able to save stuff that doesn't go away when
the program ends. Or if we want to model physical systems like pizza
shops, hotels, cafeterias, gas stations, automated tellers, coke
machines, operating systems, or even robots, all of which exhibit
varying behavior over time as a result of their previous history, we
want to be able to retain something about history or state.
But as noted above, purely functional programming prevents us from
having persistence of state. What we're going to need to make this
whole persistence thing work are two new concepts: variables and
assignment.
We've seen variables in Scheme already, although we haven't often
referred to them as such. The parameters in your Scheme function
definitions are a form of variable---a name for a bit of memory where
you can store information (or more accurately, pointers to
information, but that distinction isn't real important at this time).
When we pass an argument to a function through a parameter,
the value of the argument is bound to or assigned to that parameter
or variable name. And when a procedure then refers to that variable
name in the right way, it can retrieve the value associated with that
name.
When we wanted to introduce additional variable names for values, we
learned to introduce helper functions to add new parameters that we
could use as counters and other such things. There are other, perhaps
less cumbersome ways of doing the same thing, and we'll learn about
some of them in a bit, but first it will be useful to talk about another
important concept... the concept of scoping.
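As a reminder of what that helper-function trick looks like, here's a
minimal sketch. The names "count-items" and "count-items-helper" are made
up for this illustration; the helper's extra parameter "count" plays the
role of the counter.

```scheme
; count-items: returns the number of elements in a list.
; (Hypothetical example; the names are not from any assignment.)
(define (count-items items)
  (count-items-helper items 0))

; The helper exists only to add the extra parameter "count",
; which carries the running total from one call to the next.
(define (count-items-helper items count)
  (if (null? items)
      count
      (count-items-helper (cdr items) (+ count 1))))
```

> (count-items '(a b c))
3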
III. Variables and scoping in Scheme
The variables created in the parameter list of a function definition
are, as you may have discovered, inaccessible outside the limits of
that function definition. The values passed through those parameters
are accessible during the execution or evaluation of the body of the
function, but they cannot be accessed after the function evaluation
has ended. Furthermore (and this is a very important furthermore),
those values cannot be accessed or altered by functions which are
called by the current function.
When the bindings of variables in the parameter list are accessible
only within the body of the function, this is called "lexical
scoping", and the variables are said to be "lexically scoped" or have
"lexical scoping". (This is also called "static scoping", and lots of
times it's called "local scoping".) Here's an example. If we define
the following silly function:
(define (add-3 number)
  (+ 3 number))
and then do the following
> (add-3 6)
9
> number
reference to undefined identifier: number
>
we get an error when we try to refer to the parameter called "number"
after the function has stopped execution. It's a silly example, but
it works. Similarly, consider this one:
(define (add-3 number)
  (add-a-constant 3))
(define (add-a-constant const)
  (+ number const))
This example doesn't work. Why? Because "number" is accessible in
"add-3" but not "add-a-constant", and that's a result of lexical
scoping. Sure enough, that's the message we get when we try it now:
> (add-3 6)
reference to undefined identifier: number
>
Lexical scoping may seem a little bit restrictive. I'm sure you'll
all encounter situations where you'd like to relax these restrictions.
Other languages may default to, or at least make available, other
less-restrictive forms of scoping too. But in general, programming
language designers prefer lexical scoping over other forms in part
because it tends to be easier to implement in compilers, and in part
because it encourages programmers to create more dependable programs.
How so? Well, when you're thinking about when or where variable
bindings can be accessed, it's easier to think in terms of "they're
accessible where the variable names can be seen in the body of the
function" instead of "they're accessible in this function, even if
you don't see the variable names, so long as this function was called
by another function in which these variables were accessible." (Whew!
That's actually another variation on scoping, called "dynamic scoping".
You don't need to worry about it.) You'd like to be able to understand
and debug a function based on what you know just by looking at the
function itself. You don't want to have to know where you are in the
overall control flow of a group of interdependent functions to know
how this particular function is going to work at this point in time.
It's an issue of controlling complexity.
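To make the point concrete, here's one way the broken "add-3" example
from earlier could be repaired under lexical scoping. This is only a
sketch: giving "add-a-constant" a second parameter is my assumption about
a reasonable fix, not the only one. The payoff is that everything
"add-a-constant" depends on now shows up in its own parameter list.

```scheme
; add-3 hands "number" to the helper explicitly instead of hoping
; the helper can somehow see it.
(define (add-3 number)
  (add-a-constant number 3))

; Both values this function uses are visible right here in its
; parameter list, so it can be understood without knowing who called it.
(define (add-a-constant number const)
  (+ number const))
```

> (add-3 6)
9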
In addition, under lexical scoping, you can use a variable name
locally in one function, and you (or someone else) can use the same
name locally in another function, and neither will clobber the other.
Lexical scoping avoids variable name conflicts. Under less restrictive
forms of scoping, there is a possibility that this undesirable situation
could arise and, depending on who calls what and when, one value of a
variable could be inadvertently wiped out by another. This can get pretty
nasty in a communal programming environment; be sure to practice safe
programming.
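Here's a small sketch of two functions sharing a parameter name without
stepping on each other. Both functions are hypothetical and exist only
for this illustration. Each one has its own "number", and calling
"square" from inside "sum-of-squares" leaves the caller's "number"
untouched.

```scheme
; square has a parameter called "number".
(define (square number)
  (* number number))

; sum-of-squares also has a parameter called "number". While
; (square other) is being evaluated, square's "number" holds the value
; of "other", but this function's own "number" is unaffected.
(define (sum-of-squares number other)
  (+ (square number) (square other)))
```

> (sum-of-squares 3 4)
25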
IV. Introducing more local variables
Sometimes you'd like to have more local variables in your functions
than just what's available through the parameter list. One way to
introduce more variables is to use a "helping function" and add
additional variables in the argument list to the helping function.
But you already knew that.
Another way to introduce additional local variables is to use
Scheme's "let" form:
(let ((variable-1 value-1)
      (variable-2 value-2)
           :
           :
      (variable-n value-n))
  :
  : )
The "let" form allows you to bind a bunch of values to variables,
and makes all those variables local to the code embedded within the
"let" form. That is, those variables are lexically or locally scoped.
We'll see an example of how to use "let" later on.
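In the meantime, here's a small sketch of the template filled in. The
function "distance-squared" and the names "dx" and "dy" are made up for
this illustration. The two local variables exist only inside the "let"
form.

```scheme
; Squared distance between the points (x1, y1) and (x2, y2).
; "dx" and "dy" are local to the let; they can't be seen outside it.
(define (distance-squared x1 y1 x2 y2)
  (let ((dx (- x2 x1))
        (dy (- y2 y1)))
    (+ (* dx dx) (* dy dy))))
```

> (distance-squared 0 0 3 4)
25

Trying to evaluate "dx" at the prompt afterward produces an undefined
identifier error, just as "number" did in the "add-3" example.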
V. Global scoping (ugh!)
There are times, rare though they may be, when you want to make some
variable accessible to all functions, even though that variable value
isn't passed to those functions through the parameter list. A
variable that is accessible to all procedures in this way is
called a "global variable". (Inside a procedure that uses it without
binding it, it's also called a "free variable".) First I'll
show you how to create one, then I'll discourage you from doing it.
To create a global variable, we use our old pal "define". You thought
that define was just for defining functions, but we can use it for
global variables.
Let's say we had a humongous database that lots of functions
needed to access, and that we didn't want to pass that database
through a parameter. We could create it like this:
(define database '((eiselt 2.6) (lerner 2.1)
                   (sweat 1.7) (smith 0.9)))
and access it like this:
(define (getinfo name)
  (assoc name database))
We pop down to the evaluator window, and we can do this:
> database
((eiselt 2.6) (lerner 2.1) (sweat 1.7) (smith 0.9))
> (getinfo 'eiselt)
(eiselt 2.6)
>
It looks cool, but it has the potential to get you into all sorts of
trouble. In short, excessive use of global variables is the signature
of lazy and probably incompetent programmers. Why are global
variables bad? First, they tie procedures to specific named data
structures. Our simple getinfo procedure in the above example works
only with a data structure called "database", so we've made sure that
it will be difficult to reuse this piece of code elsewhere. That's
not a good decision...good programmers prefer to reuse code over
rewriting code.
Second and third, global variables promote variable name conflicts
when multiple programmers are working on the same project, and they
significantly cloud the readability of the procedures that use them.
It's no longer easy to determine what external influences there might
be on the behavior of a given procedure, because those influences
aren't all declared in the parameter list. And even if you can find
all the global variables in your procedure (hint: make them stand out
like **database**), you still don't have an easy way of knowing what
procedures alter those global variables, or how, or when.
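On the first point, here's a sketch of what the alternative might look
like. The name "getinfo-from" and its second parameter "db" are
assumptions made for this illustration. Because the data structure comes
in through the parameter list, the same procedure works on any
association list, not just one that happens to be named "database".

```scheme
; Like getinfo, but the data structure to search is passed in
; explicitly rather than being found through a global variable.
(define (getinfo-from name db)
  (assoc name db))
```

> (getinfo-from 'lerner '((eiselt 2.6) (lerner 2.1)))
(lerner 2.1)
> (getinfo-from 'smith '((eiselt 2.6) (lerner 2.1)))
#f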
In general, the use of globals is to be avoided. In this class, the
use of global variables is prohibited unless we tell you it's ok. OK?
VI. Assignment
The fundamental method of assignment (or binding a value to some
variable name) within the body of a Scheme function is the "set!"
function:
> (set! x 3)
I lied. This isn't really a true Scheme function, because it doesn't
evaluate all its arguments. (Neither does "let" for that matter.) The
first argument is treated as if it were quoted. So instead of binding 3 to
whatever is stored in X, 3 is bound to the name X.
> (set! x 3)
> x
3
You can also use set! to change the value of a global variable:
> (set! database '())
> database
()
In either case, the variable needs to be previously defined via "let"
or "define" or through a function's parameter list before you can use
"set!" on it.
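Here's a minimal sketch of "set!" applied to a variable introduced by
"let". The function "bump-twice" and the variable "counter" are made up
for this illustration. Note that the value of "counter" depends on how
far along we are in the body.

```scheme
; "counter" is created by let, so it is legal to set! it.
(define (bump-twice start)
  (let ((counter start))
    (set! counter (+ counter 1))   ; counter is now start + 1
    (set! counter (+ counter 1))   ; counter is now start + 2
    counter))
```

> (bump-twice 5)
7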
VII. The cost of assignment
On a less pragmatic and more computational and/or theoretical
note, assignment carries an additional cost. Under
the substitution model of evaluation, a variable name was bound to
its value at the time the function was entered, and it didn't change
during the execution of the function. Life was simple.
By introducing assignment, we've disrupted our simple substitution
model of evaluation. The values of variables depend on where you are
in the control flow within the procedure. Debugging is more
complicated. But more importantly, your simple substitution model of
evaluation is no longer powerful enough to describe what's going on.
You need a more powerful, and more complex, and no longer
mathematically nice, evaluation model, which we won't talk about here.
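A small sketch shows where substitution breaks down. The names "hits" and
"next-hit" are made up, and the global variable is used purely for
illustration (not as an endorsement). The procedure takes no arguments,
yet it returns a different value each time it's called, so there's no
single value we could substitute for the expression (next-hit).

```scheme
; A global variable, for illustration only.
(define hits 0)

; Each call changes "hits" and then returns the new value.
; The result depends on history, not just on the arguments.
(define (next-hit)
  (set! hits (+ hits 1))
  hits)
```

> (next-hit)
1
> (next-hit)
2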
Furthermore, by introducing assignment, you've given up your immunity
to the siren song of global variables. Once you've crossed the line
and decided to abandon the functional programming paradigm just a
little bit, it doesn't take much to get you to abandon it just a
little bit more and drop in some global variables when you
just can't figure out a more elegant way to pass information between
procedures. And then you lose your "referential transparency"...you
no longer know that the procedure you're looking at is affected only
by the information explicitly passed to it through the argument list.
That's not conducive to a happy debugging experience.
By introducing assignment, you may be improving efficiency in terms
of cycles used, space used, and so on. You may also be improving the
"aesthetic complexity"---your program may look less complicated (but
that's not guaranteed, not by any stretch of the imagination). But
you're also demonstrably increasing the complexity of the evaluation model. That
makes your life harder if you happen to be a compiler or interpreter
writer, but that's no big deal, as we don't mind making computers do
work. What is a big deal is that assignment makes it harder to test,
debug, or validate your software.
A procedure in which variable bindings don't change has to be easier
to understand than one in which the bindings do change, no? And when
you're working with really big systems where big dollars or real
lives are on the line, you want to put more value on reducing
complexity than on improving efficiency. Massive software failures
don't happen because programs are too slow or they're too big...they
happen because the programs are too complex to be checked out
thoroughly.
Does this mean you shouldn't use assignment? No. What it means is
that you should realize that you're paying a price to do so.
Copyright (c) 2003 by Kurt Eiselt. All rights reserved, with
the exception of stuff that belongs to somebody else.
Last revised: November 5, 2003