CS 1321X - Lecture 18 - October 21, 2003

State and Persistence


I. State and persistence

As we saw previously in our discussion of state space search, the 
notion of state is huge in computer science. In essence, "state" is 
what computing is all about...getting from one state to the next. 
Where did the process begin, where has it been, where is it now? All 
the computational processes that we've generated have some sense of 
their own state, and they keep track of that sense of state on the 
activation stack. But the functional programming paradigm we've been 
enjoying and employing puts some constraints on how we can use our 
sense of state.  Specifically, storing information about state on the 
activation stack means that when the function that caused that 
information to be stored is done, the state information must 
disappear from the activation stack...that's inherent in how 
activation stacks work.

What's missing from this picture is a notion of persistence...that
is, there are times when we'd like a procedure to generate a result
and have that result stick around even after the procedure has 
stopped execution. When a procedure does something that persists 
after the procedure stops, that's called a side effect. Programming without 
side effects is one of the fundamental principles, if not the fundamental 
principle, of functional programming. And while I can compute anything I 
need to in the functional paradigm, there are just a whole lot of things 
that are easier to deal with when we're allowed to step outside that 
functional programming paradigm.  Some of the things you've had to do on 
homework assignments so far might have been a lot easier if we had let you 
go "non-functional". Today's lecture should give you some of the tools you 
need to go outside the functional box into what's called the "procedural" or 
"imperative" programming paradigm, which allows us to write procedures that 
store values away in storage areas called "variables" (just like in your 
algebra classes) so that those values can be used even after the 
procedure that wrote them has gone away.

This procedural style of programming isn't necessarily better or
worse than the functional style you've learned so far...it's just
different. For those of you who have previous programming experience 
with procedural languages like C and Pascal, these past several weeks 
might have been a bit of a mind-bender for you, and you may have 
decided that the functional style just doesn't seem nearly as natural 
as the procedural style. You may be right, or you may just be 
suffering from a healthy dose of first-language-itis (i.e., whatever 
I learned first is what I'm comfortable with, and anything different 
seems weird). It's certainly the case that some of you have already 
told me that you now have difficulty thinking procedurally after nine 
weeks of thinking functionally.


II. Some tools for turning functional into procedural

As you know, things in the real world have state. For example, if we
want some game-playing program to learn so as to play better next
time, we'd like to be able to save stuff that doesn't go away when 
the program ends. Or if we want to model physical systems like pizza 
shops, hotels, cafeterias, gas stations, automated tellers, coke 
machines, operating systems, or even robots, all of which exhibit 
varying behavior over time as a result of their previous history, we 
want to be able to retain something about history or state.

But as noted above, purely functional programming prevents us from 
having persistence of state. What we're going to need to make this 
whole persistence thing work are two new concepts: variables and 
assignment.

We've seen variables in Scheme already, although we haven't often
referred to them as such. The parameters in your Scheme function
definitions are a form of variable---a name for a bit of memory where 
you can store information (or more accurately, pointers to 
information, but that distinction isn't really important at this time). 
When we pass an argument to a function through a parameter,
the value of the argument is bound to or assigned to that parameter
or variable name. And when a procedure then refers to that variable 
name in the right way, it can retrieve the value associated with that 
name.

When we wanted to introduce additional variable names for values, we 
learned to introduce helper functions to add new parameters that we 
could use as counters and other such things. There are other, perhaps 
less cumbersome ways of doing the same thing, and we'll learn about 
some of them in a bit, but first it will be useful to talk about another 
important concept... the concept of scoping.
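
Before we move on, here's a minimal sketch of that helper-function 
trick, just as a reminder. The names "count-items" and "count-helper" 
are made up for this illustration; the point is only that the helper's 
extra parameter, "count", acts as a counter variable.

```scheme
;; count-items and count-helper are hypothetical names used only for
;; illustration. The helper's extra parameter, count, is the counter.
(define (count-helper items count)
  (if (null? items)
      count
      (count-helper (cdr items) (+ count 1))))

;; The main function just starts the counter at 0.
(define (count-items items)
  (count-helper items 0))

(count-items '(a b c))   ; evaluates to 3
```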


III. Variables and scoping in Scheme

The variables created in the parameter list of a function definition
are, as you may have discovered, inaccessible outside the limits of 
that function definition. The values passed through those parameters 
are accessible during the execution or evaluation of the body of the 
function, but they cannot be accessed after the function evaluation 
has ended. Furthermore (and this is a very important furthermore), 
those values cannot be accessed or altered by functions which are 
called by the current function.

When the bindings of variables in the parameter list are accessible
only within the body of the function, this is called "lexical 
scoping", and the variables are said to be "lexically scoped" or have 
"lexical scoping". (This is also called "static scoping", and lots of 
times it's called "local scoping".)  Here's an example. If we define 
the following silly function:

(define (add-3 number)
     (+ 3 number))

and then do the following

>   (add-3 6)
9
>   number
reference to undefined identifier: number
>

we get an error when we try to refer to the parameter called "number" 
after the function has stopped execution. It's a silly example, but 
it works. Similarly, consider this one:

(define (add-3 number)
      (add-a-constant 3))

(define (add-a-constant const)
     (+ number const))

This example doesn't work. Why? Because "number" is accessible in
"add-3" but not "add-a-constant", and that's a result of lexical
scoping. Sure enough, that's the message we get when we try it now:

>   (add-3 6)
reference to undefined identifier: number
>
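
One way to repair this under lexical scoping (a sketch, not the only 
possible fix) is to hand "number" to "add-a-constant" explicitly 
through its parameter list. The two-parameter version of 
"add-a-constant" below is an assumption made for this illustration.

```scheme
;; Same silly functions as above, except that number is now passed to
;; add-a-constant explicitly instead of being expected to leak in.
(define (add-a-constant number const)
  (+ number const))

(define (add-3 number)
  (add-a-constant number 3))

(add-3 6)   ; evaluates to 9
```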

Lexical scoping may seem a little bit restrictive. I'm sure you'll
all encounter situations where you'd like to relax these restrictions.  
Other languages may default to, or at least make available, other 
less-restrictive forms of scoping too. But in general, programming
language designers prefer lexical scoping over other forms in part
because it tends to be easier to implement in compilers, and in part 
because it encourages programmers to create more dependable programs. 
How so? Well, when you're thinking about when or where variable 
bindings can be accessed, it's easier to think in terms of "they're 
accessible where the variable names can be seen in the body of the 
function" instead of "they're accessible in this function, even if 
you don't see the variable names, so long as this function was called 
by another function in which these variables were accessible." (Whew! 
That's actually another variation on scoping, called "dynamic scoping". 
You don't need to worry about it.) You'd like to be able to understand 
and debug a function based on what you know just by looking at the 
function itself. You don't want to have to know where you are in the 
overall control flow of a group of interdependent functions to know 
how this particular function is going to work at this point in time. 
It's an issue of controlling complexity.

In addition, under lexical scoping, you can use a variable name
locally in one function, and you (or someone else) can use the same
name locally in another function, and neither will clobber the other. 
Lexical scoping avoids variable name conflicts. Under less restrictive 
forms of scoping, there is a possibility that this undesirable situation 
could arise and, depending on who calls what and when, one value of a 
variable could be inadvertently wiped out by another. This can get pretty 
nasty in a communal programming environment; be sure to practice safe 
programming.
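
Here's a small sketch of that no-clobbering property. The function 
names are hypothetical; what matters is that all three functions use 
a parameter called "n" and none of them disturbs the others.

```scheme
;; double, square, and double-the-square are hypothetical examples.
;; Each one has its own parameter named n.
(define (double n)
  (+ n n))

(define (square n)
  (* n n))

;; In the call below, n is 3 inside double-the-square and square,
;; and 9 inside double. Each binding is private to its own function.
(define (double-the-square n)
  (double (square n)))

(double-the-square 3)   ; evaluates to 18
```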


IV. Introducing more local variables

Sometimes you'd like to have more local variables in your functions
than just what's available through the parameter list. One way to
introduce more variables is to use a "helping function" and add 
additional variables in the argument list to the helping function. 
But you already knew that.

Another way to introduce additional local variables is to use
Scheme's "let" form:


(let ((variable-1 value-1)
      (variable-2 value-2)
              :
              :
      (variable-n value-n))
              :
              :               )



The "let" form allows you to bind a bunch of values to variables,
and makes all those variables local to the code embedded within the
"let" form. That is, those variables are lexically or locally scoped. 
We'll see an example of how to use "let" later on.
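
In the meantime, here is a minimal sketch of the idea. The function 
"average-of-three" and the variable names "total" and "how-many" are 
made up for this illustration.

```scheme
;; average-of-three is a hypothetical example. total and how-many are
;; local variables created by let; they exist only inside the let body.
(define (average-of-three a b c)
  (let ((total (+ a b c))
        (how-many 3))
    (/ total how-many)))

(average-of-three 2 4 6)   ; evaluates to 4
```

Trying to evaluate "total" or "how-many" outside the "let" form 
produces the same kind of undefined-identifier error we saw with 
"number" earlier.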


V. Global scoping (ugh!)

There are times, rare though they may be, when you want to make some 
variable accessible to all functions, even though that variable value 
isn't passed to those functions through the parameter list. A 
variable value that is accessible to all procedures in this way is 
called a "global variable". (Inside a function that uses such a 
variable without binding it, the variable is also called a "free 
variable".) First I'll show you how to create one, then I'll 
discourage you from doing it.

To create a global variable, we use our old pal "define". You thought
that define was just for defining functions, but we can use it for
global variables.

Let's say we had a humongous database that lots of functions
needed to access, and that we didn't want to pass that database
through a parameter. We could create it like this:

(define database '((eiselt 2.6) (lerner 2.1)
                   (sweat 1.7) (smith 0.9)))

and access it like this

(define (getinfo name)
   (assoc name database))

We pop down to the evaluator window, and we can do this:

> database
((eiselt 2.6) (lerner 2.1) (sweat 1.7) (smith 0.9))
> (getinfo 'eiselt)
(eiselt 2.6)
>

It looks cool, but it has the potential to get you into all sorts of
trouble. In short, excessive use of global variables is the signature
of lazy and probably incompetent programmers. Why are global 
variables bad? First, they tie procedures to specific named data 
structures. Our simple getinfo procedure in the above example works 
only with a data structure called "database", so we've made sure that 
it will be difficult to reuse this piece of code elsewhere. That's 
not a good decision...good programmers prefer to reuse code over 
rewriting code.
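
For comparison, here is a sketch of a version that isn't tied to any 
particular global. The name "getinfo-from" and its second parameter 
"table" are assumptions made for this illustration.

```scheme
;; getinfo-from is a hypothetical variant of getinfo. It receives the
;; table through its parameter list instead of reading a global, so it
;; works with any association list.
(define (getinfo-from name table)
  (assoc name table))

(getinfo-from 'lerner '((eiselt 2.6) (lerner 2.1)))   ; evaluates to (lerner 2.1)
(getinfo-from 'gamma '((alpha 1) (gamma 3)))          ; evaluates to (gamma 3)
```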

Second and third, global variables promote variable name conflicts
when multiple programmers are working on the same project, and they 
significantly cloud the readability of the procedures that use them. 
It's no longer easy to determine what external influences there might 
be on the behavior of a given procedure, because those influences 
aren't all declared in the parameter list. And even if you can find 
all the global variables in your procedure (hint: make them stand out 
like **database**), you still don't have an easy way of knowing what 
procedures alter those global variables, or how, or when.

In general, the use of globals is to be avoided. In this class, the
use of global variables is prohibited unless we tell you it's ok. OK?


VI. Assignment

The fundamental method of assignment (or binding a value to some
variable name) within the body of a Scheme function is the "set!"
function:


> (set! x 3)

I lied. This isn't really a true Scheme function, because it doesn't
evaluate all its arguments. (Neither does "let" for that matter.) The
first argument is treated as if it were quoted. So instead of binding 3 to
whatever is stored in X, 3 is bound to the name X.

> (define x 0)
> (set! x 3)
> x
3

You can also use set! to change the value of a global variable:

> (set! database '())
> database
()

In either case, the variable needs to be previously defined via "let"
or "define" or through a function's parameter list before you can use
"set!" on it.
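
Here's a rough sketch that puts those pieces together and shows the 
persistence we were after. The names "visits", "record-visit!", and 
"add-3-then-double" are hypothetical, and the global is used here 
only to make the side effect easy to see.

```scheme
;; visits and record-visit! are hypothetical names for illustration.
(define visits 0)               ; a global, created with define

(define (record-visit!)
  (set! visits (+ visits 1))    ; side effect: the new value outlives this call
  visits)

(record-visit!)   ; evaluates to 1
(record-visit!)   ; evaluates to 2
visits            ; evaluates to 2

;; set! also works on a variable introduced by let.
(define (add-3-then-double number)
  (let ((result (+ number 3)))
    (set! result (* result 2))  ; rebinds the let variable result
    result))

(add-3-then-double 6)   ; evaluates to 18
```

Notice that "visits" still holds 2 after "record-visit!" has stopped 
executing, while "result" disappears as soon as the "let" form is 
finished.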


VII. The cost of assignment

On a less pragmatic note, and from a more computational or 
theoretical perspective, assignment carries an additional cost. Under 
the substitution model of evaluation, a variable name was bound to 
its value at the time the function was entered, and it didn't change 
during the execution of the function.  Life was simple.

By introducing assignment, we've disrupted our simple substitution
model of evaluation. The values of variables depend on where you are 
in the control flow within the procedure. Debugging is more 
complicated. But more importantly, your simple substitution model of 
evaluation is no longer powerful enough to describe what's going on. 
You need a more powerful, and more complex, and no longer 
mathematically nice, evaluation model, which we won't talk about here.

Furthermore, by introducing assignment, you've given up your immunity 
to the siren song of global variables. Once you've crossed the line 
and decided to abandon the functional programming paradigm just a 
little bit, it doesn't take much to get you to abandon it just a 
little bit more and drop in some global variables when you
just can't figure out a more elegant way to pass information between
procedures. And then you lose your "referential transparency"...you
no longer know that the procedure you're looking at is affected only 
by the information explicitly passed to it through the argument list. 
That's not conducive to a happy debugging experience.
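
A tiny sketch of what that loss looks like in practice. The names 
"rate", "scale", "first-answer", and "second-answer" are made up for 
this illustration.

```scheme
;; rate and scale are hypothetical names for illustration.
(define rate 2)

(define (scale n)
  (* rate n))        ; depends on the global rate, not just on n

(define first-answer (scale 5))    ; first-answer is 10
(set! rate 3)                      ; some other code changes the global
(define second-answer (scale 5))   ; second-answer is 15
```

Same procedure, same argument, different answers. Nothing in the 
parameter list of "scale" warns you that this can happen.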

By introducing assignment, you may be improving efficiency in terms 
of cycles used, space used, and so on. You may also be improving the 
"aesthetic complexity"---your program may look less complicated (but 
that's not guaranteed, not by any stretch of the imagination). But 
you're also demonstrably increasing the complexity of the underlying 
computational model. That makes your life harder if you happen to be 
a compiler or interpreter 
writer, but that's no big deal, as we don't mind making computers do 
work. What is a big deal is that assignment makes it harder to test, 
debug, or validate your software.

A procedure in which variable bindings don't change has to be easier 
to understand than one in which the bindings do change, no? And when 
you're working with really big systems where big dollars or real 
lives are on the line, you want to put more value on reducing 
complexity than on improving efficiency. Massive software failures 
don't happen because programs are too slow or they're too big...they 
happen because the programs are too complex to be checked out 
thoroughly.

Does this mean you shouldn't use assignment? No. What it means is
that you should realize that you're paying a price to do so.



Copyright (c) 2003 by Kurt Eiselt.  All rights reserved, with 
the exception of stuff that belongs to somebody else.

Last revised: November 5, 2003