November 2, 1995 - The Big Transition Up to now, we've followed a very pure functional approach to programming: no assignment no iteration (of the traditional count and loop kind) no side effects reliance on referential transparency (i.e., the result of a function is dependent only on the parameters passed to it) But hey, it sure would be nice to be able to have a function whose behavior is dependent on its history---that is, a function that has a sense of state. As we've seen, things in the real world have state. For example, if we wanted our hexapawn programs to learn so as to play better next time, we'd like to be able to save stuff that doesn't go away when the program ends. Or if we want to model physical systems like pizza shops, hotels, cafeterias, gas stations, automated tellers, coke machines, or even operating systems, all of which exhibit varying behavior over time, we want to be able to retain something about history or state. But purely functional programming severely hampers our ability to do this. What we're going to need to make this whole state thing work are variables and assignment. We've seen variables in LISP already, although we haven't really referred to them as such. Those are the symbols in the argument list of a function, but we don't currently have much control over them. Today we'll learn how to create them, control them, and use them. Variables and scoping in LISP The variables created in the argument list of a function definition are, as you know, local to the function, and they are bound to values when the function is called. Those values are accessible during the execution/evaluation of the body of the function, but they cannot be accessed after the function evaluation has ceased. Furthermore, those values cannot be accessed or altered by functions which are called by the current function. When the bindings of variables in the argument list are accessible only within the body of the function, this is called "lexical scoping", and the variables are said to be "lexically scoped". Here's an example: (defun my-first (my-list) (car my-list)) It's a silly example, but it works. Compare this to: (defun my-first (my-list) (my-car)) (defun my-car () (car my-list)) This example doesn't work. Why? Because "my-list" is accessible in "my-first" but not "my-car", and that's a result of lexical scoping. If these variables were "dynamically scoped", the second example would work just fine. Under dynamic scoping, the variables in the argument list are bound when the function is called, and the values are accessible by this function and any function called by this function (and so on) until the evaluation of this function has ceased. In other words, the variable values are accessible to any functions called by this function even though the values aren't passed to the called functions as parameters. You can think of a dynamically scoped variable as being local to the function in which that variable is first bound, but global to all functions called by that first function (and all functions called by those functions, and so on...). Dynamic scoping was employed in most earlier dialects of LISP, but lexical scoping is the default in Common LISP. Lexical scoping is preferred over dynamic scoping for two reasons: 1. When you're thinking about when or where variable bindings can be accessed, it's easier to think in terms of "they're accessible where the variable names can be seen in the body of the function" instead of "they're accessible in this function, even if you don't see the variable names, so long as this function was called by another function in which these variables were accessible." (Whew!) You'd like to be able to understand and debug a function based on what you know just by looking at the function itself. You don't want to have to know where you are in the overall control flow of a group of interdependent functions to know how this particular function is going to work at this point in time. Again, it's an issue of controlling complexity. 2. Under lexical scoping, you can use a variable name locally in one function, and you (or someone else) can use the same name locally in another function, and neither will clobber the other. Lexical scoping avoids variable name conflicts. Under dynamic scoping, there is a possibility that this undesirable situation could arise and, depending on who calls what and when, one value of a variable could be inadvertently wiped out by another. This can get pretty nasty in a communal programming environment; be sure to practice safe programming. We can get dynamic scoping in Common LISP if we really want to. We do this by declaring the variables as "special": (defun my-first (list) (declare (special list)) (my-car ())) (defun my-car () (declare (special list)) (car list)) The first declaration tells Common LISP "set up this variable as a special dynamically-scoped variable when you execute my- first". The second declaration tells Common LISP that "'list' is a dynamically-scoped variable---it's not created in the 'normal' way in this function, but go ahead and access its value in this function". Any function that uses a dynamically-scoped variable has to also contain the special declaration. Assignment Now that we have variables, let's see how to do stuff to them using assignment. The fundamental method of assignment in LISP is the "set" function: (set X 3) But this doesn't do what you probably expect it to do, because like most good LISP functions, it evaluates its arguments. If X is bound to another symbol, like "A:, the net result would be the equivalent of A <- 3. If X is not bound, however, LISP will give you an error message. What you usually want to do is this: (set (quote X) 3) which is the equivalent of X <- 3. This is needed so often that there's a shorthand version of it, called "setq", which is short for "set quote": ? (setq X 3) ;; setq binds a value to a symbol 3 ? X 3 There's another function that does the same thing, plus a bit more. It's called "setf": ? (setf X 3) 3 ? X 3 SETF and the generalized variable We tend to think of a variable as a place where you store a data object. In reality, a variable in LISP is a place where you store a pointer to a data object. A "generalized variable" in Common LISP is any expression that allows us to access a particular place in memory. So, for example, when LISP evaluates: (setf X '(A B C D)) the resulting construct in memory looks like this: X | | | _______ _______ _______ _______ +-->| | | | | | | | | | | /| | | | --+---->| | | --+---->| | | --+---->| | | / | |_|_|___| |_|_|___| |_|_|___| |_|_|/__| | | | | A B C D Assignment in LISP is nothing more than replacing one pointer with another (or initializing a pointer). So X designates a pointer, and (nth 2 X) designates a different pointer: X (nth 2 X) | | | | | _______ _______ | _______ _______ +-->| | | | | | +->| | | | | /| | | | --+---->| | | --+---->| | | --+---->| | | / | |_|_|___| |_|_|___| |_|_|___| |_|_|/__| | | | | A B C D Both X and (nth 2 X) are place descriptions; they're both generalized variables. Consequently, if we have LISP evaluate the expression: ? (setf (nth 2 X) 'Q) the list above will be changed to: X (nth 2 X) | | | | | _______ _______ | _______ _______ +-->| | | | | | +->| | | | | /| | | | --+---->| | | --+---->| | | --+---->| | | / | |_|_|___| |_|_|___| |_|_|___| |_|_|/__| | | | | A B Q D and if we type the following at our LISP evaluator: ? X we'll see LISP return: (A B Q D) If we tried this with the "setq" function, it wouldn't work, and that's the difference between "setq" and "setf". The "setf" function goes into the actual place in memory that the pointer is pointing to and replaces the value there. This is what's known as "destructive modification", and it's not usually what you want to do. Why? It's dangerous, especially in a communal programming environment. Say for example that you have a system in which information is shared between functions via global variables. There are times when this is the right thing to do. To indicate globals in LISP, we typically use a function called "defvar", and we put asterisks around the variable name just to make it stand out: ? (defvar *X* '(A B C D)) *X* ?*X* (A B C D) But then you might set up another global variable like this: ? (defvar *Y* (rest (rest *X*))) *Y* ? *Y* What happens if some other unsuspecting soul comes by and, thinking that *X* and *Y* are independent, decides to change *Y*? ? (setf (first *Y*) 'Q) Q ? *Y* (Q D) This looks like what we wanted to happen, but now when we go back and look at X, we find that it's changed, and we may not have wanted that: ? *X* (A B Q D) In situtations like this, it would be safer to make a copy of the data object before messing with it. LISP has functions called "copy-list" and "copy-tree" for doing exactly this, so make sure you look them up in the book by Steele. The obvious next question is, if destructive modification is inherently unsafe, especially in communal programming environments, why do it at all? The answer is that copying data structures, especially if they're big, eats up space and time. While one of the really nice things about a dynamic programming language like LISP is that you typically don't worry about memory management, the fact is that the more copying of data structures that you do, the more memory gets used up. That'll get in your way sometimes. Furthermore, as that same memory becomes unused, the space will have to be recovered by a "garbage collector" and returned to "available memory". When the garbage collector is running, that also eats up time, and that's not desirable either. So, people will use destructive modifications for the sake of efficiency, and when those situations occur where saving computational resources is critical, you should think about using destructive modifications too. But of course, you won't encounter these situations in CS 2360, ok? Other destructive assignment functions that you may see, especially if you look at LISP code in dialects that came before Common LISP, are "rplaca" and "rplacd". The former is shorthand for "replace car" and is equivalent to: (setf (car ...) ...) or (setf (first ...) ...) The latter is shorthand for "replace cdr" and is equivalent to: (setf (cdr ...) ...) or (setf (rest ...) ...) Hence, "setf" in Common LISP combines the power of "setq", "rplaca", and "rplacd". The cost of assignment On a less pragmatic note, but from a more computational/theoretical perspective, assignment carries an additional cost. Under the substitution model of evaluation, a variable name was bound to its value at the time the function was entered, and it didn't change during the execution of the function. Life was simple. By introducing assignment, we've disrupted our simple model of evaluation. The values of variables depend on where you are in the control flow within the function. Debugging is more complicated. But more importantly, your simple model of evaluation is no longer powerful enough to describe what's going on. You need a more powerful, and more complex, and no longer mathematically nice, evaluation model, which we won't talk about here. By introducing assignment, you may be improving efficiency in terms of cycles used, space used, and so on. You may also be improving the "aesthetic complexity"---your program may look less complicated. But you're also demonstrably increasing computational complexity. That makes your life harder if you happen to be a compiler or interpreter writer, but that's no big deal, as we don't mind making computers do work. What is a big deal is that assignment makes it harder to test, debug, or validate your software. A procedure in which variable bindings don't change has to be easier to understand than one in which the binding do change, no? And when you're working with really big systems where big dollars, and more frequently, lives, are on the line, you want to put more value on reducing complexity than on improving efficiency. Massive software failures don't happen because programs are slow, they happen because the programs are too complex to be checked out thoroughly. Does this mean you shouldn't use assignment? No. What it means is that you should realize that you're paying a price to do so. Introducing more local variables Sometimes you'd like to have more local variables in your functions than just what's available through the argument list. One way to introduce more variables is to use a "helping function" and add additional variables in the argument list to the helping function. But you already knew that. Another way to introduce additional local variables is to use LISP's "let" function: (let ((variable1 value1) (variable2 value2) : : (variablen valuen)) ) The "let" form allows you to bind a bunch of values to variables, and makes all those variables local to the code embedded within the "let" form. We'll see an example of how to use it in the next set of notes. Copyright 1995 by Kurt Eiselt. All rights reserved. -- Kurt Eiselt Assistant Dean, Student Services College of Computing Georgia Institute of Technology Atlanta, GA 30332-0280 http://www.cc.gatech.edu/aimosaic/faculty/eiselt.html