Lecture 3 -- LISP evaluation basics The substitution model of evaluation Now that you know how to create and run a LISP function, it's time to become familiar with how LISP functions are evaluated by a LISP interpreter. We won't go into all the details here; we'll leave some of the details for the weeks to come. But this little discussion should give you enough understanding to make you dangerous. Let's say you've created a file with all the LISP code for evaluating the quadratic formula that I gave you in the previous lecture notes, and that you've corrected the little detail about the number of arguments to the denominator function. Then let's say that you've also added your implementation of the function for computing the discriminant. Now you tell the LISP interpreter to evaluate all those defun functions, as you were shown in your lab session. You're ready to compute stuff. If you type (pos-root 1 0 -4) at the LISP evaluator, here's what will happen: ? (pos-root 1 0 -4) 2 ? You can follow that a little bit more closely by telling the evaluator that you want to "trace" some of the functions that are being called: ? (trace pos-root pos-numerator discriminant denominator) NIL ? Then when you type (pos-root 1 0 -4), you'll see something like this: ? (pos-root 1 0 -4) Calling (POS-ROOT 1 0 -4) Calling (POS-NUMERATOR 1 0 -4) Calling (DISCRIMINANT 1 0 -4) DISCRIMINANT returned 4 POS-NUMERATOR returned 4 Calling (DENOMINATOR 1) DENOMINATOR returned 2 POS-ROOT returned 2 2 What's really happening is something like this: When I type the list (pos-root 1 0 -4) at the interpreter, LISP assumes I want to invoke the function "pos-root". It evaluates the arguments, 1, 0, and -4, which in this case conveniently evaluate to themselves. LISP then looks up the definition of "pos-root", extracts that definition, and then applies that definition to the evaluated arguments. This can be viewed as a substitution, where: (pos-root 1 0 -4) becomes (/ (pos-numerator 1 0 -4) (denominator 1 0 -4)) This expression is sent to the evaluator again. LISP evaluates the arguments, but this time the arguments aren't numbers but are instead lists. So these expressions are handed to the evaluator, and LISP treats them as function calls. The order in which these function calls, or any arguments in this example, are evaluated won't be important. Why? Because of the functional programming constraints we've followed, there are no side-effects. For example, computing pos-numerator in no way affects the computation of denominator and vice-versa, so it doesn't matter which one is computed first. (This will be a very nice feature in the parallel-processing world of the future.) If, on the other hand, one of these functions updated some global variable that was accessed by the other, we would most definitely worry about which one was computed first, resulting in a harder programming job, more complex software, and so on. Still, something like Macintosh Common LISP has to evaluate one of these arguments before the other, and the way it works (as do most LISP systems) is to evaluate the arguments left to right. So LISP evaluates the expression (pos-numerator 1 0 -4) and does the appropriate substitution: (/ (pos-numerator 1 0 -4) (denominator 1 0 -4)) becomes (/ (+ (- 0) (discriminant 1 0 -4)) (denominator 1 0 -4)) This "substitution" process continues until LISP arrives at a solution (which requires that all the functions you defined are eventually replaced by functions that were already defined in LISP). In short, the LISP function evaluation process can be described like this: 1. Look up and retrieve the function definition. 2. Evaluate the arguments to the function (by passing the arguments themselves to the LISP evaluation function--- numbers evaluate to themselves, symbols evaluate to the objects they're bound to, while lists are treated as function calls and evaluated accordingly). 3. Apply the function definition to the evaluated arguments (i.e., replace the argument place-holders in the new function definition with the corresponding evaluated arguments). 4. Substitute the result of all that for the original function call from step 1. 5. Go back to step 1 with this new expression. Warning: The is a very sketchy and hand-wavy description of the evaluation process. It should suffice for now, but as time goes by, we'll beef this description up with details about how various types of things get evaluated. But rather than dump all the details on you at once, we'll add things incrementally as we need them. LISP's favorite data structure LISP comes with its own "abstract data types" or ADTs. As you undoubtedly remember from previous courses, an ADT is (1) the logical data structure itself (an abstraction, not the detailed implementation), combined with (2) a set of operations which work on the data structure. The two basic data types in LISP are "atoms" and "lists". An atom is a non-divisible thing, like symbols (foo, bar, +) and numbers (42, 3.15). They're not exactly interesting, nor are the operations associated with them. The more interesting data structure is the list. A list is an ordered set of atoms or lists (a recursive definition, no?). That ordered set must begin with a "(" and end with a ")", and it may contain any number of atoms or lists in any order, and therefore includes the empty list. The following are all examples of valid lists: () (a) (a b c) (a (b) c) (a (b (c))) (alex ate the green meat) (defun pos-root (a b c) (/ (pos-numerator a b c)(denominator a b c))) (Both atoms and lists are often called symbolic expressions, or even just expressions, which is why we talked about evaluating expressions occasionally in the discussion a few paragraphs above.) Simple operations on lists There are three fundamental operations on lists: two of them are used for decomposing lists and getting at their components, while the other operation is used for composing lists. The first list operator is called "first". Given an argument which is a list, "first" returns the first element of that list. For example, if the symbol A has been bound to the list (X Y Z 4), then (first A) will return X. Note that the original list will not be altered in any way; A is still bound to the list (X Y Z 4). The second list operator is called "rest". Given an argument which is a list, "rest" will return the list consisting of all the original elements of the list except the first element. Thus, assuming A is still bound to (X Y Z 4), (rest A) will return (Y Z 4). The third operator is called "cons", which you can think of as being short for "construct". Given two arguments, where the first is anything and the second is a list, "cons" returns the list that you get by inserting the first argument as the first element of the list that's the second argument. Got it? So, (cons (first A)(rest A)) returns (X Y Z 4). Let's go over it again: What does (first (X Y Z 4)) return? Time's up. If you said it returns X, then you're dead wrong. Why? Because you forgot the evaluation rules: (first A) where A is bound to the list (X Y Z 4) is not the same as (first (X Y Z 4))! In the first case, the argument A evaluates to the list (X Y Z 4), and then when the definition of "first" is applied to that, the first element of that list, X, is returned. Great. In the second case, however, LISP looks at the argument (X Y Z 4) and tries to evaluate that as a call to the function named X with the arguments Y, Z, and 4. If you don't have a previously-defined function named X, or if Y or Z aren't bound to something, you'll see LISP grind to a screeching halt. In short, anything you throw at LISP, LISP will try to evaluate. Give it a symbol and LISP will try to find what the symbol is bound to. Give it a list, and LISP will try to evaluate it as a function call... Preventing evaluation ...unless you explicitly tell LISP not to evaluate what you're giving it! How do you do that? With another function, called "quote", which is merely a function that takes an argument, doesn't evaluate it, and returns that argument. For example, (quote (X Y Z 4)) returns (X Y Z 4). So (first (quote (X Y Z 4))) returns X, not an error. The "quote" function is used so often that it gets an abbreviation: the apostrophe or single quote mark. So (quote (X Y Z 4)) is the same thing as '(X Y Z 4), and ? (first '(X Y Z 4)) X ? (rest '(X Y Z 4)) (Y Z 4) ? (cons 'X '(Y Z 4)) (X Y Z 4) ? A dangerous piece of knowledge Lots of LISP programming errors result from using "quote" when not necessary, or not using it when you should, so you might want to sit down for a while in front of your favorite LISP system and play with this stuff for awhile. And in doing this kind of practice, it might be helpful to know how to bind symbols to values using an assignment operator. So now I'm going to show you how to do that, but for now you can only use assignment to help you get a feel for quoting and evaluation and stuff like that. DO NOT USE THIS ASSIGNMENT OPERATOR IN ANYTHING YOU SUBMIT FOR GRADING UNTIL WE TELL YOU TO AS IT VIOLATES FUNCTIONAL PROGRAMMING CONSTRAINTS. Someday in the weeks ahead, we'll let you use assignment in a responsible fashion, but for now just use it in practice. Otherwise, you'll find yourself getting amazingly low grades while doing extra work. The generic assignment operator is "setf", and takes two arguments. The first argument is a symbol (e.g., a variable name), and the second argument is the thing you want the symbol bound to. The first argument is not evaluated, but the second argument is, which should tell you that "setf" is not an ordinary LISP function. The "setf" function evaluates the second argument, binds it to the first argument, and returns the evaluated second argument: ? (setf A '(X Y Z 4)) (X Y Z 4) ? (first A) X ? ? (setf A (X Y Z 4)) > Error: Unbound variable: Y > While executing: SYMBOL-VALUE > Type Command-/ to continue, Command-. to abort. > If continued: Retry getting the value of Y. See the Restarts... menu item for further choices. 1 > A variation of "setf" (actually, it's the predecessor to "setf") is "setq". They're different, but it's not important for you to know how they're different just yet. We'll deal with that later. For now, assume that they're interchangeable. Box-and-pointer notation Now you're comfortable, I hope, with the notion that (X Y Z 4) is a list, but unlike Pascal and other languages, LISP doesn't require you to worry about the pointer details. But sometimes, LISP folks like to see the pointers to help understand what's going on. So here's a more tangible (i.e., less abstract) picture of what our sample list looks like: A | | | _______ _______ _______ _______ +-->| | | | | | | | | | | /| | | | --+---->| | | --+---->| | | --+---->| | | / | |_|_|___| |_|_|___| |_|_|___| |_|_|/__| | | | | X Y Z 4 In this lovely figure, each element of the list is represented as a pair of words in memory. The first or left word contains a pointer to the symbol that "is" the first element, and the second or right word contains a pointer to the next element. It's now easy to see that, given a pointer to this list, the function "first" (e.g., (first A)) looks at the pair of words at the end of that pointer, follows the pointer stored in the left word, and returns what's at the end of that pointer. In this case, it's the symbol X. The "rest" function (e.g., (rest A)) looks at the pair of words pointed at by A, follows the pointer stored in the right word, and returns what's at the end of that pointer, which in this case is the list (Y Z 4). Note again that these operations didn't change the structure of the list, they merely followed pointers and returned what they pointed to. Each of these two-word pairs are called "cons cells", because it's what you get when you "cons" two things together. (This is where the dynamic memory allocation mechanism comes into play.) Earlier, I said that the second argument to the "cons" function should be a list, but I lied for the sake of simplicity. You can in fact have something that's not a list as the second argument to a "cons" function, but the result is something slightly weird, called a "dotted pair". For example: ? (cons 'A 'B) (A . B) ? The box-and-pointer notation for this dotted pair looks like this: _______ | | | | | | --+---->B |_|_|___| | A You used to see dotted pairs used a lot in LISP programming, but they're not used as frequently any more (except in some circumstances, which we might get to in this course). I personally interpret these as meaning that I "cons'ed" two things together that I didn't mean to. Something for nothing What's the technical difference between the list and the dotted pair? It's how they are terminated. A "well-formed" list always ends with something called "nil", which we represented in the box-and-pointer diagram up there with a big slash through the right word of the last element. A dotted pair, on the other hand, ends with something other than "nil". So what's a nil? A nil is a very special thing in LISP. First, it has the unusual property of being both an atom and a list at the same time. What kind of list? It's the empty list, represented by "()". Consequently, "nil" and "()" are the same thing. It's also the symbol meaning Boolean "false" in LISP (Boolean "true" is the predefined "T" or anything non-nil). And, as we noted just above, it's always the end of a "well-formed" list, which allows LISP to maintain the following sorts of consistencies: The first element of the empty list is, naturally, nothing, which is nil, which is the empty list: ? (first nil) NIL ? The rest of the empty list, which is just the empty list with the first element (i.e., nothing) removed, is also the empty list: ? (rest nil) NIL ? But if we try to put (first nil) and (rest nil) back together using "cons", we should get the original thing we started with, which was nil, no? No: ? (cons (first nil) (rest nil)) (NIL) ? It's a result of the unique nature of nil in LISP. Think of it as one of LISP's endearing idiosyncracies. And since ending with nil is a sign of goodness, that's all for now. Go practice. Copyright 1995 by Kurt Eiselt. All rights reserved. -- Kurt Eiselt Assistant Dean, Student Services College of Computing Georgia Institute of Technology Atlanta, GA 30332-0280