November 21, 1995 -- Building languages on languages Static vs. Dynamic Languages Certainly one of the things we've stressed in this class so far is the idea that software development involves making informed choices, and that those choices carry with them both costs and benefits. For example, we've discussed more than once the trade-offs involved with functional versus non- functional programming styles, even within the confines of a single programming language. A choice between one type of programming language and another will also have impacts on how you approach a given problem, what form your solution will take, and what you'll be allowed to do in implementing that solution. The more "mainstream" languages, which can be classified as "static languages", are based on computing technologies from the 1950s and 1960s, and they tend to force programmers into the following paradigm: They tend to be batch-compiled languages, meaning you wait for the entire program to compile and link after every change. Development environments are usually based on ASCII source files and command-line interfaces (although there are some notable exceptions recently). Compiled programs have little or no runtime type checking, which usually results in a complete crash when an error occurs. Memory is managed manually--code has direct memory references or pointers. Code is hard to write, understand, and can create subtle, hard-to-fix bugs. When managed manually, memory allocation is inefficient, buggy, not portable, and hard to integrate with other code libraries. This is why memory-related errors are responsible for the majority of all bugs in a typical static language program. C, C++, and Pascal are examples of static languages. Dynamic languages, on the other hand, let you program in an interactive style. They allow faster development and rapid prototyping. They usually have dynamic type information--errors are usually detected before execution, and when they do occur, the errors are usually recoverable and do not cause crashes. They usually have automatic memory management, which hides the details of memory management from the programmer. Code is simpler, cleaner, more reliable, and more reusable. They usually have a shell or "listener" so the user can execute code interactively (either via an interpreter or an incremental compiler). What languages are dynamic? Dylan, Forth, Haskell, Logo, Java, Prolog, and Smalltalk. And of course Lisp and its dialects, including Scheme. Scheme One of the oft-mentioned advantages of LISP is its extensibility---it's very easy to add new things to the language, and it's also very easy to build whole new languages on top of LISP. One language that has been built on top of LISP is called "Scheme". Scheme is used primarily as a teaching language, and is close enough to LISP to be considered a dialect of LISP. In fact, it's the only dialect of LISP in current widespread use besides Common LISP. Scheme tries to offer a minimal set of very powerful features that can be used to implement other features. Here are some of the ways in which Scheme is simpler than Common LISP: 1. Fewer built-in functions and special forms. 2. No special variables, only lexical variables. (i.e., no dynamic scoping) 3. No optional parameters. 4. No special forms for looping---the programmer must use recursion, and count on Scheme to implement it efficiently. 5. Scheme evaluates the function part of a function call in exactly the same way as the arguments. Huh? What was that last one? Function calls in Scheme and LISP are different? Yes, exactly. Here's how: In LISP, (f x) means 1) look up the function body bound to f, 2) evaluate x, and 3) apply the retrieved function body to the value of x. LISP doesn't evaluate f. In Scheme, (f x) means 1) evaluate f (and we hope that will return a function body), 2) evaluate x, and 3) apply the result of evaluating f to the result of evaluating x. This is a more consistent approach to evaluating an expression than in LISP. Any expression can be in the function position (the first of the list), and it is evaluated just like any other argument. It's a subtle but important difference. The mini-Scheme interpreter A simple interpreter for Scheme is pretty easy to construct in LISP. Here's an example of an interpreter for a subset of Scheme (taken from Peter Norvig's book): (defun interp (x &optional env) (cond ((symbolp x) (get-var x env)) ((atom x) x) ((case (first x) (QUOTE (second x)) (SET! (set-var! (second x) (interp (third x) env) env)) (LAMBDA (let ((parms (second x)) (code (maybe-add 'begin (rest (rest x))))) #'(lambda (&rest args) (interp code (extend-env parms args env))))) (t ;; a procedure application (apply (interp (first x) env) (mapcar #'(lambda (v) (interp v env)) (rest x)))))))) (defun set-var! (var val env) (if (assoc var env) (setf (second (assoc var env)) val) (set-global-var! var val)) val) (defun get-var (var env) (if (assoc var env) (second (assoc var env)) (get-global-var var))) (defun set-global-var! (var val) (setf (get var 'global-val) val)) (defun get-global-var (var) (let* ((default "unbound") (val (get var 'global-val default))) (if (eq val default) (error "Unbound scheme variable: ~a" var) val))) (defun extend-env (vars vals env) (nconc (mapcar #'list vars vals) env)) (defparameter *scheme-procs* '(+ - * / = < > <= >= cons car cdr not append list read member (null? null) (eq? eq) (equal? equal) (eqv? eql) (write prin1) (display princ) (newline terpri))) (defun init-scheme-interp () (mapc #'init-scheme-proc *scheme-procs*) (set-global-var! t t) (set-global-var! nil nil)) (defun init-scheme-proc (f) (if (listp f) (set-global-var! (first f) (symbol-function (second f))) (set-global-var! f (symbol-function f)))) (defun maybe-add (op exps &optional if-nil) (cond ((null exps) if-nil) ((length=1 exps) (first exps)) (t (cons op exps)))) (defun length=1 (x) (and (consp x) (null (cdr x)))) (defun last1 (list) (first (last list))) (defun scheme () (init-scheme-interp) (loop (format t "~&==> ") (print (interp (read) nil)))) Let's go back and take a look at the basic evaluator function, called "interp": (defun interp (x &optional env) (cond ((symbolp x) (get-var x env)) ((atom x) x) ((case (first x) (QUOTE (second x)) (SET! (set-var! (second x) (interp (third x) env) env)) (LAMBDA (let ((parms (second x)) (code (maybe-add 'begin (rest (rest x))))) #'(lambda (&rest args) (interp code (extend-env parms args env))))) (t ;; a procedure application (apply (interp (first x) env) (mapcar #'(lambda (v) (interp v env)) (rest x)))))))) The interp function works on two arguments. The first is an expression simply called "x", which is the expression to be evaluated. The second is called "env" (for "environment") which is merely an association list of variable names and their bindings or values. This Scheme interpreter is ready to deal with six different cases. From top to bottom, they are: 1. If the expression is a symbol, look up its value in the environment. 2. If the expression is an atom other than a symbol, such as a number, just return it. Otherwise, the expression must be a list. 3. If the list starts with QUOTE, return the quoted expression. 4. If the list starts with SET! (same as the LISP setq), interpret the value and then set the variable to that value. 5. If the list starts with LAMBDA (no need for #' here), then build a new procedure---a closure over the current environment. 6. Otherwise, this must be a procedure application. Interpret the procedure and all the arguments, and apply the procedure value to the argument values. Show me that factorial thing one more time To get the mini-Scheme interpreter to work, load the file containing the mini-Scheme code, and then call the function named "scheme". The prompt will change from "?" to "==>": Welcome to Macintosh Common Lisp Version 2.0! ? SCHEME ? (scheme) ==> Now you can interpret a subset of Scheme code. Some Scheme code looks just like LISP code: ==> (+ 2 2) 4 ==> But not all Scheme code looks exactly like LISP code. Here's how we define a new function in Scheme: ==> (set! square (lambda (x) (* x x))) # ==> Invoking a Scheme function is pretty much the same as invoking a LISP function: ==> (square 2) 4 ==> We can extend the mini-Scheme interpreter easily. For example, if we want to add a conditional like "if", we merely add another chunk to the "case" expression in the interpreter. This new chunk says that if the expression being evaluated begins with IF, then evaluate the second part of the expression by calling "interp" recursively on that. If the value is non-nil, then call "interp" on the third part of the expression, else call "interp" on the fourth part of the expression. And in order to make that all happen, we just use the LISP version of "if": (defun interp (x &optional env) (cond ((symbolp x) (get-var x env)) ((atom x) x) ((case (first x) (QUOTE (second x)) (SET! (set-var! (second x) (interp (third x) env) env)) (LAMBDA (let ((parms (second x)) (code (maybe-add 'begin (rest (rest x))))) #'(lambda (&rest args) (interp code (extend-env parms args env))))) (IF (if (interp (second x) env) ;; < (interp (third x) env) ;; < added stuff (interp (fourth x) env))) ;; < (t ;; a procedure application (apply (interp (first x) env) (mapcar #'(lambda (v) (interp v env)) (rest x)))))))) With that addition, we can define (what else?) the factorial function: ==> (set! fact (lambda (x) (if (equal? x 0) 1 (* x (fact (- x 1)))))) # ==> (fact 5) 120 ==> The read-eval-print loop One interesting side note that all you budding young LISP hackers should be aware of is that the interactive capability of LISP derives from something called the "read-eval-print loop". The "read" function accepts and expression typed at the terminal. The "eval" function is the LISP evaluator, which evaluates the expression obtained by "read". And "print" prints the result of the evaluation. If you nest these three functions and stick them in the middle of a loop, you'll get the interactivity that we've come to associate with LISP. We can see exactly the same thing done explicitly in our mini-Scheme interpreter, except that "eval", as noted before, is called "interp" here: (defun scheme () (init-scheme-interp) (loop (format t "~&==> ") (print (interp (read) nil)))) So, now that you know this, if you can build an evaluator, an input function, and an output function, you too can create your own interpreted programming language. This will be handy to know if, say, you're stranded on a desert island with a computer but no software. Well, ok, you'd need a power source too. And maybe a big supply of Coke Classic and a truckload of Doritos. Copyright 1995 by Kurt Eiselt. All rights reserved. -- Kurt Eiselt Assistant Dean, Student Services College of Computing Georgia Institute of Technology Atlanta, GA 30332-0280 http://www.cc.gatech.edu/aimosaic/faculty/eiselt.html