CS 3411 - C++ Program

Spring Quarter, 1999
Jump directly to main sections of this document:

Overview of the assignment

For this programming assignment, you will implement an interpreter for a small procedural programming language.

Architecture of the interpreter

The basic architecture is described well by this diagram:

                                                                   V
   ----------                                ---> Pretty-printer   i
   | Source |      ---------  ----------   /                     T s
   |  Code  | ---> | Lexer |->| Parser | -----> Type-checker     r i
   |  File  |      ---------  ---------- ^ \                     e t
   ----------                            ^   ---> Interpreter    e o
                                   Parse Tree                      r
                                                                   s
The lexer and the parser are on the front end of the system. As you know, the lexer's job is to break up the source code into language tokens. The parser then uses these tokens as terminal symbols to parse the language grammar. The output of the parser is a parse tree, which serves as our internal representation for the program.

Once we have the parse tree, we can do a variety of meaningful things with the program. In the diagram above are three examples of things to do with the parse tree: pretty print the source code, do static type checking on the program, or run (interpret) the program. For this assignment, we will consider all three.

Design details

In this section we describe some details of the design of each component of the system.

Lexer

The Lexer class takes the entire source code as input to the constructor via a single string. It then provides a narrow interface to the parser for consuming tokens of the program string one by one:

The Lexer implementation keeps track of its location in the original source code (line number and column number). This information is passed on to each token, which knows its "location in the source" as well as its token type and lexeme.

Parser

The Parser is implemented as a recursive descent parser. Each production in the language grammar translates into a method in the parser; each non-terminal in the grammar corresponds to a type of node in the parse tree. The structure of each node in the parse tree is determined by the nature of that program construct. For example, class If has three main components: an Expression pointer, which points to a node in the tree representing the conditional expressions, and two Block pointers, which point to the "then" and "else" blocks of code associated with the If. (There is also a Token field named "anchor", which just hangs around to aid in error reporting--recall that tokens know what line number of the file they came from.)

The parser recognizes three "top level structures" in the program source: Programs, Procedures, and Functions. A parse tree is associated with each of these. Since one source code file can hold any number of procedure/function definitions, the parser returns a structure called a File to respresent all the code in the source code file. The File contains two Dictionaries, which allow us to lookup procedures and functions by name, and a pointer to the main program itself.

Visitor Pattern

Once we have the parse tree, there are a variety of things we may want to do with it, for example: interpret the code, pretty-print the code, or typecheck it. In order to decouple the parse tree data structure from the various actors which want to traverse it, we use the Visitor design pattern.

Each node in the parse tree inherits from an abstract class ParseTreeNode. Then, there is another abstract class called ParseTreeVisitor from which the PrettyPrinter, Typechecker, and Interpreter inherit. Each ParseTreeNode can "accept" a visitor, and each visitor knows how to visit each type of node in the parse tree. That is to say, the ParseTreeVisitor interface looks something like

   class ParseTreeVisitor {
   public:
      virtual void visit( Procedure* x )=0;
      virtual void visit( Function* x )=0;
      virtual void visit( Declaration* x )=0;
      virtual void visit( Block* x )=0;
      virtual void visit( ProcedureCall* x )=0;
      virtual void visit( Assign* x )=0;
      virtual void visit( If* x )=0;
      virtual void visit( While* x )=0;
      ...
only with a lot more methods (one for each separate type of node in the parse tree). Visitors then walk the tree by visiting each node. For example, the Interpreter, when visiting an If node, would first visit the conditional Expression in the node. If the expression evaluated to true, it would then visit the "then" Block of the If node, else it would visit the "else" Block.

TypeChecker

TypeChecker is a subclass of ParseTreeVisitor. Its job is to verify that the code meets the language's static semantics. For the most part, this means "type checking". The TypeChecker maintains a Dictionary which maps variable names to their types, and traverses the parse tree for each module, ensuring that variables are used in accordance with their declaration. For example, when the TypeChecker visits an If node, it check that the Expression is a boolean. When it visits an Assign node (assignment statement), it ensures that the type of the variables on the left-hand-side matches the type of the expression on the right-hand-side.

Procedure and function calls take a little more work to typecheck--the TypeChecker must ensure that a procedure of function with the right name exists, and looks up the type and number of its formal parameters, and check them against the actual parameters. It must also ensure that only expressions with lvalues are passed as inout or out parameters.

Interpreter

The Interpreter is another ParseTreeVisitor subclass. It takes a File parameter to its constructor and interprets the Program in that File.

Interpreting the program is as simple as traversing the parse tree for the program. The Interpreter maintains a Stack of referencing environments--one for each module on the current activation stack. The referencing enviornments are just Dictionaries which map variable names to Values. Value is an abstract class with three concrete subclasses (Bool, Int, and String) which model the values of the three types of variables in our language.

The semantics for visiting each type of node in the parse tree are intuitive. For example, visiting a Declaration causes us to add new entries in the current referencing environment. To visit an Assign statement, we visit the right-hand-side Expression to evaluate it, and then look up the left-hand-side Var in the referencing environment and set() a new Value. The result of visiting the Expression in an If determines whether we visit the "then" Block or the "else" Block.

To call a procedure or function, all the actual parameters are evaluated and stored, and then the called module is visited. When that module begins, it pushes a new referencing environment onto the stack, adds the parameters to its environment, and then runs its course.

PrettyPrinter

A pretty-printer is a tool which reformats source code by manipulating whitespace so the source code is indented "pretty". It can be implemented as yet another subclass of ParseTreeVisitor. As each node is visited, the code corresponding to the node is printed with the correct indentation and newlines.

Details about the language

A precise specification of the language would be a pain in the neck, and no fun to read. So instead, this section merely points out some highlights of the language.

To give a feel for the language, here's a small program that computes the greatest common factor of two numbers:

   function gcf( in int x, in int y ) returns int
   begin
      if( y > x ) then
         return gcf( y, x );
      else
         if( x mod y = 0 ) then
            return y;
         else
            return gcf( x mod y, y );
         endif
      endif
   endfunction
      
   program foo
      int x, y;
   begin
      print( "enter two numbers (0's to quit): " );
      read( x );
      read( y );
      while( x>0 and y>0 ) do
         print( "gcf is ", gcf(x,y) );
         print( "enter two numbers (0's to quit): " );
         read( x );
         read( y );
      endwhile
   endprogram
There is one parameter-less main program, accompanied by any number of procedures and functions. There are three kinds of data: int, string and bool. There are also three kinds of parameters: in, out, and inout. Functions may not have side-effects. Declarations for variables appear at the top of a block for each module. There are no global variables. Statements are terminated with semicolons. The two main control constructs are if and while. Recursion is supported.

There are a number of builtin operators. They are listed here, from highest to lowest precedence:

   * div mod
   + - .
   < <= > >= = <>
   not
   and
   or
The "dot" is the string concatenation operator, as in perl. All operators associate left-to-right.

In addition to those operators, there are a few named operations:

   substr( str, start, length ) -> string
   length( str ) -> int
   print( <exprlist> )
   read( <varlist> )

String literals appear in double-quotes. true and false are boolean values. Integer literals are supported. The language is case-sensitive. Comments are not supported.

The sample programs are suggestive of the grammar that describes the language, so no BNF is given.

What you need to do

We could just have you write the whole thing, but that'd be no fun, because it's big (about 2000 lines of code) and you only have a short period of time to do this project. So instead, we're giving you most of the code, and you just have to add a few features and make a few modifications. Hopefully in the process you'll learn a bit about C++ and also about programming languages in general.

Before you do anything, you should grab a copy of the code (which is located in the directory

~lorgon/Int
on lennon.cc), compile it, and run it on some of the sample test files. Then you should try to read over some of the C++ code and compare it to the architecture described above, to get a feel for what source code files map into which parts of the architecture. Once you've a basic feel for the code, you can actually get started working on the project.

First, a number of method bodies have been removed and replaced with a comment saying "WRITE THIS". You should implement those methods--until you do, many of the sample programs we've given you won't parse, typecheck, or run properly.

Second, the length operator for strings has been completely omitted from the code. To add this feature to the interpreter "from scratch", you will need to:

Whew!

At this point, the interpreter should conform to the language spec.

Next, you must implement an interesting feature of your own. What you do is up to you. Here are a few suggestions you might consider, in approximately increasing order of difficulty:

To get "full credit" you need only pick a relative easy new feature to add. Harder, niftier, time-consuming features can earn you extra credit. If you've a feature idea, and want advice about "how hard it would be" or "how much extra credit it'd be worth" ask on the newsgroup.

Where to go for help

There are, as always, a number of resources available to you. For general questions/discussion, the newsgroup is good; since all three sections of 3411 are doing this assignment, you may wish to crosspost to all three groups. TA/instructor office hours or email are available for more specific questions.

There will also be a C++ help session centered around the assignment (time and date to be announced).

Get started early! Even if you're not writing any code yet, start looking over the given code and digesting it.

Turnin information

You should turn in all of your source code and the Makefile. In addition, you should include a README file which describes (1) any problems you are aware of with your program and (2) a description of the feature you added. You should also turn in at least two sample code files (call them "mytest01" and "mytest02") which demonstrate your language feature.

Put everything into a directory, go into workon, and type

   turnin  p1  directory_name
to turn in your assignment.

The due dates vary across sections; you can always use

   turnin  --schedule
to find out when workon thinks it is due for your section. Maybe.


Last updated on Sat May 8 21:45:25 EDT 1999 by Brian McNamara