Programming Assignment 1
Lexical Analyzer

Due Date: by 6:00 PM on Friday, May 1, 1998

For this assignment only, you are permitted to work in teams of not
more than two people. You must register your team using the groupcon
program in the cs2430 workon environment. Both members of the team
must be registered no later than 6:00 PM on Friday, April 24, 1998.
If you do not register yourself as part of a team, then it will be
assumed that you are working alone. No extra points will be given to
people who choose to work alone.

EACH MEMBER OF A PROJECT TEAM IS INDIVIDUALLY RESPONSIBLE FOR SEEING
THAT A COPY OF THE ASSIGNMENT IS TURNED IN.

Readings:

    Read the sections on Implementing a Simple Shell in the Weiss
    book (pp. 392-398).

Stub Code:

    You can retrieve the baseline source for the GetToken() function
    by doing a getjob prj1 from the CS 2430 workon environment. You
    should receive the following files:

        1README.ps      this file (Postscript version)
        1README.txt     this file (text version)
        Makefile.jkg    Makefile that uses jkg's options.c
        Makefile.opt    Makefile that uses your own options.c
        tokmain.c       source code for main() function
        token.c         source code for GetToken() function
        toktest.sh      a testing script

The following sections describe what modifications must be made to
the existing version of token.c, and what point values each set of
modifications is worth. Do not modify the prototype definition for
GetToken(); otherwise, the remaining parts of the project will not
fit together properly.

For a "C" (70%) -

1.  Add code in GetToken() to detect and report the following error
    conditions:

    a.  More than one occurrence per line of the From, To, or Amper
        tokens. Note that the foo shell will not support the file
        append ( ">>" ) operation, so this is an error as well.

    b.  A newline character encountered inside any quoted string.

    c.  More than MaxLineLen characters in a single Word of tokenized
        output.
        If a Word expands to more than MaxLineLen characters, then
        the remaining characters beyond MaxLineLen are discarded.

2.  Add expansion of program arguments on the input line, anywhere
    except inside singly quoted strings. A program argument is
    denoted by a `$' sign, followed immediately by a non-negative
    integer (no spaces may appear between the `$' and the integer).
    A `$' sign inside a singly quoted string, or one that is followed
    by anything other than an integer, is interpreted as an ordinary
    `$' sign (i.e. no expansion is done).

3.  Add expansion of the $# variable, inside any context in which
    program arguments are also expanded. If encountered, $# is
    replaced by a numeric string that gives the number of program
    arguments that were passed from the command line.

For a "B" (80%) -

1.  Complete all requirements for a "C".

2.  Add a History Substitution feature that allows the user to recall
    selected parts of the most recently entered input line. History
    Substitution is enabled if the -H option has been SET, and is
    valid anywhere on the current input line, except inside singly
    quoted strings. The following two-character sequences must be
    recognized, and the indicated information from the previous input
    line must be substituted into the current input line.

        Seq     Substitution
        !0      zero-eth Word
        !^      first Word
        !$      last Word

    If the particular word to be substituted was not present in the
    previous command, then the empty string must be substituted
    instead.

For an "A" (90%) -

1.  Complete all requirements for a "B".

2.  Add a new state to GetToken() that will recognize when you are
    inside a string that is delimited by double quotes (the InQuote
    state is used for strings delimited by single quotes - this new
    state will be in addition to the InQuote state). The production
    version uses InString, but you can use whatever identifier you
    want for your own program.

3.  Add code to the GetToken() function to determine when a
    doubly-quoted string has been detected, and to handle it
    according to the following requirements. Strings enclosed in
    double quotes are subject to additional interpretation for
    certain "special" characters - these special characters, when
    preceded by a backslash (as shown below), will have the following
    meanings:

        Seq     Substitution
        \b      backspace
        \f      form-feed
        \n      new-line
        \r      carriage return
        \t      tab
        \v      vertical tab
        \\      backslash
        \"      double quote
        \ooo    the 8-bit character whose ASCII code is the 1-, 2-
                or 3-digit octal number ooo, which must start with
                a zero.

    These are the same meanings that are understood by most C
    compilers, as well as by the echo command on some versions of
    UNIX (check /bin/echo on Acme for an example). Any other
    character not in the above list should be copied normally.

For an "A+" (100%) -

1.  Complete all requirements for an "A".

2.  Add the following features for the History Substitution
    requirement:

        Seq     Substitution
        !!      entire Word list
        !*      entire Word list, excluding zero-eth Word

    Note that for these two cases, multiple words may be substituted.
    These must be re-tokenized and returned as individual words by
    GetToken(). As with the previous history functions, if the
    particular word to be substituted was not present in the previous
    input line, then the empty string must be substituted instead.

For Extra Credit (110%) -

1.  Complete all requirements for an "A+".

2.  Modify the History Substitution feature so that a range of Words
    can be specified for substitution from the previous input line.
    The format for the range specifier is !m-n, where m and n can be
    non-negative integers (denoting Word numbers), or either of ^ or
    $.
    Both m and n are optional, and if omitted, default to ^ and $,
    respectively (note that as a special case, !- is the same as

Minimum Functionality -

    The following features and requirements MUST be completed, to
    facilitate testing and grading of future assignments on this
    project: C1, C2, C3, A2
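
As an illustration of the two-character history sequences in the "B"
requirements (!0, !^ and !$), the following is a minimal sketch in C,
not the production implementation. HistoryWord(), AppendTruncated(),
and all of their parameters are hypothetical names that do not appear
in the stub code. The sketch assumes that the Words of the previous
input line have already been saved in an array, and it leaves out the
-H option check and the singly-quoted-string check, which GetToken()
would have to make before calling it. The maxLen parameter stands in
for MaxLineLen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * HistoryWord() is a hypothetical helper; it is not part of the stub code.
 *
 *   seq    the character that follows '!' ('0', '^' or '$')
 *   prev   Words of the previous input line (assumed to be saved elsewhere)
 *   nprev  number of entries in prev
 *
 * Returns the Word to substitute, the empty string if that Word was not
 * present, or NULL if seq is not one of the three sequences handled here.
 */
const char *HistoryWord(char seq, const char *const prev[], size_t nprev)
{
    size_t index;

    assert(prev != NULL || nprev == 0);

    switch (seq) {
    case '0':                       /* zero-eth Word */
        index = 0;
        break;
    case '^':                       /* first Word */
        index = 1;
        break;
    case '$':                       /* last Word */
        if (nprev == 0)
            return "";
        index = nprev - 1;
        break;
    default:                        /* not a sequence handled here */
        return NULL;
    }
    return (index < nprev) ? prev[index] : "";
}

/*
 * AppendTruncated() is also hypothetical. It appends text to the Word
 * being built in dst, keeping at most maxLen characters in total; any
 * characters beyond that limit are discarded. dst must already hold a
 * NUL-terminated string and have room for maxLen + 1 bytes.
 * Returns the resulting length of dst.
 */
size_t AppendTruncated(char *dst, size_t maxLen, const char *text)
{
    size_t used = strlen(dst);
    size_t room = (used < maxLen) ? maxLen - used : 0;
    size_t take = strlen(text);

    if (take > room)
        take = room;
    memcpy(dst + used, text, take);
    dst[used + take] = '\0';
    return used + take;
}
```

For example, if the previous input line was tokenized as ls, -l and
/tmp, then HistoryWord('^', prev, 3) yields -l and
HistoryWord('$', prev, 3) yields /tmp. If only the single Word ls was
saved, !^ yields the empty string, as the requirement states.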