IUP Computer Science
CO 310 Spring 1993

Assignment #4
(Due   5 April 1993)


    The syntax analysis (parsing) task of a compiler is one place where a stack is frequently used.  Parsing determines whether or not a statement in a programming language is properly formed.  Assignment #4 involves the writing of a simplified shift-reduce parser to recognize a single statement in a language.  The statement to recognize is either an assignment or a single-alternative IF.  The grammar for the language is given below:
STMT ->  id  asgn  EXP             Assignment:  var = expression
STMT ->  ift  EXP  thent  STMT     IF expression THEN statement
EXP  ->  id                        Expression:  variable
EXP  ->  EXP  op  id               Expr:  Expr  operator  var
With this grammar, each of the following would be a valid statement in the language.
     X = Y
     X = X + Y
     X = A + B - C * D + W / Y / X
     if A < B then X = X + 1
     if A > B and F then if X = Y then M = K - J
    Your parsing program must work as follows.  First, it must read in a parse table.  This is a two dimensional array of single characters.  Each row of the parse table is indexed by a top-of- stack symbol; each column of the parse table is indexed by a token symbol.  Both stack symbols and token symbols are of an enumerated data type:

type
   symbol = (id, op, asgn, ift, thent, exp, stmt, mark, eos);
   table = array [symbol, symbol] of char;
var
   parsetable:  table;

    Next, your program must read in (one by one) a sequence of tokens (all on one input line) that represents the language statement you are trying to parse.  Tokens are generic representations of each element of a statement.   The previous statements would be represented by the following token sequences.
id asgn id eos
id asgn id op id eos
id asgn id op id op id op id op id op id op id eos
ift id op id thent id asgn id op id eos
ift id op id op id thent ift id op id thent id asgn id op id eos

    The last token in each sequence of tokens is the EOS (End Of Statement) token.  To parse the sequence, your program must refer to the parse table each time a token is read in or a parsing action is performed to determine what to do next.  The combination of the symbol on top of the stack and the token read in determines the parse table entry; for example, if the top of stack is ID and the token is OP, look at PARSETABLE[ID,OP] which is a 1.  The table entry indicates what is to be done with the stack.  Note:  So that there is always something to look up, the stack must start parsing with a MARK symbol on it.  Following is a description of what action to take for each possible parse table entry.

E Error - Stop the parsing and report an error. Discard all tokens until an EOS token is read.

S Shift - Push the token onto the stack and get another token.

1 Reduce simple expression.  If top of stack is ID, replace it with EXP.  Then, push the token onto the stack and get another token, unless the original token was an EOS.

2 Reduce other expression.  If top three entries of stack are ID OP EXP, replace them with EXP, discard the token, and get another token. Otherwise, report an error and discard all tokens until an EOS token is read.

3 Reduce statement.  If top three entries of stack are EXP ASGN ID, replace them with STMT and check completion of parsing.  Otherwise, report an error and discard tokens until a EOS token is read.  Do not get another token.

4 Reduce IF.  If top four entries of stack are STMT THENT EXP IFT, replace them with STMT and check for completion of parsing.  Otherwise, report an error and discard tokens until an EOS token is read.  Do not get another token.

Note:  Parsing is complete if the following conditions are met at the same time:  Top of the stack is the STMT symbol; the stack contains only the MARK and STMT symbols; and the input token is EOS.

The data for the assignment is in a TEXT file called DATA.P4 in the P: or PROJ: directory. This file has information in the following form:  First, 9 records of 9 characters each - a record holds the entries for one row of the parsetable.  The table entries are followed by several records, each a collection of tokens representing one potential statement in the language.

Your program should display the input token and stack contents each time a parsing action is performed on the potential language statement.  For example, for the token sequence
 id asgn id op id eos    as in   X = Y + Z

the program output should look like the following.  (You may omit the display of the MARK symbol - it is always there.)
Token  Stack (top at right)
  ID    MARK
ASGN    MARK   ID
  ID    MARK   ID ASGN
  OP    MARK   ID ASGN   ID
  ID    MARK   ID ASGN  EXP   OP
 EOS    MARK   ID ASGN  EXP
Successful parse.
As an example of an unsuccessful parse, consider the following token sequence:
 ift id op thent id eos    as in  IF X > THEN Y
It should produce output like this:
Token  Stack (top at right)
  IFT    MARK
   ID    MARK   IFT
   OP    MARK   IFT   ID
THENT    MARK   IFT  EXP   OP
Error:  bad input token; remainder ignored.
 
The stack may be implemented using an array or a linked list, your choice.  Whatever you choose, it is very important that you treat the stack like the abstract data type that it is at all times.  ALL actions on the stack MUST be consistent with the actions allowable on a stack - push, pop, checktop, etc.  DO NOT let your main program take advantage of the implementation by doing "un-stack-like" actions.

Hand in a commented compilation listing of your program and the captured output generated by running your program on DATA.P4.   Your program MUST be written, compiled and run on the VMS system.   Turbo Pascal does not provide for the reading and writing of enumerated data types; VMS Pascal does.  Note:  Enumerated data values may be read in uppercase or lowercase; however, VMS Pascal always outputs them in uppercase.  If enumerated data values were not used, the program for this assignment would be considerably more difficult.