IUP Computer Science
CO 310 Spring 1993
Assignment #4
(Due 5 April 1993)
The syntax analysis (parsing) task of a compiler is one
place where a stack is frequently used. Parsing determines
whether or not a statement in a programming language is properly
formed. Assignment #4 involves the writing of a simplified
shift-reduce parser to recognize a single statement in a
language. The statement to recognize is either an assignment or
a single-alternative IF. The grammar for the language is given
below:
STMT -> id asgn EXP Assignment: var = expression
STMT -> ift EXP thent STMT IF expression THEN statement
EXP -> id Expression: variable
EXP -> EXP op id Expr: Expr operator var
With this grammar, each of the following would be a valid
statement in the language.
X = Y
X = X + Y
X = A + B - C * D + W / Y / X
if A < B then X = X + 1
if A > B and F then if X = Y then M = K - J
Your parsing program must work as follows. First, it must
read in a parse table. This is a two dimensional array of single
characters. Each row of the parse table is indexed by a top-of-
stack symbol; each column of the parse table is indexed by a
token symbol. Both stack symbols and token symbols are of an
enumerated data type:
type
symbol = (id, op, asgn, ift, thent, exp, stmt, mark, eos);
table = array [symbol, symbol] of char;
var
parsetable: table;
Next, your program must read in (one by one) a sequence of
tokens (all on one input line) that represents the language
statement you are trying to parse. Tokens are generic representations of each element of a statement.
The previous statements would be represented by the following token sequences.
id asgn id eos
id asgn id op id eos
id asgn id op id op id op id op id op id op id eos
ift id op id thent id asgn id op id eos
ift id op id op id thent ift id op id thent id asgn id op id eos
The last token in each sequence of tokens is the EOS (End Of
Statement) token. To parse the sequence, your program must refer
to the parse table each time a token is read in or a parsing
action is performed to determine what to do next. The combination of the symbol on top of the stack and the token read in
determines the parse table entry; for example, if the top of
stack is ID and the token is OP, look at PARSETABLE[ID,OP] which
is a 1. The table entry indicates what is to be done with the
stack. Note: So that there is always something to look up, the
stack must start parsing with a MARK symbol on it. Following is
a description of what action to take for each possible parse
table entry.
E Error - Stop the parsing and report an error. Discard all tokens until an EOS token is read.
S Shift - Push the token onto the stack and get another token.
1 Reduce simple expression. If top of stack is ID,
replace it with EXP. Then, push the token onto
the stack and get another token, unless the
original token was an EOS.
2 Reduce other expression. If top three entries of
stack are ID OP EXP, replace them with EXP,
discard the token, and get another token.
Otherwise, report an error and discard all
tokens until an EOS token is read.
3 Reduce statement. If top three entries of stack
are EXP ASGN ID, replace them with STMT and check
completion of parsing. Otherwise, report an
error and discard tokens until a EOS token is
read. Do not get another token.
4 Reduce IF. If top four entries of stack are
STMT THENT EXP IFT, replace them with STMT and
check for completion of parsing. Otherwise,
report an error and discard tokens until an EOS
token is read. Do not get another token.
Note: Parsing is complete if the following conditions are met at
the same time: Top of the stack is the STMT symbol; the stack
contains only the MARK and STMT symbols; and the input token is
EOS.
The data for the assignment is in a TEXT file called DATA.P4
in the P: or PROJ: directory. This file has information in the
following form: First, 9 records of 9 characters each - a record
holds the entries for one row of the parsetable. The table
entries are followed by several records, each a collection of
tokens representing one potential statement in the language.
Your program should display the input token and stack
contents each time a parsing action is performed on the potential
language statement. For example, for the token sequence
id asgn id op id eos as in X = Y + Z
the program output should look like the following. (You may omit
the display of the MARK symbol - it is always there.)
Token Stack (top at right)
ID MARK
ASGN MARK ID
ID MARK ID ASGN
OP MARK ID ASGN ID
ID MARK ID ASGN EXP OP
EOS MARK ID ASGN EXP
Successful parse.
As an example of an unsuccessful parse, consider the following
token sequence:
ift id op thent id eos as in IF X > THEN Y
It should produce output like this:
Token Stack (top at right)
IFT MARK
ID MARK IFT
OP MARK IFT ID
THENT MARK IFT EXP OP
Error: bad input token; remainder ignored.
The stack may be implemented using an array or a linked
list, your choice. Whatever you choose, it is very important
that you treat the stack like the abstract data type that it is
at all times. ALL actions on the stack MUST be consistent with
the actions allowable on a stack - push, pop, checktop, etc. DO
NOT let your main program take advantage of the implementation by
doing "un-stack-like" actions.
Hand in a commented compilation listing of your program and
the captured output generated by running your program on DATA.P4.
Your program MUST be written, compiled and run on the VMS system.
Turbo Pascal does not provide for the reading and writing of
enumerated data types; VMS Pascal does. Note: Enumerated data
values may be read in uppercase or lowercase; however, VMS Pascal
always outputs them in uppercase. If enumerated data values were
not used, the program for this assignment would be considerably
more difficult.