IUP Computer Science
COSC 310    Spring 2008

Project #4
Generative Grammars
(Due 21 March 2008)

A grammar defines the form and content of a language.  This project is to make a random statement generator for a supplied grammar.  That is, given a grammar and a random number generator, the program should choose the productions of the grammar to generate a statement (a sequence of words  in the language).  To illustrate, consider the following grammar which is contained in the file  sports.txt

<sport-result>=<rem> <first> <verb> <second> in a <result>
<rem>=Unexpectedly,|As usual,|To the surprise of all present,|As predicted,|...
<first>=the Crimson Hawks|our struggling team|Appalachian State, of all teams,|...
<verb>=really <good>|sadly <bad>
<good>=pounced on|trounced|tore into|squeaked by|stomped|annihilated
<bad>=looked like fools against|embarrased themselves vs|failed to impress|...
<second>=the red team|Pitt|Slippery Rock|the Crimson Tide|lowly Xavier|...
<result>=71-14 laugher.|13-12 nail-biter.|31-7 romp.|51-20 thrashing.|...

The purpose of this grammar is to generate a statement that you might find in a sports story.  The elements in angle brackets are called "non-terminals" and the first one of these, <sport-result>, is the "start symbol".  Each non-terminal can be replaced by using one of the productions that appear to the right of the equal sign (=).  The various productions are separated by vertical bars (|).  In the file containing the grammar, each non-terminal and its productions appear on a single line.  For the sake of printing this project description, some productions have been omitted from some production lists.  The parts of a production that are not non-terminals are called "terminals"; these stay unchanged through the statement generation. Note that a grammar begins with a start-symbol followed by a production list followed by one or more productions all on one line.  Every other non-terminal has one line in the file showing the non-terminal and its production list.  A production list contains one or more productions, initially separated by vertical bars.  Each production contains one or more terminals or non-terminals, initially separated by spaces.  Each terminal is a single word (sequence of characters with no space); each non-terminal is a single word enclosed in angle brackets.

To generate a statement, begin with the start symbol and expand it (replace it) with the production to the right, making a "sentential form".  Then begin to replace the non-terminals in the sentential form one at a time from left to right.  Replace the leftmost non-terminal with the right side of one of its productions, chosen randomly.  (Choose a random number to pick the production.)  This results in a new sentential form.  Repeat this process until all non-terminals are eliminated from the sentential form.  You then have a statement, a "sentence" in the language.  Here is an example of using the grammar to generate a statement; the sentential form is shown in bold after each replacement/expansion.  A statement produced in this way is said to have a leftmost derivation.

<sport-result> is replaced with
    <rem> <first> <verb> <second> in a <result>
<rem> is then replaced with one of its productions (chosen randomly),
        e.g. "As predicted" giving
As predicted, <first> <verb> <second> in a <result>
    Then, <first> is replaced with one of its productions,
    e.g. "our struggling team" giving
As predicted, our struggling team <verb> <second> in a <result>
    Then, <verb> is replaced by either of its productions,
    e.g. "really <good>" giving
As predicted, our struggling team really <good> <second> in a <result>
    Then, <good> is replaced by one of its productions,
    e.g. "tore into" giving
As predicted, our struggling team really tore into <second> in a <result>
    Then, second is replaced by one of its productions,
    e.g. "lowly Xavier" giving
As predicted, our struggling team tore into lowly Xavier in a <result>
    Finally, <result> is replaced by one of it productions,
    e.g., "31-7 romp." and we have formed a complete sentence
As predicted, our struggling team really tore into lowly Xavier in a 31-7 romp.

Your program is required to prompt for the name of a file containing a grammar, to read in the grammar and to store it using linked listsThat is, each production list must be a linked list and each production within each list must be a linked list.  It is also required that the sentential form be kept as a linked list.  The program must then generate a random statement and prompt the person at the keyboard to ask if s/he wants another random statement.  If the person indicates another is desired, the program must generate another statement.  This prompt and display cycle must continue until the person at the keyboard indicates that no more statements are desired.  Three grammar files are provided for your use: sports.txt    (about a sports result), insult.txt (a very simple grammar for making Shakespearean insults), and   bizarre.txt   (all-over-the-place sentences about college).  All three files can be found on the I:\ drive  in I:\jlwolfe\310\s08 You can also create your own grammar; it really isn't that difficult. The linked lists that you make must contain terminals or non-terminals or productions or productions lists.  Linked lists that contain just Strings in which each string contains whole productions or whole productions lists are NOT acceptable.

You should use the Random class to make the random numbers.  This is a very easy class to use; you make a random object with     Random   choice = new Random();       To randomly choose a production, let count be the number of productions for a non-terminal.  Then, a statement like   int pick = choice.nextInt(count);   will give pick a random integer value between 0 and count-1  So, just use pick to select the production and substitute into the sentential form.  You should expect the grammars to generate many different statements, although some parts will appear to be selected often; this is to be expected with random numbers.  Even the sports.txt  grammar can generate more than 7500 different sentences.  Once you get this program working, you will probably find bizarre.txt and insult.txt somewhat funny in what statements they produce.

When the program is finished, have it generate 3-5 statements for each grammar.  Capture this output to hand in. Hand in a printout of your well-documented program, and a printout of the captured output.  You must create a folder named p4 under the folder named after you on the P: drive for COSC 310.  Copy into p4 all .java files that you created for this project.