IUP Computer Science
COSC 310    Fall 2007
 


Project #4
Generative Grammars
(Due  26 October 2007)

A grammar defines the form and content of a language.  This project is to make a random sentence generator for a supplied grammar.  That is, given a grammar and a random number generator, the program should choose the productions of the grammar to generate a sentence (a statement in the language).  To illustrate, consider the following grammar which is contained in the file  sports.txt

<sport result>=<rem> <first> <verb> <second> in a <result>
<rem>=Unexpectedly,|As usual,|To the surprise of all present,|As predicted,|
        I told you so;
<first>=the Crimson Hawks|our struggling team|Appalachian State, of all teams,|
        the Mountain Lions|the blue team
<verb>=really <good>|sadly <bad>
<good>=pounced on|trounced|tore into|squeaked by|stomped
<bad>=looked like fools against|embarrased themselves vs|failed to impress|
        didn't show up on the field with|were no match for
<second>=the red team|Pitt|Slippery Rock|the Crimson Tide|lowly Xavier
<result>=71-14 laugher.|13-12 nail-biter.|31-7 romp.|51-20 thrashing.|
        game for the books.|meaningless contest.

The purpose of this grammar is to generate a sentence that you might find in a sports story.  The elements in angle brackets are called "non-terminals" and the first one of these is the "start symbol".  Each non-terminal can be replaced by using the productions that appear to the right of the equal sign (=).  The various productions are separated by vertical bars (|).  In the file containing the grammar, each non-terminal and its productions appear on a single line.  For the sake of printing this project description, some productions are moved to a second line and indented because they would not fit on a line.  The parts of a production that are not non-terminals are called "terminals"; these stay the same through the generation.

To generate a sentence, begin with the start symbol and expand it (replace it) with the production to the right, making a "sentential form".  Then begin to replace the non-terminals in the sentential form one at a time from left to right.  Replace the leftmost non-terminal with the right side of one of its productions, chosen randomly.  (Choose a random number to pick the production.)  This results in a new sentential form.  Repeat this process until all non-terminals are eliminated from the sentential form.  You then have a sentence, a statement in the language.  Here is an example of using the grammar to generate a sentence; the sentential form is shown in bold after each replacement/expansion.  A sentence produced in this way is said to have a leftmost derivation.

<sport result> is replaced with
    <rem> <first> <verb> <second> in a <result>
<rem> is then replaced with one of its productions (chosen randomly),
        e.g. "As predicted" giving
As predicted, <first> <verb> <second> in a <result>
    Then, <first> is replaced with one of its productions,
    e.g. "the blue team" giving
As predicted, the blue team <verb> <second> in a <result>
    Then, <verb> is replaced by either of its productions,
    e.g. "really <good>" giving
As predicted, the blue team really <good> <second> in a <result>
    Then, <good> is replaced by one of its productions,
    e.g. "tore into" giving
As predicted, the blue team really tore into <second> in a <result>
    Then, second is replaced by one of its productions,
    e.g. "lowly Xavier" giving
As predicted, the blue team tore into lowly Xavier in a <result>
    Finally, <result> is replaced by one of it productions,
    e.g., "51-20 thrashing." and we have formed a complete sentence
As predicted, the blue team really tore into lowly Xavier in a 51-20 thrashing.

Your program is required to prompt for the name of a file containing a grammar, to read in the grammar and to store it using linked lists.  It is recommended that the sentential form also be kept as a linked list.  The program must then generate a random sentence and prompt the person at the keyboard to ask if s/he wants another random sentence.  If the person indicates another is desired, the program must generate another sentence.  This prompt and display cycle must continue until the person at the keyboard indicates that no more sentence are desired.  Four grammar files are provided for your use: sports.txt    (about a sports result), ms.txt  (about an announcement from Microsoft),  promise.txt   (about a promise of soon-to-be-available software), and   bizarre.txt   (all-over-the-place sentences about college).  All four files can be found on the I:\ drive  in I:\jlwolfe\310\f07 You can also create your own grammar; it really isn't that difficult.

You should use the Random class to make the random numbers.  This is a very easy class to use; you make a random object with     Random   choice = new Random();       To randomly choose a production, let count be the number of productions for a non-terminal.  Then, a statement like   int pick = choice.nextInt(count);   will give pick a random integer value between 0 and count-1  So, just use pick to select the production and substitute into the sentential form.  You should expect the grammars to generate many different sentences, although some parts will appear to be selected often; this is to be expected with random numbers.  Even the sports.txt  grammar can generate 7500 different sentences.  Once you get this program working, you will probably find at least bizarre.txt somewhat funny in what sentences it produces.

When the program is finished, have it generate a sample (3-5 sentences) for each grammar.  Capture this output to hand in.  Hand in a printout of your well-documented program, and a printout of the captured output.  You must create a folder named p4 under the folder named after you on the P: drive for COSC 310.  Copy into p4 all .java files that you created for this project.