CS 480: Translators

Winter Term, 2011, Professor Budd

Programming Assignment 2

Due Wednesday, January 19

Recursive Descent Parsing

In this part of the project you are to develop a recursive descent parser that recognizes programs written in our sample grammar.

The task of parsing is performed by an instance of the class Parser. The method parse in this class is the only public function. This constructor takes as argument a lexical analyzer (from assignment 1) a boolean debug flag, and stores both as local variables. The method parse either returns, if the parse is successful, or throws a value of type ParseException if the parse is unsuccessful.

class Parser {
private Lexer lex;
private boolean debug;

public Parser (Lexer l, boolean d) { lex = l; debug = d;}

public void parse () throws ParseException {
lex.nextLex(); // get first token
program(); // match nonterminal ``program''
if (lex.tokenCategory() != lex.endOfInput)
throw new ParseException(3); // expect end of file
}

private final void start (String n) {
if (debug) System.out.println("start " + n + " token: " + lex.tokenText());
}

private final void stop (String n) {
if (debug) System.out.println("recognized " + n + " token: " + lex.tokenText());
}

private void program () throws ParseExeception {
start("program");
...
stop("program");
}
...
}

Since we are using recursive descent as the parsing technique, your first task is to take the language grammar, and rewrite it to make it LL(1).

A skeleton (the code shown above) and the execution harness as well as several test cases are found in the handout area (http://classes.engr.oregonstate.edu/eecs/winter2011/cs480/Handouts/Asgn2) A working version of the assignment can be accessed using the handing script with the assignment name P2Test. When you are finished, you should hand in only the file Parser.java.

Making Recognition Rules

To create the recursive descent parser, each nonterminal in the grammar should be turned into a method. The method is charged with recognizing instances of the rule. Since the grammar is LL(1), it should be possible to determine whether or not the nonterminal is appropriate by simply examining the first token. Each recognition method must maintain the following invariants:

  1. When the rule is invoked, the current token is the first symbol that represents a possible match.
  2. Each method is committed to recognizing its given structure. Any failures to match should cause a ParseException to be raised.
  3. When the method associated with a nonterminal returns, not only has a valid occurrence of the nonterminal been recognized, but the current token is the first symbol that follows the text of the matched rule.
  4. Every rule matches the longest possible legal sequence of tokens.
  5. A call is made to the debugging routine start as the first statement in every routine, and to the debugging routine stop as the last statement. In both cases the argument is the name of the nonterminal; In the assignment you hand in this should be the only output you produce.

The following gives examples of the various types of nonterminals, and the type of code they generate. I will use upper case letters for nonterminals, and lower case bold letters for terminal symbols.

Matching Tokens

Suppose we have a grammar rule that begins with a token, such as the following:

A ::= b c B d

For keywords or other fixed tokens, the lexical routine match can be used to determine if the match is exact. Note that match does not advance the lexer, therefore it is necessary to call nextLex() after the successful call. If a match is unsuccessful then an exception should be thrown.

private void A () throws ParseException {
start("A");
if (! lex.match("b"))
throw new ParseException ("expecting terminal b");
lex.nextLex(); // advance over the b
if (! lex.match("c"))
throw new ParseException("expecting terminal c");
// ... continue matching
stop("A");
}

For tokens that cannot be checked by match, use one of the other forms. The lexer has the procedure isIdentifier, for example, or the lexical category can be tested to see if it is an integer or string.

Matching Nonterminal

Nonterminals are matched by simply calling their associated recognition routine. For example, suppose we have the following rule:

B ::= C d E

A recognition routine would be as follows:

private void B () throws ParseException {
start("B");
C(); // recognize the C part
if (! lex.match("d"))
throw new ParseException("expecting d symbol");
lex.nextLex(); // advance over the d
E();
stop("B");
}

Matching Loops

Repeated constructs are written as loops, and are recognized in one of two ways, depending upon which is more convenient. Consider the following construct:

F ::= { G H } k

The first technique uses the first set of the construct being looped over. Suppose the first set for nonterminal G is the terminal symbols i and j. We could write this loop as follows,

private void F ( ) throws ParseException {
start("F");
while (lex.match("i") || lex.match("j")) {
G();
H();
}
if (! lex.match("k"))
throw new ParseException ("expecting a k");
lex.nextLex(); // eat the k
stop("F");
}

The alternative is to use the follow set for the loop. In the situation above it is clear that the follow set for the repeated portions is the nonterminal k. Thus, the code could more easily be written as follows:

private void F ( ) throws ParseException {
start("F");
while (! lex.match("k")) {
G();
H();
}
lex.nextLex(); // eat the k
stop("F");
}

Notice that the condition test has been inverted. For a grammar to be LL(1), the first set of a repeated construct must not intersect in any fashion the follow set for the matching nonterminal. In the recognizer you develop sometimes it will be more convenient to use the first set approach (because the set is easier to characterize) and other times the follow set approach.

Matching Alternatives

When you have an alternative choice among several possibilities, you must identify the first sets of each alternative. A conditional statement is then used to select among the possibilities. For example, assume you have the following rule:

L ::= m N

| p Q

| r S

This could be recognized by the following:

private void L ( ) throws ParseException {
start("L");
if (lex.match("m")) {
lex.nextLex(); // eat the m
N();
}
else if (lex.match("p")) {
lex.nextLex();
Q();
}
else if (lex.match("r")) {
lex.nextLex();
S();
}
else
throw new ParseException("expecting m p or r");
stop("L");
}

Note that if the choices begin with a nonterminal, you must first compute the first sets for the nonterminals, and it must be true that these first sets are distinct. For example, suppose that instead the grammar was as follows:

L ::= M N

| P q
M ::= m m

| n m
P ::= p q

The recognizer for L would be changed to the following:

private void L () throws ParseException {
start("L");
if (lex.match("m") || lex.match("n")) {
M();
N();
}
else if (lex.match("p")) {
P();
if (! lex.match("q"))
throw new ParseException("expecting a q");
lex.nextLex();
}
else
throw new ParseException("expecting a m, n or q");
stop("L");
}

Matching Nothing

If an empty string is one of a set of possible matches, then the procedure should simply return if the first symbol is not in the first set of any other alternative. For example, if the grammar rule in the previous example were changed to the following:

L ::= m N

| p Q

| ε
The recognizer would be as follows:

private void L ( ) throws ParseException {
start("L");
if (lex.match("m")) {
lex.nextLex();
N();
}
else if (lex.match("p")) {
lex.nextLex();
Q();
}
// else successfully match nothing
stop("L");
}

Changes to the Grammar

In order to make the grammar LL(1), we need to make the following changes to the grammr:

Error Messages

Rather than using strings for error messages, the class ParseException takes a number as argument. (My only reason for doing this is so the text you produce will exactly match the text that I produce). Examine the file ParseException.java for the number/error message correspondance.

Grading

The grading program will match your output letter for letter against mine. You should try to produce exactly the same output. Do not create any new nonterminals.

Hints