In this assignment I will use the convention of capital letters for
nonterminals and lower case letters for terminal symbols. The
start symbol for all grammars will be the head of the
first production.
- In these first set of questions we will explore the dividing line
between finite automat and context free languages.
In the lecture on lexical analyzers we saw that it is sometimes
convenient to write finite automata using the format we normally
use for context free languages. An example was the following
digit ::= 0|1|2|3|4|5|6|7|8|9
digits ::= digit | digits digit
Our first assertion is that any CFL that does not have recursion
(either direct or indirect) is in fact just a regular expression
in disguise. Illustrate this by showing the regular expression
that corresponds to the following grammar:
A ::= B C | a
B ::= D | e
C ::= c | cc
D ::= ddd | ff
- The next assertion is that left recursion in a CFL grammar rule
does not make the grammar non regular. Illustrate this by showing
the regular expression that corresponds to the following grammar:
A ::= A b | A c | d | e
- Right recursion in a CFL grammar rule does not make the
grammar non regular. Illustrate this by showing the
regular expression that corresponds to the following grammar:
A ::= a A | b A | c | d
- In fact, recursion on both the right or left side does
not make the grammar non regular (although it will make it
ambiguous). Illustrate this by showing the regular expression
that corresponds to the following grammar:
A ::= A a | A b | c A | d A | e
- Show that the grammar given above is indeed ambiguous.
- What remains? It is only recursion that occurs within
a grammar rule that yields something that is not regular.
Illustrate this by explaining the difference between
A ::= a A b | c
and
a*cb*
That is, explain what inputs are accepted by one and not by
the other.
- There are many manipulations we often must do to a grammar
in order to make it suitable for one type of parser or another.
In the next set of questions we will explore some of these.
Note that in many cases it will be necessary to introduce new
nonterminals.
In many cases, left recursion can be replaced by right
recursion,
and vice versa. Illustrate this by showing an equivalent grammar
that uses right recursion and accepts exactly the same set of
inputs as the following:
A ::= A c | b
- Another common operation is factoring, so that every alternative
has a unique first symbol. Illustrate this by showing how to
factor the two productions for the if statement:
IFSTMT ::= if EXPR then STMT
IFSTMT ::= if EXPR then STMT else STMT
- Do a similar factoring for the return statement.
- Some parsing techniques (such as recursive descent) allow us to
extend the normal BNF rules. Extended BNF rules are sometimes
termed EBNF. A common EBNF extension is looping, which matches
zero or more occurrences of a pattern. Loops are
written using curly braces (although the syntax conventions for
EBNF are not as fixed as they are for BNF, so sometimes you will
also see parenthesis followed by a star, as in a regular expression.
The lexical conventions section in Appendix A of our book uses this,
for example.)
Show that any rule that uses a loop can be replaced by a rule
that
uses only standard BNF (although it may require adding a new
nonterminal). Do this by rewriting the following grammar:
A ::= a { b } c
- Show how recursion can be replaced by a loop. Do this by
rewriting
the following grammar to use an EBNF loop.
A ::= B c
B ::= B b | d
- Another common EBNF extension is ``one or more'', which is
a variation on the zero or more loop. Often this is written
using a plus sign following the curly brace. Show how this, too,
can be replaced by conventional BNF. Do this by rewriting the
following grammar:
A ::= a { b | d }+ c
- Yet another common EBNF extension is the optional part.
This is often written using square brackets. Rewrite the
following EBNF grammar in normal BNF format:
A ::= a [ b | d ] c
- The following questions refer to the grammar for our programming
assignments. This can be found at grammar.html.
Compute the first sets for each nonterminal.
- Compute the follow set for the nonterminal program.
In the programming assignment you will write a procedure to
recognize the nonterminal program. This loop can either
run as long as we see something in the first set for declaration
while current token is in first set for declaration
declaration()
check for semicolon
- or it can use the follow set for program
while current token is not in follow set for program
declaration()
check for semicolon
Which of these two possibilities will produce a smaller condition in
the if statement?
- In the nonterminal classBody there is a loop that matches
a sequence of nonClassDeclaration symbols. In recognzing
this loop you can either use the first set of nonClassDeclaration
or the follow set for the loop (that is, the first symbol that must
occur
after the loop). Which will lead to a shorter condition to test?
- Does our grammar allow nested functions?
Can you change one production in the grammar to either allow or
disallow
nested functions?
- Does our grammar allow classes to be defined inside of functions?
Can you change one production in the grammar to either allow or
disallow
this feature?
- One difference between a class and a record is
that
a class allows function declarations (that is, methods) while a record
does not. Is there a one-production change we can make in our grammar
to convert our classes into records?
- Pointer dereferencing is done postfix style, not prefix style as
in C. Show that if we rewrote the grammar to use prefix style pointer
deref it would then be ambiguous:
reference ::= identifier
reference ::= reference [ expression ]
reference ::= * reference