CS 480: Translators

Winter Term, 2007, Professor Budd

Homework Assignment 2

Due Wendesday, January 24

Manipulation of Grammars

In this assignment I will use the convention of capital letters for nonterminals and lower case letters for terminal symbols. The start symbol for all grammars will be the head of the first production.

  1. In these first set of questions we will explore the dividing line between finite automat and context free languages.

    In the lecture on lexical analyzers we saw that it is sometimes convenient to write finite automata using the format we normally use for context free languages. An example was the following

    digit ::= 0|1|2|3|4|5|6|7|8|9
    digits ::= digit | digits digit
    Our first assertion is that any CFL that does not have recursion (either direct or indirect) is in fact just a regular expression in disguise. Illustrate this by showing the regular expression that corresponds to the following grammar:
    A ::= B C | a
    B ::= D | e
    C ::= c | cc
    D ::= ddd | ff

  2. The next assertion is that left recursion in a CFL grammar rule does not make the grammar non regular. Illustrate this by showing the regular expression that corresponds to the following grammar:
    A ::= A b | A c | d | e

  3. Right recursion in a CFL grammar rule does not make the grammar non regular. Illustrate this by showing the regular expression that corresponds to the following grammar:
    A ::= a A | b A | c | d

  4. In fact, recursion on both the right or left side does not make the grammar non regular (although it will make it ambiguous). Illustrate this by showing the regular expression that corresponds to the following grammar:
    A ::= A a | A b | c A | d A | e

  5. Show that the grammar given above is indeed ambiguous.

  6. What remains? It is only recursion that occurs within a grammar rule that yields something that is not regular. Illustrate this by explaining the difference between
    A ::= a A b | c
    and
    a*cb*
    That is, explain what inputs are accepted by one and not by the other.

  7. There are many manipulations we often must do to a grammar in order to make it suitable for one type of parser or another. In the next set of questions we will explore some of these. Note that in many cases it will be necessary to introduce new nonterminals.

    In many cases, left recursion can be replaced by right recursion, and vice versa. Illustrate this by showing an equivalent grammar that uses right recursion and accepts exactly the same set of inputs as the following:

    A ::= A c | b
  8. Another common operation is factoring, so that every alternative has a unique first symbol. Illustrate this by showing how to factor the two productions for the if statement:
    IFSTMT ::= if EXPR then STMT
    IFSTMT ::= if EXPR then STMT else STMT

  9. Do a similar factoring for the return statement.

  10. Some parsing techniques (such as recursive descent) allow us to extend the normal BNF rules. Extended BNF rules are sometimes termed EBNF. A common EBNF extension is looping, which matches zero or more occurrences of a pattern. Loops are written using curly braces (although the syntax conventions for EBNF are not as fixed as they are for BNF, so sometimes you will also see parenthesis followed by a star, as in a regular expression. The lexical conventions section in Appendix A of our book uses this, for example.)

    Show that any rule that uses a loop can be replaced by a rule that uses only standard BNF (although it may require adding a new nonterminal). Do this by rewriting the following grammar:

    A ::= a { b } c

  11. Show how recursion can be replaced by a loop. Do this by rewriting the following grammar to use an EBNF loop.
    A ::= B c
    B ::= B b | d

  12. Another common EBNF extension is ``one or more'', which is a variation on the zero or more loop. Often this is written using a plus sign following the curly brace. Show how this, too, can be replaced by conventional BNF. Do this by rewriting the following grammar:
    A ::= a { b | d }+ c

  13. Yet another common EBNF extension is the optional part. This is often written using square brackets. Rewrite the following EBNF grammar in normal BNF format:
    A ::= a [ b | d ] c

  14. The following questions refer to the grammar for our programming assignments. This can be found at grammar.html.

    Compute the first sets for each nonterminal.

  15. Compute the follow set for the nonterminal program.
    In the programming assignment you will write a procedure to recognize the nonterminal program. This loop can either run as long as we see something in the first set for declaration
    	while current token is in first set for declaration
    declaration()
    check for semicolon
  16. or it can use the follow set for program
    	while current token is not in follow set for program
    declaration()
    check for semicolon
    Which of these two possibilities will produce a smaller condition in the if statement?

  17. In the nonterminal classBody there is a loop that matches a sequence of nonClassDeclaration symbols. In recognzing this loop you can either use the first set of nonClassDeclaration or the follow set for the loop (that is, the first symbol that must occur after the loop). Which will lead to a shorter condition to test?

  18. Does our grammar allow nested functions? Can you change one production in the grammar to either allow or disallow nested functions?

  19. Does our grammar allow classes to be defined inside of functions? Can you change one production in the grammar to either allow or disallow this feature?

  20. One difference between a class and a record is that a class allows function declarations (that is, methods) while a record does not. Is there a one-production change we can make in our grammar to convert our classes into records?

  21. Pointer dereferencing is done postfix style, not prefix style as in C. Show that if we rewrote the grammar to use prefix style pointer deref it would then be ambiguous:
    reference ::= identifier
    reference ::= reference [ expression ]
    reference ::= * reference