Lecture 4 - 480/580 (compiler construction) Language review Recall the third step in the list of things any translator must perform was to give structure to the input. This we do by means of grammars. Let us review: Any language grammar must have an alphabet (set of tokens) - the basic building blocks of the language. rules for telling if an utterance (sentence, string) is in the language. Recall the language hierarchy: 1. Regular Expressions can be recognized by finite state automata rules are given by regular expression form (review). 2. Context Free Languages can be recognized by push down automata (basically finite state automata with a stack) rules are given by production rules (productions). new type of name - nonterminal - usually written as BNF (Backus Naur Form). put up grammar for expressions E -> E + T E -> T T -> T * F T -> F F -> Id F -> ( E ) 3. Context Sensitive Languages need the full power of Turing machines more powerful, but harder to manipulate, and we don't need the power. (put up small example). Introduce concept of BNF - basic forms, extended forms, etc. nothing - sometimes written as epsilon, sometimes not. derivation, parse - process of getting a parse tree (terms pretty much the same - depends upon how you look at it). parse tree - the thing given to you by a parse the tree gives meaning - tells us how to group things. abstract syntax tree - pruned parse tree railroad charts (transition diagrams) - simply another notation Ambiguity, operations on grammars What happens if I replace E -> T + E is the set of sentences that match an E changed? how about the parse trees? ONLY the interpretation is different. What about E -> E + E this is said to be ambiguous, because the parse tree is not determined. Would like to avoid ambiguous grammars, even so, need futher restrictions on grammars to make efficient parsers. Will outline these restrictions as we go along. ---------can stop here, otherwise if time go over the following material which will be needed later -------------------------------- Once we have a grammar, we can do various manipulations in such a way that the set of sentences (utterances) recognized is not changed. The easist type of manipulation is to add new nonterminals. For example, if we have statement := ifkw expr thenkw statement elsekw statement it does not change the grammar at all to change this to statement := ifthen elsekw statement ifthen := ifpart thenkw statement ifpart := ifkw expr We will see later reasons why we might want to make this change. For another example, our first parsing method will require us to remove left recursion. There are two forms of left recursion; one, direct recursion, we have seen already. This occurs if we have productions of the form A -> A beta where beta is anything else. The second type, indirect recursion, is more complex. A -> B beta B -> A gamma I will go over the algorithm used to remove direct recusion. That for indirect recusion is more complex, and can be found in the text (we won't need it for this course; but if you ever do need it at least now you will know where to find it). The idea is as follows. For each production A -> A beta | A gamma | delta | nu we look at it from the very left, instead of the right. The leftmost thing A matches must be something without recursion, namely a delta or nu. What comes to the right of this must be something that can come after an A. So we introduce a new symbol, A', to represent "that which can come after an A". We rewrite the rules as follows: A -> delta A' | nu A' A' -> beta A' | gamma A' | /* nothing */ Note the nothing--it is important. Wave your hands a lot to convince them that anything that could have been produced by the former can now be produced by the latter. ----------------------------------------------------------------------- Go back to syntax charts and try to give them an intuitive feel for first sets and follow sets as "the first token you can find when you are looking for an X" and "the next token you can find after you have seen an X".