Lecture 7 - 480/580 (compiler construction) Bottom Up parsing (intro) In Top down parsing we viewed the parsing process one way, as a stack of goals we are trying to match. Now let us go back to first principles and look at parsing from a different approach. Remember our grammar: (let us number the productions now) 1. E -> E + T 2. E -> T 3. T -> T * F 4. T -> F 5. F -> Id 6. F -> ( E ) Remember our task is to answer the question "is the input an E"? Lets use the old recursive programming trick of imagining we have already solved this problem, and see if it helps us solve this problem (you would be surprized at how often this works!). Assuming we could recognize E's, T's and F's, we could ALMOST build a finite automata for each. (put automata up on board). Notice each end of the automata ends in a state associated with some production. The problem is we have these arcs labeled with nonterminals, and we don't know how to traverse them. But wait a moment, we can connect them together; for example, in the first diagram, we do know what a T is - it is something generated by the second automata I drew. So we if are looking for a T at this point (in the E diagram), then we should really be looking for something that matches the T automata. So we can draw in these epsilon arcs (do you all remember what epsilon arcs are? arcs that don't cost anything to traverse). This gives us another huge nondeterministic finite state automata But wait a moment, it still isn't clear how we ever traverse an arc labeled with a nonterminal (go through simple example, say x + y). Thats were we need the special states we associated with each nonterminal. (so called reduction states). When we reach one of these, if we can't go forward we back up a number of states equal to the number of symbols on the right hand side of the production, then advance AS IF the next token had been the nonterminal. Go through x + y example completely. Notice how we had to be two places at once (this is an artifact of the FA being nondeterminstic). What can we do to get around that? make it deterministic. Go through deterministic construction. Review - build a FA for each nonterminal, connect them together with epsilon rules, giving a huge NDFA make a DFA out of the NDFA There are ways of doing this directly from the grammar, without going through the nondetermistic phase - this is what YACC does. then to use the DFA, we make transitions as long as we can, when we can't we hope we are in a state assicated with a production (a reduction state), in which case we reduce, backing up in the FA, then proceed again. If at the end we have seen an E, we are happy. THIS is called LR parsing.