Lecture 5 - 480/580 (compiler construction) Top down parsing Let us start by taking the grammar we used last lecture for expressions, but reverse the rules for E and T (question: does this change the language accepted by the grammar? the parse trees?) E -> T + E E -> T T -> F * T T -> F F -> id F -> ( E ) Suppose we have a routine nexttoken() that would return the next token, PLUS routines currentPosition() which would return the current input position, and resetPositon(i) which would set the input back to position i. (We talked about how this might be done with a push back buffer). Using these, we can construct a parser. Our goal is to recognize an E, so we start by writing a procedure which will return true if the input is an E, and false if it is not. What can be an E? well, it must be something that matches one of the two rules for E, namely E -> T + E E -> T So we simply try each of them in turn. If the first fails, we try the second. In order to back up, we need to reset things to the point they were when we tried the first alternative. Our code could look something like the following: boolean procedure E(); boolean success; /* set true if we are successful */ integer startingPosition; /* where we were at the start */ startingPosition := currentPosition(); success := false; if T() then if match(PLUS) then if E() then success := true; if not success then begin /* try next alternative */ resetPosition(startingPosition); if T() then success := true; return success; end; What happens if we had used the original grammar? This technique will work for ANY context free language that does not have left recursion. One of the most general parsing techniques. What problems does it have? it backs up a lot. This is called backtracking, and causes lots of problems. For example, error messages, running time. We would like a way to avoid backtracking. A parser that does not use backtracking is known as a predictive parser. Why did we need backtracking? When faced with a choice, we didn't know, a priori, which one to select. Suppose we rewrote our grammar so that we did know this. Time for a definition, A grammar is said to be LL(1) if, when faced with a choice of several alternatives for the current goal, we can decide using only the NEXT token, which alternative to take. (there are several various forms of the definition, they all amount to the same). Is our grammar LL(1)? no. How can we make it that way? Suggestion 1. Rewrite recursion to use continuations. E -> T E' E' -> | + E T -> F T' T' -> | * T F -> id | ( E ) Now, the only choices are for E', T' and F. The first two include epsilon productions (productions which match nothing) and the latter doesn't. The two cause different actions. procedure E() if T() then if E'() then return true; error end; procedure E'() if match(PLUS) then begin if T then if E' then return true; error(); end; end; procedure F() if match(ID) then return true; if match('(') then begin if E() then if match(')') then return true; error(); end; error(); end; go over different forms. go over optimizations. This technique is called Recursive Descent. Many people regard it as the method of choice if you are writing a parser by hand. It is easy to write directly from the grammar, you can have reasonably good error messages. Problems - the programs are BIG. We can get around the size problem by using a table driven technique. Parser needs 1. a stack of goals 2. a table indexed by (goal, token), containing productions go through procedure (page 187). There exist programs that automatically construct this table for you from the grammar.