Lecture 8 - 480/580 (compiler construction) Bottom Up parsing (part 2) Remember from last time the steps involved in bottom up parsing: * Build a small NDFA for each nonterminal - something that matches occurrences of that nonterminal, but which can have arcs labelled with nonterminals. Associate the final states with the production they came from. * For each arc labelled with a nonterminal in the above, add an epsilon arc to the FA that recognizes that nonterminal; the result is a huge NDFA. * Make a DFA out of this NDFA; keeping the association of states with productions. Once we have this huge DFA, what do we do with it? Well, we build a parser using the following algorithm: Push the start state on to the top of the stack. repeat : Let S be the state on the top of the stack, and let x be the next terminal symbol. Do one of these cases: 1. If there is an arc leading out of state s labelled x, push the new state s' that we would go to along this arc onto the stack and read the next token. 2. If we are in a state associated with the end of a production, pop the stack the same number of times as there are symbols on the right hand side of the production, then advance as if the input token had been the nonterminal associated with that production. (that is, put the new state s' that we would go to along the arc labelled with the nonterminal). If we cannot do either of these, give an error. until the stack has one entry which is the start symbol and the input is exhausted. (Note: this is the same algorithm as is shown on page 219, but there in table form. This table form is just a convenient representation of FA's. We will go over this table representation later, right now it is more important that you get the idea where it all came from). These two choices are given names. The first is called a "shift" action, since the input gets shifted over one token. The second is called a "reduce" action, since we reduce part of what we have seen already to a single production. What happens if we can do both (1) and (2)? This is called a "shift-reduce conflict." It can also happen that we can have two productions associated with a single state, and this is called a "reduce-reduce" conflict. The answer depends upon what type of LR parsing you are doing, and there are many varieties: SLR, LR(0), LR(1), LALR, and so on. I will talk more about this in the next lecture, right now let us assume you always select shift over reduction. The first thing to note is that the NDFA's we started from all led to final states associated with some production. This property is preserved when we converted to the DFA. This means that if we have not seen an error yet, what we have seen could potentially be part of a legal parse tree. This property is given a name: the input is said to represent a "viable prefix". All this means is that if we see the right things next the input could be legal (but of course the parser doen't know what is coming up next, so it still could turn out that there is an error later on). We can go further: A "handle" for a production is that part of the input that matches the right hand side of the production. We can also assert that if we have not seen an error yet, then we are looking at something that could potentially be a handle for some production. One can show (we won't) that if handles can be recognzied deterministically (without backtracking) then there is some flavor of LR parsing that can do it. That is, LR parsers are the most powerful set of non-backtracking parser that can be built. Now I want to go back to the question of how we use these parsers to do anything useful. Remember I said the purpose of a parser was in theory to build a parse tree; although in practice a parse tree is much too big so we build something a bit smaller, called an abstract syntax tree. In the last lecture I demonstrated how LR parsers recognize the parse tree in pieces, from the bottom up. Now I want to show you how we can use this fact to tie action (called semantic actions) in with recognition. The actions I will use in my examples build the abstract syntax tree (something you will do for assignment 4), but any actions are legal. First let me introduce the notion of attributes. An attribute is simply a value, associated with a terminal or a nonterminal symbol. We have already seen how attributes are associated with terminals, when we set the value of yylval in your lex program. We can also associate an attribute (that is, a value) with nonterminal symbols. Look back at the algorithm I put above. Notice that we pushed a value on the stack in two cases: 1. When we saw a terminal symbol, and there was an arc labelled with it, or 2. When we performed a reduction, and then advanced along an arc labelled with a nonterminal Let us modify our stack so that it now can keep both states and values. In the first case, before pushing on the new state let us push the attribute value associated with the token (the value returned by our lexical analyser). We can use some default value, say zero, if there was no attribute associated with the token. What about in the second case? Well, what do we have available to us? When we pop the stack, we can get the attributes associated with the symbols on the right hand side of the production. Let us associate with each production some actions that produce a new value, the attribute for the nonterminal, and then push that on the stack before pushing the new state. In YACC notatation, the semantic action is simply a C statement. The attributes being popped off, those associated with the right hand side, are denoted $1, $2 etc (this is somewhat confusing, since in the case of terminals the same thing was called yylval on the lex side). The name for the attribute being assigned is $$. Consider the following example: (go over in detail) assume we have two routines newnode and newsymbolnode, which build abstract syntax tree nodes for us. E -> E + T {$$ = newnode($2, $1, $3);} E -> T {$$ = $1;} T -> T * F {$$ = newnode($2, $1, $3);} T -> F {$$ = $1;} F -> id {$$ = newsymbolnode($1);} F -> ( E ) {$$ = $2;} There are several important things to remember. A semantic action is executed when we do a reduction; that is, when we have seen input which matches the right hand side of the production. At this point we have available to us the attributes associated with each of the symbols on the right hand side. you can do anything you want in the semantic action associated with a production. You can, but need not set an attribute value (as with tokens, some default value, such as zero, will be used if you don't set anything). Attributes are simply values. The parser doesn't care what values you use; they can be anything you want. (You do have to declare types, and YACC will do type checking for you). Basically, you can use attributes to do lots of different things; in the case of assignment 3, we are using them to count symbols in id lists, return type records, and others. In some places we have semantic actions but no attributes.