Lecture 15 - 480/580 (compiler construction) Code Generation for Boolean Expressions Recall from last time that I said our basic building blocks were * sequential execution * unconditional branches * conditional branches using relations lt, le, eq, ne, ge, gt From these we will build everything else. Last time I discussed how to generate control flow assuming we could take an arbitrary boolean expression and generate code for if expression IS FALSE goto label Today, we will recify that omission and go back and discuss how to produce code for this statement. First, let us discuss what can go into a boolean expression. Boolean expressions are composed of the following pieces: negation (unary not operator) logical operators (and and or) relational operators (some languages may have others, we will discuss them as we go along). Let me discuss them in this order. The way I will discuss them is take an expression where the operator concerned is at the top, and show what effect the operator will have on the code. When I have gone through all the operators, we will know how to generate code for each expression. If you have looked at code.c, you will have noticed that I generate code for boolean expressions using an odd routine call gencond(e, t, f); this routine takes an expression tree, and two integer (label) arguments. One is the label to branch to if the expression is true, the other is the label to branch to if the expression is false. ONE OF THESE MUST ALWAYS BE ZERO; meaning we should fall through to the next statement. That is, we are going to generate a branch for only one condition, falling through if the condition is not satisfied. genfalseexpression(e, label) therefore merely generates a gencond(e, 0, label) Given this framework, let us look at the various boolean expressions, starting with not. Well, if we want to generate code which will branch in one case and fall through to the next statement in a different case if an expression is not true, how might we do it? How about reversing the cases? If we reverse the arguments, we invert the condition. So all we do is recursively call gencond gencond(e->leftchild, f, t) So a not operator DOES NOT GENERATE ANY CODE! Let us go on to logical operators. How are logical operators defined? Somebody define "and" for me. Considering these, we see one very important way that programming languages differ from mathematics. While logic is two valued in mathematics (remember the law of the excluded miracle? everything must be either true or false). In programming languages, "logic" can be three valued; something is either true, false, or ill-defined. A classic example occurrs with great regularity. Suppose A is an array [1..10]. Consider the expression (i <= 10) and (A[i] <> 47) Suppose i has the value 11. What is the truth or falsity of this expressions? Support your answer. Part of the problem is that us Westerners have a strong left to right bias. We are also lazy (some might say "efficient"), and take shortcuts when we can. Since we can determine the truth or falsity of the expression without answering the tricky question of what the right hand part means, we simply ignore it. This is called "short-circuit evaluation". It is important to note that not all languages use short-circuit evaluation. Pascal explicitly forbids it, for example. Other languages leave it up to the programmer, by providing two operators (&& and & in C, and and cand in Dijkstra's paper language). Semantically, another approach (rather than introducing new operators) would be to redefine the logical operator and to make it three-valued. I will show how to generate code for and and or using short-circuit evaluation. Afterwards, if there is time, we will come back and discuss non-short circuit evaluation. Let us take and. We have two cases; in one we have a label to branch to if the condition is true. How is and true? if both arguments are true. So we must evaluate both before we can branch, but if the first one is false we can quit immediately. So we genrate a new label and produce the following code: L1 = makelabel(1); IF leftchild is false goto L1 gencond(leftchild, 0, L1); IF rightchild is true goto t gencond(rightchild, t, 0); L1: genlabel(L1); The other case, we have a label to branch to if the condition is false. Here we can do it directly without producing a new label. IF leftchild is false goto f gencond(leftchild, 0, f); if rightchild is false goto f gencond(rightchild, 0, f); Similarly, we can do the same thing for or's. Case if label is true if lefthcild is true goto t gencond(leftchild, t, 0); if rightchild is true goto t gencond(rightchild, t, 0); case label is false, we have to introduce a new label L1 L1 = makelabel(1); if leftchild is true goto L1 gencond(leftchild, L1, 0); if rightchild is false goto f gencond(rightchild, 0, f); L1: genlabel(L1); So, we are left with relationals. These I said were our basic building blocks. If I have a label to branch to if our condition is true, it seems straightforward. Exactly what the CODE looks like will depend upon several factors; what type of machine we are working on, what type of registers we have, how they are being used, what types of terms we have, and so on. We will discuss these another day. Suppose, however, that I have a stack-style architecture; we might generate something like: evaluate leftchild and push on stack evaluate right child and push on stack compare top two elements on stack branch if LT (or whatever) to label t Remember how gencond was defined. We could equally well have a label to branch to if the condition is FALSE. How do we do that? What does it mean to say that A < B is FALSE? Isn't this the same as saying that A >= B is TRUE? (assuming A and B are comparable, that is). We can simply invert the condition and generate the same sort of code we did for the true branch. -------------can stop here, or go on if there is time ----------------- We have used boolean expressions only for control flow. But our grammar also allows them to be used in expressions. That is, our grammar permits us to say things like x := a < b; How might we generate code for this? First off, we have to agree on some representation for true and false values. A common one (Pascal and C both use it) is to use zero for false and one for true. We can reduce the problem to one we have already solved: IF expression is FALSE goto L1 result is 1 GOTO L2 L1: result is 0 L2: use result Or, we can even be a little fancy: result is 0 IF expression is FALSE goto L1 result is 1 L1: use result On the other hand, when we are called upon to assertain the value of a boolean strored in a variable, how can we do it? Again, it may depend upon the machine; Some machines will set a "condition code" simply by accessing a value, so we could generate something like load value into register branch if nonzero to L1 On other machines, we have to do an explicit test against zero. (btw, testing against zero is probably better than testing against one; would you rather have 2 mean true or false?) You may have been surprised that and and or didn't use the AND and OR instructions that you find on many machines. The reason is that these are BITWISE operations, and not logical operations. They DO have a place, however, if we are required to NOT do short-circuit evaluation. In this case, we must evaluate both arguments; then make sure they are correct. Let us do it as if we had a stack-based system. Suppose we wanted to evaluate: (a < b) and (c < d) We would generate something like the following: push 0 on to stack IF (a >= b) goto L1 add 1 to top of stack L1: push 0 on to stack IF (c >= d) GOTO L2 add 1 to top of stack L2: Do a bit-AND of top two elements on stack As you can see, the code generated by short circuit evaluation is both faster and shorter.