Lecture 15 - 480/580 (compiler construction)
Code Generation for Boolean Expressions
Recall from last time that I said our basic building blocks were
* sequential execution
* unconditional branches
* conditional branches using relations lt, le, eq, ne, ge, gt
From these we will build everything else. Last time I discussed how to
generate control flow assuming we could take an arbitrary boolean
expression and generate code for
if expression IS FALSE goto label
Today, we will recify that omission and go back and discuss how to produce
code for this statement.
First, let us discuss what can go into a boolean expression. Boolean
expressions are composed of the following pieces:
negation (unary not operator)
logical operators (and and or)
relational operators
(some languages may have others, we will discuss them as we go along).
Let me discuss them in this order. The way I will discuss them is take an
expression where the operator concerned is at the top, and show what effect
the operator will have on the code. When I have gone through all the
operators, we will know how to generate code for each expression.
If you have looked at code.c, you will have noticed that I generate code
for boolean expressions using an odd routine call gencond(e, t, f);
this routine takes an expression tree, and two integer (label) arguments.
One is the label to branch to if the expression is true, the other is the
label to branch to if the expression is false. ONE OF THESE MUST ALWAYS BE
ZERO; meaning we should fall through to the next statement. That is, we
are going to generate a branch for only one condition, falling through
if the condition is not satisfied.
genfalseexpression(e, label) therefore merely generates a
gencond(e, 0, label)
Given this framework, let us look at the various boolean expressions,
starting with not.
Well, if we want to generate code which will branch in one case and fall
through to the next statement in a different case if an expression is not
true, how might we do it? How about reversing the cases? If we reverse
the arguments, we invert the condition.
So all we do is recursively call gencond
gencond(e->leftchild, f, t)
So a not operator DOES NOT GENERATE ANY CODE!
Let us go on to logical operators.
How are logical operators defined? Somebody define "and" for me.
Considering these, we see one very important way that programming languages
differ from mathematics. While logic is two valued in mathematics
(remember the law of the excluded miracle? everything must be either true
or false). In programming languages, "logic" can be three valued;
something is either true, false, or ill-defined. A classic example occurrs
with great regularity. Suppose A is an array [1..10]. Consider the
expression
(i <= 10) and (A[i] <> 47)
Suppose i has the value 11. What is the truth or falsity of this
expressions? Support your answer.
Part of the problem is that us Westerners have a strong left to right bias.
We are also lazy (some might say "efficient"), and take shortcuts when we can.
Since we can determine the truth or falsity of the expression without
answering the tricky question of what the right hand part means, we simply
ignore it. This is called "short-circuit evaluation".
It is important to note that not all languages use short-circuit
evaluation. Pascal explicitly forbids it, for example. Other languages
leave it up to the programmer, by providing two operators (&& and & in C,
and and cand in Dijkstra's paper language). Semantically, another
approach (rather than introducing new operators) would be to redefine the
logical operator and to make it three-valued.
I will show how to generate code for and and or using short-circuit
evaluation. Afterwards, if there is time, we will come back and discuss
non-short circuit evaluation.
Let us take and. We have two cases; in one we have a label to branch to if
the condition is true. How is and true? if both arguments are true. So
we must evaluate both before we can branch, but if the first one is false
we can quit immediately. So we genrate a new label and produce the
following code:
L1 = makelabel(1);
IF leftchild is false goto L1 gencond(leftchild, 0, L1);
IF rightchild is true goto t gencond(rightchild, t, 0);
L1: genlabel(L1);
The other case, we have a label to branch to if the condition is false.
Here we can do it directly without producing a new label.
IF leftchild is false goto f gencond(leftchild, 0, f);
if rightchild is false goto f gencond(rightchild, 0, f);
Similarly, we can do the same thing for or's.
Case if label is true
if lefthcild is true goto t gencond(leftchild, t, 0);
if rightchild is true goto t gencond(rightchild, t, 0);
case label is false, we have to introduce a new label L1
L1 = makelabel(1);
if leftchild is true goto L1 gencond(leftchild, L1, 0);
if rightchild is false goto f gencond(rightchild, 0, f);
L1: genlabel(L1);
So, we are left with relationals. These I said were our basic building
blocks. If I have a label to branch to if our condition is true, it seems
straightforward. Exactly what the CODE looks like will depend upon several
factors; what type of machine we are working on, what type of registers we
have, how they are being used, what types of terms we have, and so on.
We will discuss these another day. Suppose, however, that I have a
stack-style architecture; we might generate something like:
evaluate leftchild and push on stack
evaluate right child and push on stack
compare top two elements on stack
branch if LT (or whatever) to label t
Remember how gencond was defined. We could equally well have a label to
branch to if the condition is FALSE. How do we do that?
What does it mean to say that A < B is FALSE? Isn't this the same as
saying that A >= B is TRUE? (assuming A and B are comparable, that is).
We can simply invert the condition and generate the same sort of code we
did for the true branch.
-------------can stop here, or go on if there is time -----------------
We have used boolean expressions only for control flow. But our grammar
also allows them to be used in expressions. That is, our grammar permits
us to say things like
x := a < b;
How might we generate code for this?
First off, we have to agree on some representation for true and false
values. A common one (Pascal and C both use it) is to use zero for false
and one for true. We can reduce the problem to one we have already solved:
IF expression is FALSE goto L1
result is 1
GOTO L2
L1: result is 0
L2: use result
Or, we can even be a little fancy:
result is 0
IF expression is FALSE goto L1
result is 1
L1: use result
On the other hand, when we are called upon to assertain the value of a
boolean strored in a variable, how can we do it? Again, it may depend upon
the machine; Some machines will set a "condition code" simply by
accessing a value, so we could generate something like
load value into register
branch if nonzero to L1
On other machines, we have to do an explicit test against zero.
(btw, testing against zero is probably better than testing against one;
would you rather have 2 mean true or false?)
You may have been surprised that and and or didn't use the AND and OR
instructions that you find on many machines. The reason is that these are
BITWISE operations, and not logical operations. They DO have a place,
however, if we are required to NOT do short-circuit evaluation. In this
case, we must evaluate both arguments; then make sure they are correct.
Let us do it as if we had a stack-based system. Suppose we wanted to
evaluate:
(a < b) and (c < d)
We would generate something like the following:
push 0 on to stack
IF (a >= b) goto L1
add 1 to top of stack
L1: push 0 on to stack
IF (c >= d) GOTO L2
add 1 to top of stack
L2: Do a bit-AND of top two elements on stack
As you can see, the code generated by short circuit evaluation is both
faster and shorter.