Lecture 17 - 480/580 (compiler construction) DAGs and Optimization The term optimization is really a misnomer. It is seldom that a compiler could generate provably optimal code; even if we could decided what that term means. A better term might be "code improvement", but "code optimization" is now commonly used, and so there is little hope of changing it. Really what we are concerned about in optimization is making effective use of resources. What are the resources? * time - computer cpu cycles are a resource (not nearly as costly now as they once were, but still a resource) and we want the program we produce to use as few as possible. * space - the same thing can be said for computer memory. * registers - * segments, pages or other assembly language issues But time is a sword that cuts both ways. There is the time the program we are generating will take, and there is the time the compiler takes to produce the program. We must always weigh one against the other, and which way the scale tips may depend upon a number of factors. For example, a compiler that is largely used for student programs (which are executed, on average, far fewer times than they are compiled), we may not want to spend a lot of time doing optimization. A compiler producing code for applications, which may be executed millions of times for one compile run, can afford to spend a lot of time doing optimizations. So it all depends. Enough general talk. Lets look at some example programs, and see how an intelligent compiler might produce better code. Let us look at the AST generated for the classic matrix multiplication program: for i := 1 to n for j := 1 to m c [i,j] := 0 for k := 1 to p c[i,j] := c[i,j] + a[i,k] * b[k,j] Let us look at the AST generated for the innermost statement. Let us make assignment into a binary operator, so we can make it as part of our tree. [ give tree ] Now, we can make a few optimizations; moving * past the - signs. Here is one we didn't do yesterday--runs of multiplications. Remember how we moved constants to the TOP of runs of additions? For multiplication we want to move constants to the BOTTOM so we can combine them with the constants that have been moved to the TOP Of additions. (give example). Well, can we do better? Notice anything? How about this repeated computation of (i * 4) + (-4), and so on. We are doing the same calculation twice. Suppose rather than recomputing it, we just pointed to the thing we had already computed. We don't have an AST any more, since it isn't a tree. This structure has a name, it is called a DAG (directed acyclic graph). The DAG helps us recognize common subexpressions; which we may want to compute only once. We can use the DAG to save us from recomputing common subexpressions. Any time we compute a result that will be needed later (we can tell this by the fact that there are two pointers leading to one node), we will save it (in a register or a memory location or something). Now, I'm glad that nobody pointed out the fact that we didn't totally finish with our algebraic simplifications before we made our DAG. Look at the original tree; we could have done: stuff + ((j * 4) + (-4)) => ( stuff + (j * 4) ) + (-4). But if we HAD done that, we wouldn't have recogized the ((j*4) + (-4)) as a common subexpression. This is an example of a common problem in optimization. We have two goals: * algebraic simplification, done by pattern matching * common subexpression elimination, done by DAGS and doing a too good job of one may interfere with the other. People are good at detecting this sort of thing, because we are good at recognizing patterns, forming plans and goals, and the like. Computers aren't. It is just this sort of interaction that makes optimization very hard; in both a theoretical (many problems in optimization are NP complete) and practical (many optimizing compilers are among the most complex programs ever built) sense. (The second major problem in optimization is that computers are, for the most part, build by hardware designers who only care about saving gates, not generating code. Thus having to put up with a brain-damaged instruction set is the other problem that gives compiler writers fits. This problem has been getting better in the last few years, as the cost of gates goes down and more hardware designers listen to software writers). Since we have included assignment as an operator in our DAG, We can even encode an entire block of assignment statements in a single DAG. There is a subtle trap, however. Consider the following: C := A[i] + 17; A[j] := 23; B := A[i] + 17; We are tempted to make the following tree (make tree). A couple things to notice--first, we have several roots (tops) of our tree. When are are generating code, where do we start? We can get around this by ordering the roots. Certainly ordering the roots by the order they occur should cause no problem; although more sophisticated algorithms can be given. But there is a more serious problem. Is this even right? Under what conditions is it right? How about replacing A[j] with just simply i? What we have to do is, after an assignment, disregard any nodes which represent expressions which might have been effected by the assignment. This is given a graphic term; we say we *kill* these nodes. Thus in this example, the second assignment kills any expression involving A (or i). What about, in C: C = A[i] + 17; *P = 23; B = A[i] + 17; Is this different in Pascal? So what nodes you have to kill might depend upon the language you are using.