Lecture 21 - 480/580 (compiler construction) Peephole Optimization Today I want to discuss yet another optimization that can be applied to object programs - that is, once we have finally generated assembly language. This optimization is called peephole optimization. The idea is that we look only at a small portion of a program; as if we are viewing the program through a peephole and can only view three or four adjacent instructions. Well, actually its is a rather odd peephole, because while we will usually be looking at physically adjacent instructions, there will be times when we will also want to look at logically adjacent instructions. Let's look at some examples. Because of the way we are using the stack, the code produced by your programming assignment will sometimes have instructions such as the following: move r0,tos move tos,r0 this is becuase, in order to simplfy the code generator, we make sure that all results always end up on the stack. Sometimes, however, we would be better off if we just left the result in a register. Clearly a peephole optimizer, which doesn't care at all that these came from two different operators, can see that both of these instructions can be improved and the same rusult will be generated. We have to be careful, however, that these two instructions are always executed one right after the other. If, for example, we had the sequence: move r0,tos L23 move tos,r0 then of course we DON'T want to make the optimization unless we know what other instructions may be branching to L23. There are other situations where we can get redundant loads and stores. Consider, for example, a sequence such as: a := b c := b + 1 On a register machine this could be translated into something like the following: move 12(a6),r0 move r0,16(a6) move 12(16),r0 add 1, r0 move r0,20(a6) Here it is not the physically adjacent instructions that tip us off, it is knowledge about what a register contains at a specific point which is important. Immediately after the second instruction, our peephole optimizer should know that r0 contains BOTH the contents of 12(a6) AND the contents of 16(a6) - and should either one be required in the next instruction it should not perform the load. The way we did control flow can also frequently cause inefficient instructions. For example, we will often have situations such as the following: bne L1 br L2 L1: Here we could eliminate an instruction, and perhaps even a label, by reversing the sense of the jump: beq L2 Remember I said earlier that we wanted to consider logically, and not necessarily physically adjacent instructions. We can consider a branch as making an instruction adjacent to the instruction it is branching to. Thus, for example, if we get sequences such as: br L1 ... L1: br L2 We can clearly improve the runtime efficieny by changing the first instruction to br L2. If we can determine that no other instructin branches to L1 we can also eliminate the second instruction. Conditional branches can also be involved, consider both: beq L1 ... L1: br L2 and br L1 ... L1: beq L2 Here both of these can be optimized. What about the fourth possibility, a conditional branch to a conditional branch? we can handle this, but will it ever happen the way we are generating code? In our code we eventually put all subscript calculations into a register. Something like up[r-c+8] (which occurs in the eight queens code) is changed into movd -4(fp),tos subd 8(fp),tos movd tos,r0 cmpd up+28[r0:d],0 A peephole optimizer can see that the register can be used for the intermediate calculations, saving one instruction: movd -4(fp),r0 subd 8(fp),r0 cmpd up+28[r0:d],0 Many machines (unfortunately not the NSC 32000) have instructions to clear a memory location, or test a location, so instructions such as the compare against zero given above could be replaced by a ``test'' instruction. Similarly assinging zero can be replaced by a ``clear'' instruction. Other cases where single instructions can be replaced by shorter or faster instructions include using so-called ``quick'' instructions, where a small integer (on the NSC it must be 4 bits or less) can be encoded as part of the instruction. On the NSC there are addquick, subtractquick, and so on. Another common multi-instruction sequence occurs for something like i := i + 1. On our machine this is translated into the following: movd i,tos addd 1,tos movd tos,i This can be translated into the single instruction addd 1,i (Notice that in Pascal we need a good code generater to make this change. The C language takes a different approach, by giving the user the forms i++ and i += 1, it means that even a relatively simple code generator can produce fairly good code for C). One of the more spectacular uses of peephole optimization is in the recogniztion of machine idioms. Many machines contain odd, special case instructions that are difficult for a compiler to make effective use of. For example, the NSC has an instruction to do an add, compare, and branch in a single instruction. This is clearly intended to make efficient for-loops. However, recognizing the pattern in the intermediate code which corresponds to this is non-trivial. How is this done? One way is for an assembly language whizz to sit down and think long and hard and to come up with a bunch of ad-hoc rules that will work; and then to write a pattern matcher to just scan the program looking for instances of these rules. A better solution takes all pairs of logically adjacent instructions and translate them into a REGISTER TRANSFER language - a special mini-assembly language which describes how data is moved from one register to another by the instruction. Often the effects of these transfers will cancel each other out. It then matches all possible instructions against the register transfer, trying to find a minimal sequence that will achieve the desired functionality. The results are often surprising, even to the authors of the optimizer! Since this can all be automated, creating an optimizer can be performed simply by making a register transfer description of the machine.