Lecture 20 - 480/580 (compiler construction) Code Generation We have been ignoring the actual machine for almost all of the course, with good reason; machines are quirky, each one is different, and if we had presented ideas in the context of one particular machine you might have the impression that they were not applicable to machines of a different type. Unfortunately, we can not do that any longer. Let us consider first what sort of tools we have to use in running a computer program. That is, what sort of things one manipulates when one is writing assembly language. * memory, including the activation frame memory. may be segmented, paged, virtual or real. * registers - most basic, may have different numbers and different types * a stack - may be built in to the hardware, or implemented using registers. * instructions - in particular the different addressing modes used in instructions may be helpful in some constructs. How these are implemented may make our life easier or harder. Unfortunately, most machines seem to be implemented by people who have never had to write a compiler. Consider registers. Early machines had one register, called an accumulator. This made code generation relatively easy, but it soon became evident that more registers would be useful (to hold temporary results and the like). The next generation of machines had more, anywhere from 6 to 16 general registers. But nothing was free; having more registers meant that you had to save more registers on procedure entry and exit. Furthermore, some of these registers might not be as useful as others. For example, on some machines integer division requires a *pair* of registers--one part of the pair returning the quotient and the other the remainder. (IBM seems to like this). On other machines you have different classes of registers. The MC68000 (the processor in the HP), has addressing registers and data registers. But some calculations can be done in either, and some can't. These differences make life hard. Consider addressing modes--in the MC68000 some instructions can take all addressing modes, and other, seemingly similar instructions won't. This means that almost every instruction must be treated on a case by case basis, with no commonality between them. But I'm getting ahead of myself. More on these problems later. There are, generally speaking, two broad strategies that are used in code generation. I will describe each of these in turn, and then describe how it is REALLY done nowdays, which is usually a combination of these two techniques. The first strategy is what is called a stack machine. You have all seen how infix arithmetic instructions can be translated into Reverse Polish. If we have this Reverse Polish Notation, the most natural thing to do is to evaluate expressions using a stack. This corresponds to just walking over the abstract syntax tree, and on operands (constants, registers, global addresses), pushing values on the stack and on operators (addition, negation, so on) taking values from the stack and working on them, pushing the result back on the stack. Code generation of stack machines is trivially easy; because of this it is very attractive. Because of this attraction some early machines (Burroughs in particular) worked ONLY in a stack fashion- everything had to be written in terms of stack operations. (In fact, the Burroughs didn't really have an assembly language - all system code was written in an ALGOL dialect). Lets look at some code for an expression in stack form: (x + 12) * 7 push a6 push -8 add get value push 12 add push 7 mult The PDP-11 revolutionized the computer world by introducing a variety of new addressing modes. (They didn't invent addressing modes, just started using them in novel ways). One of the addressing modes they introduced was the auto-increment and auto-decrement; instructions which used a register and then automatically added a value to it. With these instructions it was possible to take a basically register machine and use it like a stack machine. For example, the code could become: move -8(a6),-(sp) move #12,-(sp) add (sp)+,(sp) move #7,-(sp) mult (sp)+,(sp) Most machines built since the PDP-11 have followed a similar course. What is the problem with generating code in this fashion? Well, for one thing it is inefficient. A better code fragment for the above might be: move -8(a6),r1 add #12,r1 mult #7,r1 That is, registers can hold results we are currently working on, temporary results we will need later, and so on. Even more important, registers can hold a subexpression that is used several times, permitting us to compute it just once. Consider, for example, something like: (a+b*c)/(d+b*c) This could be translated into something like the following: move a,r0 move b,r1 mul c,r1 add r1,r0 add d,r1 div r0,r1 Compare this to the stack translation, where there seems to be no easy way to save the common subexpression. But registers have their own problem. Consider generating code for + on a register machine; you might want to consider at the very least the following cases: 1) both arguments are of a form (variable, constant) that can addressed immediately using an addressing mode. We will want to put one into a register, and then operate on the other: move left, r1 add right, r1 2) the right argument is not in a form that can be addressed immediately using an addressing mode, but when we compute the left argument it will come back to us in a register. We want to use that register over again. (compute left) add right, r1 3) same as 2 but with left and right reversed: (compute right) add left, r1 4) both are complicated expressions that will be returned in registers. We will want to add the two registers together and then free up one of them: (compute left) (compute right) add r1, r2 (free up r1) There are even more cases to consider if we are generating code from DAGs and we want to leave the results from one of the children in a register for later use. See page 538 for some of the complexity. Notice we made use of the fact that addition was commutative here. What about subtraction? In case (3) this will give us the negation of the result we want. Do we want to use another register or generate two instructions? If we do generate code in this fashion, we must be prepared for the inevitable time when we run out of registers. Say when we are computing something like: (((a+b)*(c+d))*((e+f)*(g+h)))-(i/k) [ look at the tree ] Basically, when this happens, we need to store something we are currently holding in a register and later load it back into a (perhaps different) register. This is called a register spill. One thing that we can do to avoid register spills is to perform our tree transformations. If instead of a balanced tree we have a tree with expressions runnning down the left subtree then we will probably use fewer registers, since there is a greater chance of being able to use addressing modes and reusing registers. Another thing we can do is to have more registers. But more registers cause problems as well. Each procedure wants to think it has its own set of registers, which continue to exist across module boundaries. This means, if we use a register style of code generation, that registers must be saved and restored when a procedure is invoked. On some machines this required explicit instructions. Some more recent machines (such as the NSC) have simplified this by giving special instructions which save registers. For example, on the NSC you can give a list of registers on the enter and exit instructions, and they will be automatically saved. genstart enter [r1,r3],size genend exit [r1,r3] ret 0 Of course, nothing is free, and saving registers takes some time. So there is a trade off - we can make the procedure body execute faster (by using registers rather than the stack) but by doing so we may make the procedure call slower. Which is more important? Which contributes more to execution time? There are no hard and fast rules. A novel idea introduced in the last few years is a register window. Basically, the user sees only a small number of registers (say 32 or 64). The system, however, supports many many more (say 512 or so). Furthermore, the registers the user sees are divided into three parts: 1) those shared with the calling procdure 2) those unique to this procdure 3) those that will be shared with any procedure that is called. As procedure calls are made, this window moves up and down the register set. Thus saves and restores are minimized, and access to parameters can be made very fast. But we have been discussing registers used to evaluate expressions. But remember we used a DAG to discover common subexpressions. Equally important is to use registers to hold common values. If we do global common subexpression evaluation, perhaps the largest class of common values may be variables that are repeatedly used. Thus we might also want to use registers to hold these values. (you begin to see why optimal code generation is difficult, we have competing demands being placed upon registers, and no clear guidelines on how to satisfy these demands). In order to do a good job of assigning variables to registers, we need to know which variables are commonly used. Thus global data-flow analysis, discussed last time, is important. Now let us consider yet another bane of compiler writers, non-orthogonal instruction sets. In your assignment, we are performing additions in the following manner: put left argument on stack put right argument on stack move.l (sp)+,d0 add.l d0,(sp) One would think that one could just replace add.l with asl.l to get an arithmetic shift. But, no, that addressing mode is not permitted for arithmetic shifts. The result of a shift must be left in a register, not in memory. Thus we could use two registers: move.l (sp)+,d0 move.l (sp)+,d1 asl.l d0,d1 move.l d1,-(sp) In your assignment I make use of the fact that the left argument is always a constant, which makes the code slightly shorter. The fact that these two instructions, which are seeming similar, cannot be used in the same way makes code generation very difficult. We can't have a code generation algorithm that is based on the fact that we are looking at a binary operator, but instead must consider each of the binary operators individually and consider cases that are specific to just that operator. This is not the class to discuss in detail how register allocation is done. I have pointed out what some of the problems are, and you should have some idea why it is difficult. The book goes into horrible gory detail the solution of some of these problems. The bottom line is that: Stack-style code generation is easy, can handle arbitrarily complex expressions, but generates less efficient code. Register-style code generation is harder but generates more efficient code. With any finite number of registers there is a limit on the complexity of expressions that can be evaluated without running out of registers. How is it really done? How do real compilers work? Let us take for our example the C compiler under Unix. The situation is far from simple, but here are the main points: * instead of global flow analysis, the compiler allows the USER to tell the compiler which variables are commonly used, and should be kept in registers. (some people think that this is an example of the compiler writer avoiding his responsibility, but it does make the compiler easier). If there are enough registers a free register is assigned to each variable designated by the user. * Using the remaining registers, each statement is examined in turn. If the statement can be evaluated using the free registers, code is generated to do so. If not (that is, if register spills would occur), then code is genenerated in a stack fashion. Stack code is generated very seldom, since most statements are very simple. Nevertheless, complex statements can also be handled. This uses the best features of both styles. * No attempt is made to save repeated values (which, in fact, occur far less in C since arrays are so uncommon and so primitive). The result is fairly good, and fairly concise.