CS 480/580 self study questions

CS 480/580 -- Translators

Winter Term, 2005

Self Study Questions

Use the following questions to guage if you understand the material in lectures. If you can answer all the questions after reading the book and listening to the lecture, then you should be in good shape for the exams.


Lecture 01

  1. Why is programming a computer difficult?

  2. Why is the study of programming languages the most important part of computer science. (Just don't tell the professors in other areas I said that!)

  3. What are some various different types of translator?

  4. What are the basic steps a translator must perform?


Lecture 02

  1. Why must we perform lexical analysis? What is a lexeme (or token)?

  2. What are the fundamental constructs for writing regular expressions?

  3. What are some shorthand notations that make it easier to describe tokens as regular expressions?

  4. Why does a lexical analyzer need a push back buffer?

  5. How can one make a FA to recognize tokens?

  6. When asked for a new token, how does the lexer use the FA?

  7. Why might we need to back up states in the FA after we stop being able to move forward?


Lecture 03

  1. Why is it difficult to build lexers in the FA technique if you are writing them ``by hand'' ?

  2. What is the overall structure of a lexer written ``by hand'' ?


Lecture 04

  1. What are the levels of the Chomsky language hierarchy? (at least, as we use them in computer science).

  2. What computing models are matched with what levels of languages?

  3. What do the letters in BNF stand for?

  4. What is a parse tree? What is derivation (both as noun and verb)?

  5. How is an abstract syntax tree different from a parse tree?

  6. What does it mean if you say that a grammar is ambiguous?

  7. Are grammars the same as languages?

  8. What is the general technique for removing left recursion from a grammar?


Lecture 05

  1. Using the parsing technique of recursive descent, each nonterminal is recognized by what?

  2. When we are faced with a choice of alternative productions for the same nonterminal, our first idea is to try one, then if that doesn't work, try the other one. What major problem does this introduce for the lexer?

  3. The elimination of backtracking in the above scheme requires the introduction of a restriction on the set of grammars we can handle. What is this restriction?

  4. A grammar that satisfies this restriction is said to be what?


Lecture 06

  1. What is the definition of the first set relation?

  2. What is the definition of the follow set relation?

  3. What is the epsilon set? (From notes, not in book).

  4. What is the algorithm for discovering the epsilon set?

  5. What are the cases used in building up the equations used in generating the first set?

  6. Once we have the equations, how are the first sets discovered?

  7. What are the cases used in building up the equations used in generating the follow set?

  8. Once we have the equations, how are the follow sets discovered?

  9. Once we have the first and follow sets, how can we rephrase the LL(1) restriction?


More on Top Down Parsing

  1. What are the advantages of writing a parser using the technique of recursive descent?

  2. What are the two major disadvantages of writing a parser using the technique of recursive descent?

  3. Which of these two problems is being addressed by the technique of top-down nonrecursive predictive parser?

  4. In the table used in top-down nonrecursive predictive parser, what are the index values for the rows? What are the index values for the columns?

  5. What type of values are kept in the cells of the table? How are first and follow sets used in determining how to fill in the entries in the table?

  6. The parser is mostly a while loop. Describe in general terms what this loop does on each iteration.

  7. What are some ways that a parsing error will be recognized by this algorithm?

  8. The parsing operation builds a parse tree. In what order are elements of this tree recognized?

  9. What are the advantages of using top-down table driven predictive parsing over recursive descent?


Lecture 07

  1. Where does the NDFA used in the LR parsing algorithm come from? (Later we will see a technique that can be used to construct the DFA directly, but for this question I'm asking for motivation, not mechanics)

  2. In the NDFA there are many arcs labelled with a nonterminal. How are these ever traversed, since the input consists only of terminals?

  3. When you ``back up'' after seeing something that matches a production rule, how many states do you need to back up?

  4. What action do you take after backing up?

  5. What does the R in LR parsing stand for?

  6. What is the push down stack in the LR parsing algorithm used for? (That is, what type of values does it hold?)


Lecture 08

  1. What is a shift action?

  2. What is a reduce action?

  3. What is a shift/reduce conflict?

  4. What is a reduce/reduce conflict?

  5. What does it mean to say that if no errors have been detected by an LR parser, that the input represents a ``viable prefix''?

  6. What is a handle? What do LR parsers only recognize inputs that are handles?

  7. What is an attribute? How can we build attributes in a natural fashion as input the parser goes through its motions?


Lecture 09

  1. What is an ``item''?

  2. The algorithm to build the DFA directly consists of two parts, closure and goto. What is the closure algorithm?

  3. What is the goto algorithm?

  4. The SLR rule says to reduce when the next symbol is in the follow set for the production being reduced. Why is this too general a rule?

  5. The LALR technique (and the cannonical LR(1) technique, which is where it is described in the book) use the idea of a production augmented with a lookahead. What is a lookahead?

  6. What is the lookahead for the starting production?

  7. How is the closure algorithm changed to compute lookaheads?

  8. What is the LALR rule for resolving shift/reduce errors?


Lecture 10

  1. Why do we need symbol tables? Why can't grammar rules be used to match symbol declarations and symbol uses?

  2. What is a symbol name?

  3. What is a symbol scope? What are some various different types of scope found in programming languages?

  4. What are various different types of symbols? For each type of symbol, what information does the symbol table need to maintain?

  5. For functions there are two symbol tables, one holding local variables and one holding arguments. Explain why the local table can be thrown away after we are finished processing the body of the function, but the argument table must be kept around.

  6. Why must a class symbol table be kept around even after we are finished reading the entire class description?


Lecture 11

Consider first a language model (similar to FORTRAN in the 1950's) were we have (a) no recursion, (b) no nested procedures, (c) call by value parameters, (d) fixed sized variables and no dynamic allocation (e) information about locals only accessible when executing a procedure. We will then remove these restrictions one by one and consider the implications.

  1. The run-time space associated with every procedure is known as an activation record (or activation frame). What three types of information is found in this area?

  2. What are the steps involved in calling a procedure?

  3. How does one access local variables?

  4. Suppose we allow access to both locals and global variables (two levels of scope, as in C). Where are global variables found?

  5. Suppose we allow call by reference parameters in place of call by value. How does this change the memory layout? How does this change the calling procedure?

  6. Suppose now we allow recursion. What problem does this introduce? How can it be solved? What important property of procedure calling are we depending upon?

  7. How are local variables now accessed?

  8. How does the creation of an activation record change the calling protocol? Which partner is in charge of creating which portion of the activation record?


Lecture 11a, more on up-level addressing

  1. Suppose we allow nested functions. What is a static scope level? A procedure at level i can call procedures at what scope levels?

  2. How are non-local, non-global variables accessed?

  3. How does the use of nested functions change the procedure calling protocol? What invariant must be preserved by the caller.

  4. When a procedure calls another procedure one level down, what is the static chain? When it calls a procedure at the same level what is the static chain? When it calls a procedure some number of levels UP, what is the static chain?

  5. How can we use the fact that the symbol tables mirror the nested AR to simplify the access to non-local variables? How can we use this to simplify the way the static chain is set?

  6. Why does C not have nested functions? That is, how does the absence of this feature simplify the run-time action required by the language?


Lecture 11b, more on functions as values

  1. What problem arises in setting up the static link chain when we pass a function as an argument?

  2. How is this problem solved? What is a closure?

  3. Why is it called the downward funarg problem? What is a funarg?

  4. What problem is created when a function is returned as a result from another function? (or assigned to a variable, which results in much the same thing).

  5. How is THIS problem solved?

  6. What makes the solution of the second problem (the upward funarg problem) so much more costly than the solution to the downward funarg problem?

  7. What is a pass-by-name parameter?

  8. How are pass-by-name parameters implemented?


Lecture 11c, Run-time environment for OOP

  1. What does it mean to say that one class inherits from another?

  2. Why is the receiver passed as a hidden first argument? What is this?

  3. What feature of the memory layout for subclasses allows member functions to operate on any type of subclass?

  4. What is a polymorphic variable?

  5. Why do polymorphic variables complicate the use of stack-based (automatic) storage?

  6. What are the two most common solutions to this problem? Which one does Java use?

  7. What is a virtual method table? How is dispatch performed using vtabs?

  8. Compare a vtable from a child class to that of the parent class. When a method is inherited, what is stored in the vtable? When a method is overridden, what is stored in the vtable?

  9. Why does multiple inheritance complicate the memory layout of polymorphic variables?


Lecture 12

  1. What is an intermediate representation? Why would a compiler use an intermediate representation?

  2. Why would the level at which you generate code influence the form of the intermediate representation you use?

  3. What information must be maintained by the intermediate representation in order to perform error checking?

  4. How does the intermdiate representation description of a variable differ from the high level language description?

  5. How is the representation of a variable used as an lvalue (the target of an assignment) differ from the intermediate representation of a variable used as an rvalue (an expression)?

  6. What does the intermediate representation of the calculation of an array element represent?

  7. What does the intermediate representation of a field (record or class) expression look like?

  8. What does the intermediate representation of a function call look like? How is this different from other expressions?

  9. What does the intermdiate representation of a unary or binary operator look like?


Lecture 13

  1. What purposes do types serve in programming languages?

  2. What are some examples of type errors that can be caught at compile time? What are some errors that can only be caught at run time?

  3. What are some types that are mapped onto integer values in a typical compiler? Why not simply use integers for all these purposes?

  4. Explain the tradeoff between very strong typing and expressibility. Arrays used as parameters are a good source of illustrations.

  5. What is the difference between structural and name equivalence of types?

  6. What is an overloaded function name? How do compilers typically resolve the meaning of an overloaded name?

  7. What is a polymorphic function?

  8. What is type coercion? What is implicit coercion?


Lecture 14

  1. What are the basic elements that all control flow statements are mapped on to?

  2. What is the translation for the conditional (if) statement, both with and without the else part?

  3. What is the translation for a while statement?

  4. What is the translation for the repeat statement?

  5. What about the for statement? Why might a compiler create an internal temporary in a for statement?

  6. What are some of the possible ways of translating a case statement? Under what conditions would you want to use each of the possibilities?


Lecture 15

  1. Explain how it is that the unary not operator does not end up generating any additional instructions.

  2. How does the interpretation of the logical operators and and or in computer languages differ from the logical interpretation?

  3. On machines were we can only branch if a relational is TRUE, how do we convert a BRANCH IF FALSE for a relational into a BRANCH IF TRUE?

  4. (If we get to this part). If we assume integer values 1 and 0 are used internally for true and false, how does a boolean expression get converted into a boolean value? How does a boolen value get converted back into control flow?


Lecture 16

  1. What properties must an algebraic transformation possess in order to be considered as an optimization technique?

  2. What are some of the transformations used in your programming assignment?

  3. Why is it generally more efficient to replace a multiplication by a shift instruction?

  4. Why can't the programmer simply perform these transformations on the original source level?


Lecture 17

  1. Why is ``optimize'' a misnomer for the kind of transformations we are discussing today and next week?

  2. What are some of the resources one would like to optimize?

  3. What are some reasons why a compiler writer might NOT want to do optimizations?

  4. What is a DAG? How does it differ from an AST?

  5. What is a basic block?

  6. How can there be common subexpressions in code written by a ``good'' programmer working in a high level language? (by ``good'' I mean one that doesn't make obviously inefficient code).

  7. What is one reason why the task of optimization is difficult?

  8. What does it mean that an expression in a DAG is dead (or killed)?

  9. How does the language the source came from alter the set of items that might be killed by a given expression?


Lecture 18

  1. What is a control flow graph? What is the representation of each node in the graph?

  2. What are the characteristics of a loop in a control flow graph? That is, how can we detect a loop in a CFG?

  3. Once we have identified a loop, what is a loop invariant expression?

  4. How do we determine which values are being modified in a loop?

  5. What is code motion?

  6. How do optimization and debugging of a program at the assembly language level work at cross purposes to each other?


Lecture 18, part 2 (reduction in strength)

  1. What are the characteristics of an induction variable?

  2. Having found an induction variable, what type of expressions can we optimize?

  3. What is the invariant we need to preserve in order to do reduction in strength?

  4. For a simple example, illustrate how a multiplcation of an induction variable by a constant can be eliminated, to be replaced by an addition.


Lecture 19

  1. What is the goal of data flow analysis?

  2. For each node in the control flow graph, what does the kill set represent? What does the gen set represent?

  3. For each node in the control flow graph, what does the in set represent? What does the out set represent?

  4. How is the out set computed from the in, gen and kill sets?

  5. Once we have data flow analysis information, what sort of errors can we detect?

  6. Once we have data flow analysis information, what sort of optimizations can we perform?

  7. What is an available expression? How does it relate to common subexpressions?

  8. What is a busy expression? How does it relate to common subexpressions?


Lecture 20

  1. What are some of the resources that must be managed by the code generator?

  2. What are the two broad approaches to code generation?

  3. What are some of the advantages of a stack style code generation? What are some of the disadvantages?

  4. What are some of the advantages of having lots of registers? What are some of the advantages of having only a few registers?

  5. If we have a complex expression that is represented by a roughly balanced binary tree, what can happen if we try to evaluate the expression using a register style?

  6. What are some of the ways that we can avoid this problem?


Lecture 20

  1. Why is it that if we generate code statement by statement that it will be difficult for a code generator to recognize the optimizations performed by peephole optimization?

  2. Why do we want to consider logically adjacent instructions, as well as physically adjacent ones?


Study questions prepared by Professor
Tim Budd, oregon state university, corvallis, oregon, for use with the course CS 480/580, translators.