CS 480: Translators

Winter Term: 2011, Professor Budd

Grammar for Programming Assignments

The language we will use as the source for our compiler is designed to be a simple yet realistic language. The language has been specially constructed so that tokens can be recognized by a lexical analyzer with one symbol lookahead, and parsed using a LL(1) recognizer. Nevertheless, the language includes a number of interesting features, such as classes, functions, type declarations, and pointer values.

In the following, literal tokens are shown in bold, nonliteral tokens in italic. Curly braces are used to indicate repetition zero or more times. The symbol ε is used to indicate nothing, the absence of tokens.

Program Structure

program ::= { declaration ; }
declaration ::= classDeclaration

| nonClassDeclaration
nonClassDeclaration ::= functionDeclaration

| nonFunctionDeclaration
nonFunctionDeclaration ::= variableDeclaration

| constantDeclaration

| typeDeclaration
constantDeclaration ::= const identifier = constant
typeDeclaration ::= type nameDeclaration
variableDeclaration ::= var nameDeclaration
nameDeclaration ::= identifier : type

A program is a sequence of declarations. There are five types of declarations, constants, types, variables, classes, and functions. Constants must represent values known at compile time.

Classes and Functions

classDeclaration ::= class identifier classBody
classBody ::= begin { nonClassDeclaration ; } end
functionDeclaration ::= function identifier arguments returnType functionBody
arguments ::= ( argumentList )
argumentList ::= nameDeclaration { , nameDeclaration }

| ε
returnType ::= : type

| ε
type ::= identifier

| ^ type

| [ integer : integer ] type
functionBody ::= { nonClassDeclaration ; } compoundStatement

A class creates a new type name and a name scope. Classes can contain both data fields and member functions. Functions can define optional return values. A function body consists of zero or more declarations followed by a compound statement. The identifier in the type nonterminal must represent a class name, or a defined type name. The latter can be a predefined type ( int, or real) or a name that has appeared earlier in a type declaration. The ^ operator in a type declaration indicates a pointer type.


compoundStatement ::= begin { statement ; } end
statement ::= returnStatement

| ifStatement

| whileStatement

| compoundStatement

| assignOrFunction
returnStatement ::= return

| return ( expression )
ifStatement ::= if  expression   then statement

| if  expression then statement else statement
whileStatement ::= while  expression do statement
assignOrFunction ::= reference = expression

| reference ( parameterList )
parameterList ::= expression { , expression }

| ε

There are six different types of statements. The else clause in an if else statement always joins with the closest surrounding if statement. The value in the return statement must match the type of the surrounding function declaration. The expressions in the test portions of the if or while statements must be boolean. The expression on the right side of an assignment must be either integer, floating point, or pointer type.


expression ::= relExpression { logicalOperator relExpression }
relExpression ::= plusExpression

| plusExpression relationalOperator plusExpression
plusExpression ::= timesExpression { additionOperator timesExpression }
timesExpression ::= term { multiplicationOperator term }
term ::= ( expression )

| not term

| new type

| - term

| reference

| & reference

| reference ( parameterList )

| constant
reference ::= identifier

| reference ^

| reference . identifier

| reference [ expression ]

The new operator returns pointer to a heap-based value.  (We have no delete operator, so memory is not recovered). The constant types are integer, real and string. The ampresand operator returns the address of the corresponding reference. The ^ symbol used as an operator on a reference dereferences a pointer value. The dot notation is used to access both data fields and member functions in a class structure.

Lexical Conventions

  1. Comments are surrounded by { and }. They may not contain either a { or a }. Comments may appear anyplace but within a token.
  2. Blanks between tokens are optional, with the exception that keywords cannot be immediately adjacent to identifiers.
  3. The identifier token matches a letter followed by letters or digits.
  4. There are three constant types. An integer constant is a sequence of one or more digits, interpreted as a base-10 value. A floating point constant is a sequence of one or more digits, followed by a period, followed by zero or more digits. A string constant is surrounded by a pair of double quotes, and cannot contain a double quote character.
  5. Keywords are reserved, and cannot be used as variable names.
  6. The logical operators are and and or.
  7. The legal relational tests are < <= !=, ==, >= and >
  8. The legal addition operators are + and -, and <<. The latter is the left shift operator, and requires two integer arguments.
  9. The legal multiplication operators are * and / and %. The latter is the remainder operator, and requires two integer arguments.
  10. Arguments are passed by value. Legal arguments must be integer, floating point, or pointer type.
  11. Arithmetic operations on integer arguments return integer values. If both arguments are floating point the result if floating point. If one argument is integer and other other floating point, the integer argument is converted to floating point and the result is floating point.
  12. Relational and logical operators return a value of type boolean.
  13. The lower bound in a declaration of any array type must be less than the upper bound.
  14. In a field expression, the identifier must match one of the declared field names.
  15. All variables and functions must be declared prior to their first use.