The language we will use as the source for our compiler is designed to be a simple yet realistic language. The language has been specially constructed so that tokens can be recognized by a lexical analyzer with one symbol lookahead, and parsed using a LL(1) recognizer. Nevertheless, the language includes a number of interesting features, such as classes, functions, type declarations, and pointer values.
In the following, literal tokens are shown in bold, nonliteral tokens in italic. Curly braces are used to indicate repetition zero or more times. The symbol ε is used to indicate nothing, the absence of tokens.
| program | ::= | { declaration ; } |
| declaration | ::= | classDeclaration |
| | | nonClassDeclaration | |
| nonClassDeclaration | ::= | functionDeclaration |
| | | nonFunctionDeclaration | |
| nonFunctionDeclaration | ::= | variableDeclaration |
| | | constantDeclaration | |
| | | typeDeclaration | |
| constantDeclaration | ::= | const identifier = constant |
| typeDeclaration | ::= | type nameDeclaration |
| variableDeclaration | ::= | var nameDeclaration |
| nameDeclaration | ::= | identifier : type |
A program is a sequence of declarations. There are five types of declarations, constants, types, variables, classes, and functions. Constants must represent values known at compile time.
| classDeclaration | ::= | class identifier classBody |
| classBody | ::= | begin { nonClassDeclaration ; } end |
| functionDeclaration | ::= | function identifier arguments returnType functionBody |
| arguments | ::= | ( argumentList ) |
| argumentList | ::= | nameDeclaration { , nameDeclaration } |
| | | ε | |
| returnType | ::= | : type |
| | | ε | |
| type | ::= | identifier |
| | | ^ type | |
| | | [ integer : integer ] type | |
| functionBody | ::= | { nonClassDeclaration ; } compoundStatement |
A class creates a new type name and a name scope. Classes can contain both data fields and member functions. Functions can define optional return values. A function body consists of zero or more declarations followed by a compound statement. The identifier in the type nonterminal must represent a class name, or a defined type name. The latter can be a predefined type ( int, or real) or a name that has appeared earlier in a type declaration. The ^ operator in a type declaration indicates a pointer type.
| compoundStatement | ::= | begin { statement ; } end |
| statement | ::= | returnStatement |
| | | ifStatement | |
| | | whileStatement | |
| | | compoundStatement | |
| | | assignOrFunction | |
| returnStatement | ::= | return |
| | | return ( expression ) |
|
| ifStatement | ::= | if expression then statement |
| | | if expression then statement else statement | |
| whileStatement | ::= | while expression do statement |
| assignOrFunction | ::= | reference = expression |
| | | reference ( parameterList ) | |
| parameterList | ::= | expression { , expression } |
| | | ε |
There are six different types of statements. The else clause in an if else statement always joins with the closest surrounding if statement. The value in the return statement must match the type of the surrounding function declaration. The expressions in the test portions of the if or while statements must be boolean. The expression on the right side of an assignment must be either integer, floating point, or pointer type.
| expression | ::= | relExpression { logicalOperator relExpression } |
| relExpression | ::= | plusExpression |
| | | plusExpression relationalOperator plusExpression | |
| plusExpression | ::= | timesExpression { additionOperator timesExpression } |
| timesExpression | ::= | term { multiplicationOperator term } |
| term | ::= | ( expression ) |
| | | not term |
|
| | | new type |
|
| | | - term | |
| | | reference | |
| | | & reference | |
| | | reference ( parameterList ) | |
| | | constant | |
| reference | ::= | identifier |
| | | reference ^ | |
| | | reference . identifier | |
| | | reference [ expression ] |
The new operator returns pointer to a heap-based value. (We have no delete operator, so memory is not recovered). The constant types are integer, real and string. The ampresand operator returns the address of the corresponding reference. The ^ symbol used as an operator on a reference dereferences a pointer value. The dot notation is used to access both data fields and member functions in a class structure.