CMPS 450 Compiler Front-End Project

PHASE III "Scope Checking and Intermediate Code Generation"

llvm compiler infrastructure
gdb basics
elfdump
PhaseIII_files
cflat BNF

In Phase III of your compiler front-end project you will

  • 1) implement scope checking
  • 2) perform intermediate code generation

    The lexicon, syntax and semantics of Phase I and Phase II are unchanged.

    SCOPE CHECKING & TYPE BINDING

    For PhaseIII it is assumed that your Phase II scanner successfully performs name mangling and correct binding of a reference to a variable as shown in 90.cf file from PhaseII:
    /* 90.cf - type checks paramater list - name mangling required */
    int x;
    float num;
    int foo ( int num, float y )
    {
        num = 7;          /* OK */
        num = y;          /* line 10 type error */
        y = 4.5 + 8 * 2;  /* line 11 type error */
        return 0;
    } 
    If the binding is done correctly all references to num and y should be local. The symbol table dump will reflect this:
    ------------ ------ ------------
    Name         Type   Line_Numbers
    ------------ ------ -------------
    foo0          1      7
    num0          2      5
    num1          1      7    9   10
    x0            1      4
    y1            2      7   10   11  
    The scope semantics that you will add to your parser for phase III checks for multiple declarations of the same identifier in the same scope and a reference to an identifier that has not been declared. The parser will spit out an error for these errors and continue parsing. The code below demonstrates these two types of errors:
    int   x =  5;    
    float x = 5.5;  /* identifier declared twice */
    int foo ( ) 
    {
        w = 5;    /* reference to an identifier not declared */
        return 0;  
    }  
    Since cflat does not support nested scope (e.g., nested blocks in C) one symbol table is sufficient for the two possible scopes which are global or local to a function. Nested scope generally requires multiple symbol tables. The scope checking requirements for phaseIII are covered in lab07. Your parser will keep track of the current scope as you parse by a global variable that is visible to the scanner. As the scanner adds the identifier to the symbol table the scanner appends the scope flag for that identifier. You cannot make a reference to an identifier that has not been declared:
        /* file: foo.cf */
        foo() {
          i = 5;  // scope error
        } 
    And you cannot declare an identifier twice in the same scope:
        int i = 7;
        int i = 9;  // scope error 

    65%

    For 65% run your parser against 65.cf file. Your parser should find two scope errors. The log file and symtab dump should be as follows:
    i0 declared twice, line 5
    w0 not declared, line 8
    Type err:exp->exp op term,line 11
    Type err: assign_stmt, line 15
    
    ------------ ------ ------------
    Name         Type   Line_Numbers
    ------------ ------ -------------
    foo0          1      6 
    f0            2      3   11   15 
    i0            1      4    5    9    9   10   15 
    j1            1      6   14   16 
    w0           -1      8 
    x0            1      2 
    x1            2      6   13 
    Note that the reference to x on line 13 is correctly bound to the formal parameter x and not to global x.

    INTERMEDIATE CODE GENERATION

    The final deliverable for Phase III is a cflat compiler that generates 3-address code (TAC) as the intermediate code. I'm flexible on the form of intermediate code - if you want to translate your program into something other than 3-address code that is OK with me. The easiest option is 3-address code (TAC) since the algorithms for generating the code are covered in the text. The text provides the semantic actions required to generate the code while you are traversing the syntax tree; i.e, during parsing. You do not need to actually generate an abstract syntax tree unless you want to. You may find it easier to create the tree and then generate the code while traversing the tree.

    Since a compiler will generate intermediate code only after the source has passed both the syntax and semantic analysis phases you can assume you are working with error-free code during all remaining parts of phase III.

    75%

    You will use two additional fields in the symbol table record for 75%. One field (s_value) holds the address to the data and the other field (s_size) holds the size of data. Your symbol table is getting closer to what is used in production languages. A sample symbol table record is shown in this elfdump for a Solaris C ELF (Executable and Linking Format) executable.
       index      value       size  type bind oth ver shndx    name
       [68]  0x000208d0 0x00000004  OBJT GLOB  D    0 .data   intConstant 

    For 75% run your parser against 75.cf file. This file contains declarations and a function with a parameter list. The "code" that you need to generate for declarations is to insert into the symbol table a size and an address offset from 100 for each global variable. Parameters are not given an address since they are generally loaded into registers. For your globals assume each offset is computed starting at physical address 100 to make your table easier to read. The offset will be added to a memory pointer of the .data segment when the code is translated into machine code by the backend. Assume sizes: int (4 bytes) float (8 bytes) and char (1 byte). Align your data on a word (4-byte) boundary. Your symbol table hash record already has the fields you need (st_size and st_value) but you will need to add the functions to insert these values. After parsing this file

    /* file: 75.cf generates code for declaration and parameter lists by storing 
       the size and address for globals as an offset in the symbol table */
    float f; 
    int i; 
    char c;
    int foo ( int arg1, float arg2 )
    {
        return 0;
    }
    After parsing your symbol table dump should look something like this :
    ------------ ------ ---- ----- ------------
    Name         Type   Size Value Line_Numbers
    ------------ ------ ---- ----- ------------
    foo0         int       4    0   6 
    c0           char      1  112   5 
    f0           float     8  100   3 
    i0           int       4  108   4 
    arg11        int       4    0   6 
    arg21        float     8    0   6 
    Notice that the parameters and the return value do not have addresses. Parameters and local variables will be loaded into registers or pushed onto the runtime stack.

    85%

    For 85% run your parser against the 85.cf file. This file tests your compiler's ability generate 3-address code for assignments and expressions. The code
         float y = 5.5;
    char c;
    int ret = 0;
    int foo ( float f, int i) 
    {
        i =  i + 1;
        f = y * 7.0 + 3.5;  
        return ret;  
    } 
    Generates this code:
    y0 = 5.5 
    ret0 = 0 
    t1 = i1 + 1
    i1 = t1 
    t2 = y0 * 7.0
    t3 = t2 + 3.5
    f1 = t3 
    Temporaries such as t1, t2, and t3 are treated as objects in your program so you need to add each temporary to the symbol table. A temporary will have a type, size and offset just like data objects. To compute type you need to synthesize the type from the LHS and RHS of the operation. Sample symbol table dump:
    ------------ ------ ---- ----- ------------
    Name         Type   Size Value Line_Numbers
    ------------ ------ ---- ----- ------------
    foo0         int       4    0   9 
    c0           char      1  108   7 
    f1           float     8    0   9   12 
    ret0         int       4  112   8   13 
    t1           int       4    0   0 
    t2           float     8    0   0 
    t3           float     8    0   0 
    i1           int       4    0   9   11   11 
    y0           float     8  100   6   12 
    

    95%

    For 95% run your parser against 95.cf file. This file tests your compiler's ability to generate 3-address code for IF-ELSE and WHILE statements.

    100%

    For 100% run your parser against 100.cf file. This file tests your compiler's ability to generate 3-address code for a function call. Note that a function definition outside of main will need to be added to your grammar as well as the function call.
    NOTES: If you are interested in building a front end for the GNU Compiler Collection (GCC) as a senior project, read through the gcc frontend, visit the GCC wiki or GNU GCC wiki. To work with GCC your compiler must ultimately generate RTL (register transfer language) which is the intermediate language that gcc uses.