These notes are from the Week 7 of the Compilers course by Stanford. This focuses on the discussion about code generation. The architecture chosen for this class is a Stack Machine architecture on top of the MIPS architecture.
The MIPS processor supports the following instructions:
lw reg1 offset reg2: Load a 32-bit word from the address
reg2 + offsetinto
add reg1 reg2 reg3:
reg1 = reg2 + reg3
sub reg1 reg2 reg3:
reg1 = reg2 - reg3
beq reg1 reg2 label: branch to
sw reg1 offset(reg2): Store 32-bit word in
reg2 + offset
addiu reg1 reg2 imm:
reg1 = reg2 + imm. Where
immis an immediate value expressed in code. Differs from
addas it doesn’t check overflows
li reg imm:
reg = imm
jal label: Jump to label and save the address of the next instruction in
jr reg: Jump to address in register
A stack machine can be represented as follows in MIPS:
li $a0 7 # acc = 7 sw $a0 0($sp) # push acc addiu $sp $sp -4 li $a0 5 # acc = 5 lw $t1 4($sp) # acc = acc + top_of_stack add $a0 $a0 $t1 pop # addiu $sp $sp 4
$a0 is the accumulator and
$sp is the stack pointer. Also, it is worth noting that the stack grows from high addresses to low addresses.
We’ll look at generating code for the following language:
P := D; P | D D := def id(ARGS) = E; ARGS := id, ARGS | id E := int | id | E1 + E2 | E1 - E2 | if E1 = E2 then E3 else E4 | id(E1, ..., En)
The code generation for an identifier can be expressed as follows:
cgen(i) = emmit(li $a0 i)
Also, the code generation for the addition and subtraction can be built on top of that:
cgen(e1 + e2) = cgen(e1) emmit(sw $a0 0($sp)) emmit(addiu $sp $sp -4) cgen(e2) emmit(lw $t1 4($sp)) emmit(add $a0 $t1 $a0) emmit(addiu $sp $sp 4) cgen(e1 - e2) = cgen(e1) emmit(sw $a0 0($sp)) emmit(addiu $sp $sp -4) cgen(e2) emmit(lw $t1 4($sp)) emmit(sub $a0 $t1 $a0) emmit(addiu $sp $sp 4)
Besides those operations, we can define an
if-else expression as follows:
cgen(if e1 = e2 then e3 else e4) = cgen(e1) emmit(sw $a0 0($sp)) emmit(addiu $sp $sp -4) cgen(e2) emmit(lw $t1 4($sp)) emmit(addiu $sp $sp 4) emmit($a0 $t1 true_branch) emmit(false_branch:) cgen(e4) emmit(b end_if) emmit(true_branch:) cgen(e3) emmit(endif)
Furthermore, we have the following code for function definitions, function calls, and variable references. For this, we’ll need a new register
$fp that will hold the frame pointer of the current activation.
The design chosen for the activation record, or frame, is the following:
So next, we are going to examine how to generate code for a function call. It’s worth noting, that the following code snippet is executed on the caller-side.
cgen(f(e1, ..., en)) = emmit(sw $fp 0($sp)) emmit(addiu $sp $sp -4) cgen(en) emmit(sw $fp 0($sp)) emmit(addiu $sp $sp -4) ... cgen(e1) emmit(sw $fp 0($sp)) emmit(addiu $sp $sp -4) emmit(jal f_entry)
Now, to generate code for the function definitions, we can do the following:
cgen(def(x1, ..., xn) = e) = emmit(f_entry:) emmit(move $fp $sp) emmit(addiu $sp $sp -4) cgen(e) emmit(lw $ra 4($sp)) emmit(addiu $sp $sp z) emmit(lw $fp 0($sp)) emmit(jr $ra)
Note that in the previous example we used the
$ra register, which holds the value of the next address after the
jal instruction. Also, the
z variable is defined as
4 * n + 8, which is space enough for the
n variables, the return address, and the old
Finally, the variables can be defined as an offset from the current
cgen(xi) = emmit(lw $a0 z($fp))
In this case,
z can be defined as
4 * i.
In closing, the developers of production compilers usually try to keep as much data as possible in the registers for performance purposes, differently from the previous implementation of the stack machine.
One way to solving this is the use of temporaries, which are a pre-allocated region in the activation record that can be used to store intermediate results of operations and avoid back and forth with the stack.
Semantics is a way to describe a programming language without the details of the low-level language the compiler is going to be written with. For example, the assembly language needs many details to work properly such as how the stack grows or how the frame is implemented.
One way to specify programs without implementation details is what is called Operational Semantics, which is basically to describe how the code would work in an abstract machine.