These notes are from Week 8 of Stanford's Compilers course.
An intermediate language provides an intermediate level of abstraction during compilation: it exposes more detail than the source language, but not as much as the target language, which makes many optimizations easier to express.
Code optimization is usually performed on intermediate code, before code generation, to avoid depending on one type of machine. These optimizations typically work on groups of statements; one such group is the basic block, a maximal sequence of instructions with no labels (except possibly at the first instruction) and no jumps (except possibly at the last).
For languages like C, there are two types of optimizations: local and global. Local optimization applies only within a basic block, whereas global optimization applies across the control-flow graph.
For local optimization, some instructions can be deleted outright while others can be simplified:
// can be deleted
x := x + 0
x := x * 1

// can be simplified
x := x * 0  ==>  x := 0
y := y ** 2 ==>  y := y * y
x := x * 8  ==>  x := x << 3   // valid only if bitshift is faster
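A minimal sketch of this kind of algebraic simplification, assuming a tuple representation (dest, op, left, right) for three-address instructions (the representation is an illustration, not from the notes):

```python
# Local algebraic simplification on three-address-style instructions.
# Each instruction is a (dest, op, left, right) tuple; this format is
# an assumption made for this sketch.

def simplify(inst):
    """Return a simplified instruction, or None if it can be deleted."""
    dest, op, left, right = inst
    if op == "+" and right == 0 and left == dest:
        return None                      # x := x + 0  -> delete
    if op == "*" and right == 1 and left == dest:
        return None                      # x := x * 1  -> delete
    if op == "*" and right == 0:
        return (dest, "const", 0, None)  # x := x * 0  -> x := 0
    if op == "*" and right == 8:
        return (dest, "<<", left, 3)     # x := x * 8  -> x := x << 3
    return inst

block = [("x", "+", "x", 0), ("x", "*", "x", 8)]
optimized = [s for i in block if (s := simplify(i)) is not None]
print(optimized)  # [('x', '<<', 'x', 3)]
```

A real compiler would pattern-match many more identities (any power-of-two multiplier, subtraction of zero, and so on); the point is that each rewrite is purely local to one instruction.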
Also, operations on constants can be computed at compile time, which is called constant folding. Constant folding has subtleties when cross-compiling across different architectures, especially with floating-point numbers, because the host machine's arithmetic may not match the target's.
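A sketch of the folding step itself, restricted to integers precisely because of the cross-compilation subtlety above (folding floats with the host's arithmetic could change the program's behavior on the target):

```python
# Constant folding: if both operands of a binary operation are
# compile-time constants, evaluate the operation now. Restricted to
# integers, whose semantics are the same on host and target here.

def fold(op, left, right):
    """Fold a binary operation over integer constants, else None."""
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    if isinstance(left, int) and isinstance(right, int) and op in ops:
        return ops[op](left, right)
    return None  # not foldable at compile time

# x := 2 * 21 folds to x := 42
print(fold("*", 2, 21))  # 42
```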
Besides that, unreachable basic blocks can be deleted from the program. This reduces code size and can also improve instruction-cache behavior. Libraries and macros are common sources of such code: programs do not usually use every feature of a library, which lets the compiler remove the unused parts, and macros can expand to constant conditions that make a whole basic block unreachable.
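Unreachability can be detected by a simple graph walk: a sketch below marks every block reachable from the entry and drops the rest. The dict-of-successor-lists representation of the control-flow graph is an assumption for illustration.

```python
# Unreachable-code elimination: compute the set of basic blocks
# reachable from the entry via a depth-first walk of the CFG, then
# every block outside that set can be deleted.

def reachable(succs, entry):
    seen, stack = set(), [entry]
    while stack:
        b = stack.pop()
        if b not in seen:
            seen.add(b)
            stack.extend(succs.get(b, []))
    return seen

# B3 has no incoming edge (e.g. guarded by a macro that folded to false).
cfg = {"B0": ["B1"], "B1": ["B2"], "B2": [], "B3": ["B2"]}
live = reachable(cfg, "B0")
print(sorted(live))  # ['B0', 'B1', 'B2'] -- B3 can be removed
```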
Furthermore, another optimization, common-subexpression elimination, detects duplicate expressions and replaces them with the result of their first evaluation. This is only safe if the code does not reuse registers, which is guaranteed when the code is in single-assignment form (each register is assigned at most once).
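A sketch of common-subexpression elimination within one basic block, again using an assumed (dest, op, left, right) tuple format and assuming the block is already in single-assignment form:

```python
# Local common-subexpression elimination: remember each
# (op, left, right) expression the first time it is computed and
# rewrite later occurrences as copies of the first result. Only
# safe because no register is ever reassigned (single assignment).

def cse(block):
    seen, out = {}, []
    for dest, op, left, right in block:
        key = (op, left, right)
        if key in seen:
            out.append((dest, "copy", seen[key], None))  # reuse first result
        else:
            seen[key] = dest
            out.append((dest, op, left, right))
    return out

block = [("t1", "+", "a", "b"),
         ("t2", "+", "a", "b")]   # duplicate of t1's expression
print(cse(block))
# [('t1', '+', 'a', 'b'), ('t2', 'copy', 't1', None)]
```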
In addition, some optimizations are useful mainly because they enable further optimizations. One example is copy propagation, which detects copies of one register into another and replaces later uses of the copy with the original register. For example:
// before          // after
b := z + y         b := z + y
a := b             a := b
x := 2 * a         x := 2 * b
This can allow constant folding and dead code elimination.
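The copy-propagation step above can be sketched as follows, assuming single-assignment form and the same illustrative tuple format as before:

```python
# Copy propagation within a basic block in single-assignment form:
# after a copy d := s, every later use of d is replaced by s.

def copy_propagate(block):
    copies, out = {}, []
    for dest, op, left, right in block:
        left = copies.get(left, left)     # substitute known copies
        right = copies.get(right, right)
        if op == "copy":
            copies[dest] = left           # record d := s
        out.append((dest, op, left, right))
    return out

block = [("b", "+", "z", "y"),
         ("a", "copy", "b", None),
         ("x", "*", 2, "a")]
print(copy_propagate(block))
# [('b', '+', 'z', 'y'), ('a', 'copy', 'b', None), ('x', '*', 2, 'b')]
```

After propagation, `a := b` has no remaining uses and becomes a candidate for dead-code elimination, and if `b` had been a constant the last instruction would now be foldable.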
During compilation, all of these optimizations may run, and each time one of them changes the code the whole process starts again, until a fixed point is reached where no further optimization applies.
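This iterate-until-nothing-changes driver can be sketched as a fixed-point loop; the single example pass below (deleting `x := x + 0`) is a stand-in for any block-rewriting function:

```python
# Run optimization passes to a fixed point: keep applying each pass
# until a full round leaves the block unchanged.

def optimize(block, passes):
    changed = True
    while changed:
        changed = False
        for p in passes:
            new_block = p(block)
            if new_block != block:
                block, changed = new_block, True
    return block

# Example pass: drop instructions of the form x := x + 0.
def drop_add_zero(b):
    return [i for i in b if not (i[1] == "+" and i[3] == 0 and i[2] == i[0])]

block = [("x", "+", "x", 0), ("y", "*", "x", 2)]
print(optimize(block, [drop_add_zero]))  # [('y', '*', 'x', 2)]
```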
The control-flow graph of a program has basic blocks as nodes and models all paths of execution through the code. For example, constant propagation can be performed globally if and only if every path reaching a use yields the same constant value. Since computing all paths is expensive, there are approximation techniques that still give satisfactory results.
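The "same constant on every path" condition amounts to a join operation at merge points in the graph. A minimal sketch, using a TOP value for "not a constant" (the representation is an assumption, not from the notes):

```python
# Join used in global constant propagation: a variable is a known
# constant at a merge point only if every incoming path carries the
# same constant. TOP ("not a constant") absorbs everything.

TOP = object()  # value unknown / not a constant

def join(values):
    """Join the values flowing in from all predecessor paths."""
    consts = set(values)
    if len(consts) == 1 and TOP not in consts:
        return consts.pop()  # all paths agree on one constant
    return TOP

print(join([3, 3]))         # 3     -> uses of the variable can be folded
print(join([3, 4]) is TOP)  # True  -> not constant at this merge point
```

Dataflow analysis iterates this join over the graph instead of enumerating paths, which is what makes the approximation cheap.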