Each instruction node in an abstract syntax tree holds a SUIF instruction. See section Instructions and Expression Trees. Most SUIF instructions perform simple operations; the opcodes resemble those for a typical RISC processor. However, more complex instructions are used in places where it is important to retain high-level information.
SUIF supports both expression trees and flat lists of instructions. In an expression tree, the instructions for an expression are all grouped together. This works well for high-level passes. Because expression trees do not totally order the evaluation of the instructions, they do not work so well for back-end optimization and scheduling passes. Thus SUIF also provides the flat list representation where each instruction node contains a single instruction.
Most SUIF instructions use a "quadruple" format with a destination
operand and two source operands; however, some instructions require more
specialized formats. For example, ldc
(load constant)
instructions have an immediate value field in place of the source
operands. While most SUIF instructions are very simple, it is important
to retain sufficient high-level information to support detailed
analysis. Thus SUIF includes several instructions with more complex
behavior:
cal
(call) instruction implements a machine-independent
procedure call with a list of parameters. This hides the details of
various linkage conventions.
mbr
(multi-way branch) instruction transfers control to one
of the given labels depending on the value of its source operand. This
represents computed goto
and switch
statements.
array
instruction computes the address of an element in an
array given a list of index values. These instructions are eventually
expanded to additions and multiplications to perform the necessary
pointer arithmetic.
These instructions are much easier to analyze than the equivalent series of low-level instructions.
The following example shows the low-SUIF instructions corresponding to a simple fragment of C code:
x = 0; y = *ptr; if (y < x + z) { *ptr = x; }
1: ldc (i.32) x = 0 // load integer constant 0 2: lod (i.32) y = ptr // load from address in ptr 3: add (i.32) nd#3 = x, z // add x and z 4: sl (i.32) nd#4 = y, nd#3 // set if y < x + z 5: bfalse nd#4, L:L1 // branch if false to label L1 6: str ptr = x // store x to address in ptr 7: lab L:L1 // label L1
Most of the operands in this example are variables; however, the results
of the add
and sl
instructions are temporary values and
are not stored in variables. Such temporary values occur frequently,
and rather than requiring that new variables be created to hold them,
SUIF allows them to be used directly. Each temporary must have a single
definition and a single use within the same basic block. In the printed
code above, the temporary values are indicated by "node" numbers,
using the ID numbers of the instructions that produce the values.
Internally, however, the operands contain pointers between the
instructions. For example, the bfalse
source operand contains a
pointer to the sl
comparison instruction, and the sl
destination operand contains a pointer to the branch instruction. Thus
the definition and use of a temporary value are directly connected,
making it easy to find one from the other.
Flat lists of instructions work well for many back-end compiler passes,
but for high-level transformations expression trees are often a better
representation. Thus the SUIF system supports expression trees as well
as flat lists. See section Expression Trees. The difference between the two
representations is actually quite small. The temporary value pointers
described above naturally create trees, except that with flat lists the
nodes of the trees are listed in bottom-up order. When using the
expression tree representation, the instructions are rearranged so that
only the roots of the expression trees are included in the AST
instruction nodes. The other instructions are reached through the
pointers in the operands. For example, by labeling each subtree (e.g.
e1
and e2
), the expression tree in the example could be
printed as:
5: bfalse e1, L:L1 4: e1: sl (i.32) y, e2 3: e2: add (i.32) x, z
In this case, only the branch instruction is directly contained in an
instruction node. The sl
instruction is reached through the
branch's source operand, and the add
instruction is contained in
the second source operand of the sl
instruction.