BAMBI compiler
We chose to have a compiler that translates C programs into BAMBI assembly. To do this we ported the GNU C Compiler suite (GCC) to our architecture. GCC is already able to compile programs for around forty other processor architectures so it has a fairly well defined interface to add support for new architectures. We used the PowerPC and the SPUR ports as examples in the porting process. There are two main files which define each architecture to GCC. The first is a machine description file (.md) which describes to the compiler in basic terms what every instruction does, and sometimes tells the compiler how to combine instruction sequences to perform higher level operations. The second file is a C header which describes the machine environment and attributes (word width, byte ordering, available registers, etc..) to the compiler.
Function Emulation
GCC requires several operations that the BAMBI architecture does not define, most notable multiply and divide (__mulsi3 and __divsi3.) These were provided with some functions borrowed from a flavor of the MIPS port intended for embedded processors.
Files
To port GCC to the bambi architecture the
following files were added to the standard binutils-2.9.1 directory:
- gcc/config/ece554/ece554.h
-
This file contains a large number of macros which contain descriptions of the BAMBI environment. The register set, the machine word length, the alignment requirements, and hundreds of other things to help the compiler produce correct code.
-
gcc/config/ece554/ece554.c
-
This file contains code used to output various constructs during compilation. Most notably branch instructions and function prolog and epilogue.
-
gcc/config/ece554/ece554.md
-
Contains logical translations between RTL (register transfer language) and machine assembly. This file does not describe system level instructions such as 'rfi' and 'mtspr'
-
gcc/config/ece554/crt0.S
-
The code in this file is linked into every user-level binary, it sets up the execution environment for every process (initializes the stack pointer, and anything else that might need to be done) and then calls main().
RTL and the .md file
GCC separates C compilation from assembly output using a language called RTL. The output of the C parser is a description of the program in RTL. The RTL program is then converted to a particular machine's architecture using the RTL descriptions provided by the port, these translations exist in the file ece554.md. Compilation from C to RTL is often affected by which RTL operators are provided by the machine description, or how efficient any given one might be. Most of the optimization the compiler does is done on the RTL form of the program. This allows us to have not only a C compiler, but a fairly advanced _optimizing_ compiler without having to write one from scratch.
RTL looks like this:
(set
(match_operand:SI 0 "register_operand" "=r")
(not:SI
(match_operand:SI 1 "register_operand" "r")))
This particular statement says "set register operand 0 to the logical inverse of register operand 1." RTL is defined in terms of registers: the movement and operations that can be performed between them. Registers here are abstract notions, so can also be memory locations.
GCC has a certain minimal set of instructions which must be contained in a machine description. Some of these instructions, such as the one's compliment instruction shown above, must be written as a compound of other instructions. The full entry in the ece554.md file to describe how to generate ones compliment assembly is this:
;
; bitwise inversion of reg 1
;
(define_insn "one_cmplsi2"
[(set (match_operand:SI 0 "register_operand" "=r")
(not:SI
(match_operand:SI 1 "register_operand" "r")))]
""
"sub %0,r0,%1\;subi %0,%0,1")
A line by line explanation of what this says:
Tell the compiler we are defining one of its needed functions: "one_cmplsi2"
(define_insn "one_cmplsi2"
The RTL for this instruction is contained in [square brackets]
The first line says "set the 0th operand, which must be a register one word wide (SI)"
[(set (match_operand:SI 0 "register_operand" "=r")
The value which it is being set to is the result of a 'not' operation which returns a word-wide value (SI)
(not:SI
The not is performed on operand #1 (the second operand) which also must be in a register and be one word wide. This is the end of the RTL description.
(match_operand:SI 1 "register_operand" "r")))]
Then there is a field which allows us to disable this instruction in case we are compiling for a variant of the architecture which does not have this instruction. We leave this blank because there is only one variant of the BAMBI architecture.
""
And finally a free form printf-like string to tell the compiler how to write the instruction in assembly. When this is actually output, the fields "%0" and "%1" will be replaced with the correct register names (the names of operands 0 and 1) by the compiler. The "\;" indicates the beginning of a new line of code.
"sub %0,r0,%1\;subi %0,%0,1")
(Note: This is the fastest way to do a bitwise inversion of an entire register in BAMBI because xori does not sign extend its immediate.)
Compilation Example
The following C function demonstrates simple use of the ones complement operation that has been described:
int ones_complement(int dummy, int a)
{
return ~a;
}
Functions are defined to pass variables in registers 3-12, and their return value always goes in register 3. The dummy vairable is used here to force the incoming "operand 0" to be in register 4, while the return value must go into register 3. Here is the assembly generated for this function:
.align 2 # force function start address to be
# aligned on 2^2 address boundary
.globl _ones_complement
_ones_complement: # label for the function start address
sw r30,4(r1) # save the return address in the caller's stack
subi r1,r1,12 # stack: size: 0b nreg: 3
sw r31,8(r1) # saving live
addi r31,r1,8 # set up frame pointer
sw r31,0(r1) # set pseudo back-chain
sub r3,r0,r4 # 11 one_cmplsi2
subi r3,r3,1
lw r31,8(r1) # restore frame pointer
addi r1,r1,12 # deallocate stack
lw r30,4(r1) # restore link reg
jalr r30
Known Bugs
- Large switch statements dont work properly. This is because GCC optimizes them into a jump table and the machine description of table jumps is not complete. This would be fairly easy to fix given more time. The work around is simply to use chained if/then/else statements.
- Compilation without optimization produces bad code. I have not yet tracked this bug down because a work around is simply to compile with optimization turned on. GCC produces terrible code without optimization anyhow. I suspect the problem lies with how local arguments are stored on the stack, so this problem might manifest for optimized code in functions with extremely large argument lists.
Possible Compiler Optimizations
- Rearrange memory allocation and access. Currently all static variables are accessed through double indirection due to memory limitations inherited from the SPUR port of GCC. Because BAMBI loads and stores are able to access the entire address space this indirection is unnecessary.
- Add weighting tags to the operations which take multiple instructions so they are not treated as single cycle instructions during optimization. GCC provides a mechanism through which to do this, but we did not take advantage of it.
- Add frame pointer elimination code. Often functions do not need a valid frame pointer because all stack access can be done relative to the stack pointer. This is port specific though, because GCC does not assume any particular stack frame layout.
- Redescribe Compare instructions. Currently compare operations and their results are used only for branching, but the BAMBI architecture allows explicit manipulation of the results of a compare. Several new entries would have to be added to the machine description file, and a new type of register to hold compare results would need to be defined in the macro header file (ece554.h.) This would be a lot of work, and would only produce marginally better code.
Indispensible Resources
- The SPUR GCC port. Even thought this port is no longer supported, and has some bugs, it is probably the architecture most like our own (very simple load/store RISC design.) Our port was based heavily on this compiler.
- The GCC Documentation. http://gcc.gnu.org/onlinedocs/gcc.html Specifically sections 18 - RTL Representation, 19 - Machine Descriptions and 20 - Target Description Macros.
configuration
mkdir build.gcc
cd build.gcc
../gcc-2.95.2/configure --target=ece554-unknown-elf --prefix=/usr/home/tesch/ece554 --with-gnu-as --with-gnu-ld --nfp --enable-languages=c --with-headers=/usr/home/tesch/ece554/src/newlib-1.9.0/newlib/libc/include
Last modified: Sun May 13 01:49:10 CDT 2001