W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler Technology for Future Microprocessors. Proceedings of the IEEE 83(12), December 1995. IEEE Xplore link |
Compilers improve program exec speed by
eliminating unnec instr
putting stuff used most in registers. (reduce memory accesses).
Make use of hardware parallelism requried.
To parallelise a program (ILP compilation)
Dependency extraction
ILP optimizations
Core Scheduling (reorder the instructions)
Definitions (and why)
basic blocks : control in control out block
Control flow graph : control dependence
Register data dependence : flow dependence, anti-dependence (WAW), loop-independence (within the same iterations in a loop), loop-carried (over the loop boundary) == register output dependence.
Memory data dependence : loop independent memory anti-dependence : inside loop unnecessary wait due to overwrite fear.
loop carried memory antidependence. memory input dependence : 2 instr read from the same memory location (can remove redundant loads)
Go beyond basic blocks : Scheduling
Superblock : formed from the trace by dupilcating blocks to remove merge points and their associated complications from the trace.
reorder instr with the superblock : global acyclic (in the control flow graph) scheduling.
Speculative code motion : move instructions.
Techniques : Register renaming/Loop unrolling / software pipelining (cyclic scheduling).
for (i=1) to bignumber A(i); B(i); C(i) end >> A(i) A(i+1) A(i+2) B(i)... in software.
Predicated Execution
Transformations of conditional branches into predicates : control depencences > to data dependences.
>>> Predicated exec allows the compiler to trade instr fetch efficiency for the capability to expose ILP to the hardware along multiple execution paths.
Hyperblock : connected basic blocks where control may only enter in the first block, and exit anywhere.
(tradeoff : branch performance with resource usage, hazard conditions and critical path lengths).
Branch combining replaces a group of exit branches with a corresponding group of predicate define instr. now OR-type sematics can be used, once you hit the exit code, find out why you got to the exit code, jump to that exit destination.
Loop peeling : compiler peels away first several iterations of a loop and combined with the outer loop.
Control height reduction : dependence chain length to compute execution contidtions of instr