Interprocedural Path Profiling and the Interprocedural Express-Lane Transformation

David Gordon Melski
University of Wisconsin

The contributions of this thesis can be broadly divided into two categories: we present novel path-profiling techniques, and we present techniques for performing the express-lane transformation, a program transformation that duplicates frequently executed paths in the hope that better data-flow facts result along those paths.

In path profiling, a program is instrumented with code that counts the number of times particular finite-length path fragments of the program's control flow graph are executed. This thesis presents a number of extensions to the intraprocedural path-profiling technique of Ball and Larus. Several of our techniques collect information about interprocedural paths (i.e., paths that cross procedure boundaries). We show that the overhead of our techniques is not prohibitive (300--700%), and that they often capture more information than the Ball-Larus technique.

The express-lane transformation isolates and duplicates hot paths in a program, aiming for better data-flow facts along the duplicated path. We describe several variants of the interprocedural express-lane transformation, each of which duplicates hot paths from an interprocedural path profile. We show that an interprocedural express-lane transformation helps range analysis to determine the outcome of 0--7% more branches than the intraprocedural express-lane transformation and 1.5--19% more branches than performing no transformation.

Code growth is one drawback of the express-lane transformation. When a pair of duplicate control-flow vertices have the same data-flow facts, it is desirable to eliminate one of the vertices (e.g., by coalescing the duplicate vertices). We present several effective techniques for eliminating duplicated code that has a redundant data-flow solution; this helps to control code growth.

We also present experimental results for program optimizations that are based on: (1) performing an express-lane transformation; (2) performing range analysis; and (3) replacing decided branches and constant expressions. We show that when used with the intraprocedural express-lane transformation, this strategy leads to larger performance benefits than previously reported (0.7--13.0%). Using the interprocedural express-lane transformation also leads to performance benefits, although usually not enough to offset the costs incurred by the transformation. It is likely that a better implementation would lower these costs, possibly leading to a net performance gain.

(Click here to access the paper: Postscript, PDF.)