Interprocedural Path Profiling and the Interprocedural
Express-Lane Transformation
David Gordon Melski
The contributions of this thesis can be broadly divided into two categories: we
present novel path-profiling techniques, and we present techniques for performing
the express-lane transformation, a program transformation that duplicates
frequently executed paths in the hope that better data-flow facts result along
those paths.
In path profiling, a program is instrumented with code that counts the number of
times particular finite-length path fragments of the program's control flow
graph are executed. This thesis presents a number of extensions to the
intraprocedural path-profiling technique of Ball and Larus. Several of our
techniques collect information about interprocedural paths (i.e., paths
that cross procedure boundaries). We show that the overhead of our techniques
is not prohibitive (300--700%), and that they often capture more information
than the Ball-Larus technique.
The express-lane transformation isolates and duplicates hot paths in a program,
aiming for better data-flow facts along the duplicated path. We describe
several variants of the interprocedural express-lane transformation, each of
which duplicates hot paths from an interprocedural path profile. We show that
an interprocedural express-lane transformation helps range analysis to determine
the outcome of 0--7% more branches than the intraprocedural express-lane
transformation and 1.5--19% more branches than performing no transformation.
Code growth is one drawback of the express-lane transformation. When a pair of
duplicate control-flow vertices have the same data-flow facts, it is desirable
to eliminate one of the vertices (e.g., by coalescing the duplicate vertices).
We present several effective techniques for eliminating duplicated code that has
a redundant data-flow solution; this helps to control code growth.
We also present experimental results for program optimizations that are based
on: (1) performing an express-lane transformation; (2) performing range
analysis; and (3) replacing decided branches and constant expressions. We show
that when used with the intraprocedural express-lane transformation, this
strategy leads to larger performance benefits than previously reported
(0.7--13.0%). Using the interprocedural express-lane transformation also leads
to performance benefits, although usually not enough to offset the costs incurred
by the transformation. It is likely that a better implementation would lower
these costs, possibly leading to a net performance gain.
(Click here to access the paper: Postscript, PDF.)
University of Wisconsin