Applications of Static Analysis and Program Structure in Statistical Debugging

Piramanayagam Arumuga Nainar


Software testing is costly, time-consuming, and often incomplete. Statistical debugging is a new domain of research that extends the testing phase into deployment. Post-deployment monitoring and statistical analyses are used to find program behaviors, called predicates, that are correlated with failures. Prior works rely on dynamic analysis, which focuses solely on the runtime behavior of programs, for bug isolation. This dissertation demonstrates the use of static analysis, a counterpart of dynamic analysis, to improve statistical debugging. Static analysis discovers a program properties without running it.

The contributions are evaluated in the context of the Cooperative Bug Isolation (CBI) project that studies statistical debugging techniques. Predicates instrumented by CBI test the program's state at particular program locations. At a high level, predicates are considered useful for fault localization if the set of executions in which they are observed true closely matches the set of failed executions. However, complex bugs manifest only when multiple factors co-occur. Simple predicates may not be accurate predictors of complex bugs. We propose that compound predicates, which are propositional combinations of simple predicates, could be better failure predictors. Compound predicates are the most accurate predicates of failure in 93% of our experiments. We use offline estimation, mathematical upper-bounds on failure predictivity, and static program dependences to tractably incorporate compound predicates into statistical debugging. These optimizations reduce analysis time from 20 minutes to just 1 minute.

Statistical debugging searches for needles in a haystack: over 99.996% of predicates are not failure predictive. The CPU, network, and storage resources used to collect these predicates could be better utilized. We develop an adaptive bug isolation technique that uses static program dependences and prior feedback to selectively monitor those predicates that are likely to be useful. Predicates that are irrelevant to failure are never instrumented, or are removed after their relevance is ascertained. We characterize this adaptive predicate selection as a forward analysis on the program-dependence graph. We also develop a backward analysis that uses a crash location as a starting point. Our approach finds the best predicate found by complete instrumentation after exploring, on average, just 40% of the program locations. More importantly, very few locations are instrumented at any time, yielding imperceptibly low overheads of less than 1%.

While debugging, a developer first uses the list of predicates in CBI's output to find root causes of failures, and then develops a fix. We aid the developer in the first task by augmenting CBI's output with source-level changes that are likely causes of failures. We extend the well-studied problem of change impact analysis to compute the likelihood that a source-level change impacts a program location. Our technique, called conditional coverage profiles, uses symbolic evaluation, the program's control-flow graph, and runtime profiles to narrow the scope of impacted program locations by an order of magnitude compared to prior work. It identifies failure-inducing changes with a precision of 89% and a recall of 55%.

Our contributions, while seemingly orthogonal in their goals, are unified by their application of static analysis to connect dynamic behavior. Static program structure is used to prune less useful compound predicates, adaptively instrument likely bug predictors, and associate predicates with failure-inducing changes. Overall, this dissertation improves the monitoring efficiency and fault localization of the state-of-the-art in statistical debugging.

Full text: pdf
Suggested Citation: BibTeX

Back to Piramanayagam's home page

Computer Sciences | UW Home