This research was conducted by Piramanayagam Arumuga Nainar, Ting Chen, Jake Rosin, and Ben Liblit. The paper appeared in the 2007 International Symposium on Software Testing and Analysis (ISSTA 2007).
Statistical debugging uses dynamic instrumentation and machine learning to identify predicates on program state that are strongly predictive of program failure. Prior approaches have only considered simple, atomic predicates such as the directions of branches or the return values of function calls. We enrich the predicate vocabulary by adding complex Boolean formulae derived from these simple predicates. We draw upon three-valued logic, static program structure, and statistical estimation techniques to efficiently sift through large numbers of candidate Boolean predicate formulae. We present qualitative and quantitative evidence that complex predicates are practical, precise, and informative. Furthermore, we demonstrate that our approach is robust in the face of incomplete data provided by the sparse random sampling that typifies post-deployment statistical debugging.
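As a rough illustration of why three-valued logic helps when sampling leaves some predicates unobserved, the sketch below evaluates a conjunction or disjunction of two simple predicates whose values may be true, false, or unknown. It assumes Kleene's strong three-valued logic and uses Python's None for "not observed in this run"; the function names and3 and or3 are hypothetical, and the paper's exact evaluation rules may differ.

```python
from typing import Optional

# Assumption: None means "unknown", i.e. the predicate was not observed
# in this run (for example, because sparse sampling skipped it).
Tri = Optional[bool]


def and3(a: Tri, b: Tri) -> Tri:
    """Three-valued conjunction (Kleene's strong logic)."""
    # One definitely-false operand decides the result, even if the
    # other operand was never observed.
    if a is False or b is False:
        return False
    # No operand is false; any unknown operand leaves the result unknown.
    if a is None or b is None:
        return None
    return True


def or3(a: Tri, b: Tri) -> Tri:
    """Three-valued disjunction (Kleene's strong logic)."""
    # One definitely-true operand decides the result.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False
```

Under these assumptions, and3(False, None) is False and or3(True, None) is True, so a complex predicate can sometimes be decided from a single observed operand, while and3(True, None) stays unknown.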
The full paper is available as a single PDF document. A suggested BibTeX citation record is also available.