Control-Flow Recovery from Partial Failure Reports

This research was conducted by Peter Ohmann, Alexander Brooks, Loris D’Antoni, and Ben Liblit. The paper appeared in the ACM SIGPLAN 2017 Conference on Programming Language Design and Implementation (PLDI 2017).

Abstract

Debugging is difficult. When software fails in production, debugging is even harder, as failure reports usually provide only an incomplete picture of the failing execution. We present a system that answers control-flow queries posed by developers as formal languages, indicating whether the query expresses control flow that is possible or impossible for a given failure report. We consider three separate approaches that trade off precision, expressiveness for failure constraints, and scalability. We also introduce a new subclass of regular languages, the unreliable trace languages, which are particularly suited to answering control-flow queries in polynomial time. Our system answers queries remarkably efficiently when we encode failure constraints and user queries entirely as unreliable trace languages.

Full Paper

The full paper is available as a single PDF document. A suggested BibTeX citation record is also available.

Erratum

The PDF offered above corrects a minor error in Figure 1. Node D was shown correctly in the program’s control-flow graph but was missing from the source code skeleton. Node D should properly appear in main(), immediately after the call to foo():

/* ..C..*/
foo();
/* ..D.. */

Special thanks to Zhang Yushan for bringing this error to our attention.

Supplemental Material

We are pleased to offer the following supplemental material in support of the main paper: