This paper appeared in the Seventh International Workshop on Dynamic Analysis (WODA 2009)
With the widespread deployment of multi-core hardware, writing concurrent programs has become inescapable, which makes fixing concurrency bugs (or crugs) critical in modern software systems. Static analysis techniques for finding crugs such as data races and atomicity violations do not scale well, while dynamic approaches incur high run-time overheads. Crugs are especially challenging because they manifest only under specific execution interleavings that may not arise during in-house testing. Thus there is a pressing need for a low-overhead program monitoring technique that can be used post-deployment.
We present Cooperative Crug Isolation (CCI), a low-overhead instrumentation technique to isolate the root causes of crugs. CCI inserts instrumentation that records occurrences of specific thread interleavings at run-time by tracking whether successive accesses to a memory location were by the same thread or by distinct threads. The overhead of this instrumentation is kept low by using a novel cross-thread random sampling strategy. We have implemented CCI on top of the Cooperative Bug Isolation framework. CCI correctly diagnoses bugs in several nontrivial concurrent applications while incurring only 2-7% run-time overhead.
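The same-thread versus distinct-thread predicate described above can be illustrated with a minimal sketch. This is not CCI's implementation: the class `InterleavingTracker`, the method `record_access`, the site names, and the simulated trace are all hypothetical, and the cross-thread random sampling step is omitted entirely. Thread ids are passed in explicitly so that the example stays deterministic.

```python
from collections import Counter


class InterleavingTracker:
    """Hypothetical illustration only; not the CCI implementation."""

    def __init__(self):
        # Memory location -> id of the thread that last accessed it.
        self.last_thread = {}
        # (site, "same" | "distinct") -> number of times that outcome was observed.
        self.counts = Counter()

    def record_access(self, site, location, thread_id):
        """Record whether the previous access to `location` came from the same thread."""
        previous = self.last_thread.get(location)
        if previous is not None:
            outcome = "same" if previous == thread_id else "distinct"
            self.counts[(site, outcome)] += 1
        # The current thread becomes the most recent accessor of this location.
        self.last_thread[location] = thread_id


tracker = InterleavingTracker()

# Simulated access trace (assumed data): (instrumentation site, location, thread id).
trace = [
    ("read_balance", "account", 1),
    ("write_balance", "account", 1),
    ("read_balance", "account", 2),
    ("write_balance", "account", 1),
]
for site, location, thread_id in trace:
    tracker.record_access(site, location, thread_id)
```

In this trace, the final write by thread 1 follows a read by thread 2, so the `("write_balance", "distinct")` counter is incremented. Counts of this kind, gathered over many runs, are what a statistical analysis could correlate with failures. A real monitor would also need to update the shared table atomically and would sample accesses rather than record every one.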