System reliability is limited by the reliability of devices. Evidence suggests that device failures cause a measurable fraction of system failures, and that most hardware failures are transient and can be tolerated in software.
Carburizer improves reliability by automatically hardening drivers against device failures without new programming languages, programming models, operating systems, or execution environments.
Carburizer finds and repairs hardware dependence bugs in drivers, where the driver will hang or crash if the hardware fails. In addition, Carburizer inserts logging code so that system administrators can proactively repair or replace hardware that fails.
In an analysis of the Linux kernel, Carburizer identified over 992 hardware dependence bugs with fewer than 8% false positives. Discounting for false positives, Carburizer could automatically repair approximately 845 real bugs by inserting code to detect when a failure occurs and invoke a recovery service. Repairs made to false positives have no correctness impact. In performance tests, hardening drivers had almost no visible performance overhead.
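To make the notion of a hardware dependence bug concrete, the following is a minimal sketch in C of one common pattern: a driver loop that polls a device status register and never exits if the device stops responding. It is an illustration under stated assumptions, not Carburizer's actual output. The names `fake_device`, `STATUS_READY`, `MAX_POLLS`, `report_failure_and_recover`, and `wait_ready_hardened` are hypothetical, the device is simulated by an in-memory struct, and a simple iteration bound stands in for whatever failure-detection condition a real tool would insert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STATUS_READY 0x1u      /* hypothetical "device ready" bit */
#define MAX_POLLS    1000000u  /* hypothetical bound on polling iterations */

/* Simulated device: a real driver would read a memory-mapped register. */
typedef struct {
    volatile uint32_t status;
} fake_device;

/* Counts how often the (hypothetical) recovery service was invoked. */
int recovery_invoked = 0;

/* Stand-in for logging the failure and calling a recovery service. */
void report_failure_and_recover(const char *reason)
{
    fprintf(stderr, "device failure: %s\n", reason);
    recovery_invoked++;
}

/*
 * Unhardened form of the loop, shown only as a comment because it
 * hangs forever when the device never sets STATUS_READY:
 *
 *     while ((dev->status & STATUS_READY) == 0)
 *         ;
 *
 * Hardened form: the same loop with an inserted bound. When the bound
 * is reached, the failure is reported and the caller is told to stop
 * using the device.
 */
bool wait_ready_hardened(fake_device *dev)
{
    uint32_t polls = 0;

    assert(dev != NULL);
    while ((dev->status & STATUS_READY) == 0) {
        if (++polls >= MAX_POLLS) {
            report_failure_and_recover("status register never became ready");
            return false;
        }
    }
    return true;
}
```

A caller would check the return value of `wait_ready_hardened` and abandon the current operation on `false`. If the device is healthy, the inserted check never fires and behavior is unchanged, which matches the claim above that repairs made to false positives have no correctness impact.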