Overview

Device drivers are a major source of complexity, unreliability, and cost for modern operating systems. As evidence, drivers account for the majority of system crashes: Microsoft reports that 89% of Windows XP crashes are caused by device drivers, and Linux driver code has been found to have up to seven times the bug density of other kernel code. The objective of this research is to improve device drivers by (1) reducing the complexity and cost of implementing them, (2) improving their fault tolerance, and (3) improving their performance on modern hardware and software architectures.

Driver Static Analysis

Static analysis is a useful technique for understanding and manipulating large bodies of driver code. We built the DriverSlicer tool, using CIL, to analyze and modify driver code. The tool was originally developed as part of the microdriver project described below, but we have since applied it to two more projects.

Carburizer

Hardware devices can fail, but many drivers assume they do not. When confronted with real devices that misbehave, these assumptions can lead to driver or system failures. Such bugs cannot easily be detected by regular stress testing because the failures are induced by the device and not the software load. We built Carburizer, a code-manipulation tool and associated runtime that improves system reliability in the presence of faulty devices. Carburizer analyzes driver source code to find locations where the driver incorrectly trusts the hardware to behave. Carburizer identified almost 1000 such bugs in Linux drivers with a false positive rate of less than 8 percent. With the aid of shadow drivers for recovery, Carburizer can automatically repair 840 of these bugs with no programmer involvement.
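
To make "incorrectly trusts the hardware" concrete, the following C sketch shows two trusting patterns and a hardened version of each: an unbounded polling loop and a device-supplied value used as an array index. It is an illustration only, not Carburizer's analysis or the code it generates. The simulated devices, `STATUS_READY`, `MAX_POLL_ITERATIONS`, and `NUM_QUEUES` are hypothetical, and a function pointer stands in for a real register read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names are hypothetical; a function pointer stands in for a
 * device register read (an MMIO access in a real driver). */
typedef uint32_t (*read_reg_fn)(void);

#define STATUS_READY        0x1u
#define MAX_POLL_ITERATIONS 1000  /* assumed bound, chosen for illustration */
#define NUM_QUEUES          4

/* Simulated devices: one that works and one that is stuck. */
static uint32_t ready_device(void) { return STATUS_READY; }
static uint32_t stuck_device(void) { return 0; }

/* Trusting version (comment only):
 *     while (!(read_status() & STATUS_READY))
 *         ;
 * A stuck device makes that loop spin forever. This hardened version
 * bounds the loop and reports failure to its caller instead. */
static bool wait_ready(read_reg_fn read_status)
{
    for (int i = 0; i < MAX_POLL_ITERATIONS; i++) {
        if (read_status() & STATUS_READY)
            return true;
    }
    return false;
}

/* Trusting version: `return queues[device_index];` with no check.
 * A faulty device can return any value, so range-check it first. */
static int lookup_queue(const int queues[NUM_QUEUES], uint32_t device_index)
{
    if (device_index >= NUM_QUEUES)
        return -1;  /* signal the fault instead of reading out of bounds */
    return queues[device_index];
}
```

Returning an error instead of hanging or reading out of bounds gives the caller a point at which to report the fault, which is where a recovery mechanism such as a shadow driver could take over.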


Driver Study

We study the source code of Linux drivers to understand what drivers actually do, how current research applies to them, and what opportunities exist for future research. We develop a set of static-analysis tools to analyze driver code along several axes. We find that many assumptions made by driver research do not hold for all drivers: at least 44% of drivers have code that is not captured by a class definition, 28% of drivers support more than one device per driver, and 15% of drivers do significant computation over data. From our driver interaction study, we find that the USB bus offers an efficient interface with significant standardized code and coarse-grained access, making it ideal for executing drivers in isolation. We also find that drivers for different buses and classes have widely varying levels of device interaction, which indicates that the cost of isolation will vary by class. Finally, from our driver similarity study, we find that 8% of all driver code is substantially similar to code elsewhere and may be removable with new abstractions or libraries.


Microdrivers and DriverSlicer

A major difficulty in writing kernel driver code is the many unenforced rules required by the kernel. Some examples from Windows include:
  • Functions that block may not be called at high priority levels or deadlock may occur.
  • Locks provide mutual exclusion above a certain priority level but not below. For example, KeAcquireSpinLockForDpc synchronizes all callers except interrupt handlers.
  • Code executing at high priority may not access pageable memory, because page faults cannot be satisfied.
  • Addresses passed from applications are accessible only when executing on a thread of the application's process, not on kernel worker threads or, for example, during a timer callback.
Coding at user level simplifies driver development because these rules do not apply. In addition, there are many more software engineering tools and programming languages available that further aid programmers.
We developed a novel hybrid approach to building drivers that provides both high performance and compatibility. Rather than execute all driver code at user level, we propose to extract a kernel-level microdriver from existing driver code. The microdriver contains only the code required for high performance and to satisfy OS requirements. We convert the remaining code to a userdriver that executes in a user-level process. Shared data is marshaled and copied between the two portions on function calls. To maintain compatibility with existing code, we created DriverSlicer, a tool to semi-automatically partition drivers.
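
Shared data is marshaled and copied between the microdriver and the userdriver on each call. A minimal C sketch of that step follows, assuming a hypothetical `struct dev_state` with three illustrative fields. It is not the marshaling code DriverSlicer generates, which this overview does not describe and which would also have to handle pointers and nested structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state shared by the microdriver and the userdriver. */
struct dev_state {
    uint32_t flags;
    uint16_t mtu;
    uint8_t  mac[6];
};

/* Fields are packed back to back in the buffer, so compiler padding in
 * struct dev_state never crosses the kernel/user boundary. */
#define DEV_STATE_WIRE_SIZE (sizeof(uint32_t) + sizeof(uint16_t) + 6)

/* Copy each field into a flat buffer. Returns bytes written, or 0 if
 * the buffer is too small. */
static size_t marshal_dev_state(const struct dev_state *s,
                                uint8_t *buf, size_t len)
{
    size_t off = 0;
    if (len < DEV_STATE_WIRE_SIZE)
        return 0;
    memcpy(buf + off, &s->flags, sizeof s->flags); off += sizeof s->flags;
    memcpy(buf + off, &s->mtu, sizeof s->mtu);     off += sizeof s->mtu;
    memcpy(buf + off, s->mac, sizeof s->mac);      off += sizeof s->mac;
    return off;
}

/* Rebuild the structure on the other side from the flat buffer.
 * Returns bytes consumed, or 0 if the buffer is too short. */
static size_t unmarshal_dev_state(struct dev_state *s,
                                  const uint8_t *buf, size_t len)
{
    size_t off = 0;
    if (len < DEV_STATE_WIRE_SIZE)
        return 0;
    memcpy(&s->flags, buf + off, sizeof s->flags); off += sizeof s->flags;
    memcpy(&s->mtu, buf + off, sizeof s->mtu);     off += sizeof s->mtu;
    memcpy(s->mac, buf + off, sizeof s->mac);      off += sizeof s->mac;
    return off;
}
```

Because both portions run on the same machine, the sketch keeps native byte order and only removes padding; a real implementation would also need to decide which fields each side actually touches so that it does not copy more than necessary.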

Decaf Drivers

Decaf Drivers takes a best-effort approach to simplifying driver development by allowing most driver code to be written at user level in languages other than C. Decaf Drivers sidesteps many of the above problems by leaving code that is critical to performance or compatibility in the kernel, written in C. All other code can move to user level and to another language; we use Java for our implementation, as it has rich tool support for code generation, but the architecture does not depend on any Java features. The Decaf architecture provides common-case performance comparable to kernel-only drivers, while reliability and programmability improve because large amounts of driver code can be written in Java at user level.

Nooks

Nooks is a reliability subsystem that seeks to greatly enhance OS reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, our goal is to prevent the vast majority of driver-caused crashes with little or no change to existing driver and system code. To achieve this, Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them from corrupting the kernel. Nooks also tracks a driver's use of kernel resources to hasten automatic clean-up during recovery.
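
Tracking a driver's use of kernel resources is what lets recovery release them quickly. The sketch below shows the general idea in user-space C under stated assumptions: memory is the only resource tracked, `malloc`/`free` stand in for kernel allocators, and the fixed-size table and the names `tracked_alloc`, `tracked_free`, and `release_all` are hypothetical rather than Nooks's actual interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 64  /* assumed fixed capacity for the sketch */

/* Hypothetical per-driver record of outstanding allocations. */
struct resource_tracker {
    void  *objs[MAX_TRACKED];
    size_t count;
};

/* Allocate on behalf of the driver and remember the object. */
static void *tracked_alloc(struct resource_tracker *t, size_t size)
{
    if (t->count == MAX_TRACKED)
        return NULL;
    void *p = malloc(size);
    if (p != NULL)
        t->objs[t->count++] = p;
    return p;
}

/* Free an object the driver releases normally and forget it;
 * the last entry is moved into the hole to keep the table dense. */
static void tracked_free(struct resource_tracker *t, void *p)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->objs[i] == p) {
            free(p);
            t->objs[i] = t->objs[--t->count];
            return;
        }
    }
}

/* After a driver failure, release everything it still holds.
 * Returns the number of objects released. */
static size_t release_all(struct resource_tracker *t)
{
    size_t released = t->count;
    for (size_t i = 0; i < t->count; i++)
        free(t->objs[i]);
    t->count = 0;
    return released;
}
```

The design trades a small bookkeeping cost on every allocation for a cleanup step that needs no knowledge of the failed driver's internal data structures.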

Shadow Drivers

We extended Nooks with shadow drivers to recover from driver failures. A shadow driver is a kernel agent that (1) conceals a driver failure from its clients, including the operating system and applications, and (2) transparently restores the driver back to a functioning state. In this way, applications and the operating system are unaware that the driver failed, and hence continue executing correctly themselves.
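
This overview does not say how a shadow driver restores the driver's state. One way to picture it is sketched below, assuming (this is not stated above) that the shadow records state-changing requests as they pass to the driver and replays them after restarting it. All types and names (`struct shadow`, `shadow_submit`, `shadow_recover`, `REQ_SET_MTU`) are hypothetical, and resetting a C struct stands in for unloading and reloading a real driver.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAP 16  /* assumed log capacity for the sketch */

/* Hypothetical driver state that clients configure over time. */
struct driver {
    int up;
    int mtu;
};

enum req_kind { REQ_OPEN, REQ_SET_MTU };

struct request {
    enum req_kind kind;
    int arg;
};

/* The shadow sits between clients and the driver and logs every
 * state-changing request it forwards. */
struct shadow {
    struct driver *drv;
    struct request log[LOG_CAP];
    size_t nlog;
};

/* Stand-in for the real driver handling a request. */
static void driver_handle(struct driver *d, struct request r)
{
    switch (r.kind) {
    case REQ_OPEN:    d->up = 1;      break;
    case REQ_SET_MTU: d->mtu = r.arg; break;
    }
}

/* Forward a request to the driver and remember it for replay.
 * Returns 0 on success, -1 if the log is full. */
static int shadow_submit(struct shadow *s, struct request r)
{
    if (s->nlog == LOG_CAP)
        return -1;
    s->log[s->nlog++] = r;
    driver_handle(s->drv, r);
    return 0;
}

/* Recovery: reset the driver to its initial state (standing in for a
 * reload), then replay the log so clients see the configuration they
 * had before the failure. */
static void shadow_recover(struct shadow *s)
{
    *s->drv = (struct driver){ 0, 0 };
    for (size_t i = 0; i < s->nlog; i++)
        driver_handle(s->drv, s->log[i]);
}
```

A real shadow would also have to answer or hold client requests that arrive while recovery is in progress, which is how the failure stays hidden from applications; that part is omitted here.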

Support

This work is supported in part by National Science Foundation (NSF) grants CNS-0915363 and CNS-0745517 and a grant from Google.

Publications

Presentations

  • Software Support for Improved Driver Reliability talk at UMass, UT-Austin, 2009-2010.
  • Decaf: Moving Device Drivers to a Modern Language talk at USENIX, June 2009.
  • The Design and Implementation of Microdrivers talk at ASPLOS, March 2008.
  • Improving the Reliability of Commodity Operating Systems talk given at the UIUC ACM Reflections/Projections Conference, October 2006.
  • Improving the Reliability of Commodity Operating Systems job talk given at various places, 2005.
  • Recovering Device Drivers talk at OSDI 2004, December 2004.
  • Recovering Device Drivers, or Cleaning Up Nooks talk in UW class CSE551: Graduate Operating Systems.
  • Shadow Drivers: Transparent Recovery for Kernel Extensions poster at UW Industrial Affiliates, February 2004.
  • Improving the Reliability of Commodity Operating Systems talk at SOSP 2003, October 2003.
  • Nooks poster at UW Industrial Affiliates, February 2003.
  • Nooks: An Architecture for Reliable Device Drivers talk at ACM SIGOPS Workshop, September 2002.
  • Nooks: An Architecture for Reliable Device Drivers talk at UW Networking and Systems Retreat, June 2002.

    Michael M. Swift

    Professor
    Computer Sciences Department
    College of Letters and Sciences
    University of Wisconsin, Madison


    Contact Information

    608-890-0131
    swift at cs dot wisc dot edu

    7369 Computer Sciences
    Computer Sciences Department
    University of Wisconsin-Madison
    1210 West Dayton Street
    Madison, WI 53706-1685 USA