Overview

Device drivers are a major source of complexity, unreliability, and cost for modern operating systems. As evidence, drivers account for the majority of system crashes: Microsoft reports that 89% of Windows XP crashes are caused by device drivers, and Linux driver code had up to seven times the bug density of other kernel code. The objective of this research is to improve device drivers by (1) reducing the complexity and cost of implementing device drivers, (2) improving the fault tolerance of device drivers, and (3) improving the performance of device drivers on modern hardware and software architectures.

Microdrivers and DriverSlicer

A major difficulty in writing kernel driver code is the many unenforced rules required by the kernel. Some examples from Windows include:
  • Functions that block may not be called at high priority levels or deadlock may occur.
  • Locks provide mutual exclusion above a certain priority level but not below. For example, KeAcquireSpinLockForDpc synchronizes all callers except interrupt handlers.
  • Code executing at high priority may not access pageable memory, because page faults cannot be satisfied.
  • Addresses passed from applications are only accessible when executing on a thread from the application's process but not on kernel worker threads, such as during a timer callback.
Coding at user level simplifies driver development because these rules do not apply. In addition, many more software engineering tools and programming languages are available at user level to aid programmers. There have been attempts to execute driver code in user mode, as in a microkernel. However, current user-mode techniques suffer from one of two flaws. User-level driver frameworks that run unmodified kernel drivers perform poorly because the existing kernel interface was written assuming fine-grained sharing, trusted code, and zero-cost invocations. For example, the kernel calls network drivers once for each packet, rather than batching a set of packets into a single call. In contrast, user-mode driver frameworks with good performance require rewriting drivers to a new interface to avoid these inefficiencies. Furthermore, some user-level driver systems limit support to devices that do not require DMA or interrupt handling. Considering the large base of existing drivers, these problems limit the usefulness of current user-mode driver frameworks.
We propose a novel hybrid approach to building drivers that provides both high performance and compatibility. Rather than execute all driver code at user level, we propose to extract a kernel-level microdriver from existing driver code. The microdriver contains only the code required for high performance and to satisfy OS requirements. We convert the remaining code to a userdriver that executes in a user-level process. Shared data is marshaled and copied between the two portions on function calls. To maintain compatibility with existing code, we will create DriverSlicer, a tool to semi-automatically partition drivers. This architecture resembles network routers, in which dedicated processors perform high-speed switching while complicated routing and error handling are left to separate control processors.
We leave the code for transmitting data to and from a device in the microdriver, which ensures high performance. We move the code for initializing and configuring the device, error handling, and reporting statistics out of the kernel and into user mode.
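The split described above can be illustrated with a small conceptual model. This is only a sketch, not the Microdrivers implementation: the class names, the `mtu` and `tx_packets` fields, and the use of JSON as the marshaling format are all assumptions made for illustration. It shows the per-packet path staying local to the microdriver while the rare configuration path copies state across the boundary.

```python
import json


class UserDriver:
    """Control-path code that would run in a user-level process (hypothetical)."""

    def configure(self, marshaled_state: bytes) -> bytes:
        state = json.loads(marshaled_state)   # unmarshal the copy we were sent
        state["mtu"] = 1500                   # slow-path work: device configuration
        state["initialized"] = True
        return json.dumps(state).encode()     # marshal the updated copy back


class Microdriver:
    """Data-path code that would stay in the kernel (hypothetical)."""

    def __init__(self, userdriver: UserDriver):
        self.userdriver = userdriver
        self.state = {"initialized": False, "mtu": 0, "tx_packets": 0}
        self.crossings = 0                    # number of kernel/user boundary crossings

    def init_device(self) -> None:
        # Rare operation: copy shared state across the boundary and back.
        self.crossings += 1
        reply = self.userdriver.configure(json.dumps(self.state).encode())
        self.state.update(json.loads(reply))

    def transmit(self, packet: bytes) -> int:
        # Per-packet operation: handled locally, with no boundary crossing.
        if not self.state["initialized"]:
            raise RuntimeError("device not initialized")
        if len(packet) > self.state["mtu"]:
            raise ValueError("packet exceeds MTU")
        self.state["tx_packets"] += 1
        return len(packet)
```

In this model, sending any number of packets after one `init_device()` call leaves `crossings` at 1, which is the property the hybrid design aims for: the cost of marshaling is paid only on infrequent control operations.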

Decaf Drivers

Decaf Drivers takes a best-effort approach to simplifying driver development by allowing most driver code to be written at user level in languages other than C. Decaf Drivers sidesteps many of the above problems by leaving code that is critical to performance or compatibility in the kernel in C. All other code can move to user level and to another language; we use Java for our implementation, as it has rich tool support for code generation, but the architecture does not depend on any Java features. The Decaf architecture provides common-case performance comparable to kernel-only drivers, while reliability and programmability improve because large amounts of driver code can be written in Java at user level.
The goal of Decaf Drivers is to provide a clear migration path for existing drivers to a modern programming language. User-level code can be written in C initially and converted entirely to Java over time. Developers can also implement new user-level functionality in Java.
We implemented Decaf Drivers in the Linux 2.6.18.1 kernel by extending the Microdrivers infrastructure. Microdrivers provided the mechanisms necessary to convert existing drivers into user-mode and kernel-mode components. The resulting driver components were both written in C, consisted entirely of preprocessed code, and offered no path to evolve the driver over time.
The contributions of our work are threefold. First, Decaf Drivers provides a mechanism for converting the user-mode component of microdrivers to Java through cross-language marshaling of data structures. Second, Decaf supports incremental conversion of driver code from C to Java on a function-by-function basis, which allows a gradual migration away from C. Finally, the resulting driver code can be easily modified as the operating system and supported devices change, through both editing of driver code and modification of the interface between user and kernel driver portions.
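Function-by-function migration can be pictured as a table of driver entry points in which each entry is rebound from its legacy implementation to a new one, independently of the others. The sketch below is a language-neutral model written in Python, not Decaf's actual mechanism (which converts C to Java and marshals data structures across languages). The names `EntryPoints`, `legacy_get_stats`, `legacy_set_mtu`, and `managed_get_stats` are hypothetical.

```python
def legacy_get_stats(regs):
    """Stand-in for an existing C function (hypothetical)."""
    return {"rx": regs["rx"], "tx": regs["tx"]}


def legacy_set_mtu(regs, mtu):
    """Stand-in for an existing C function (hypothetical)."""
    regs["mtu"] = mtu
    return mtu


def managed_get_stats(regs):
    """Stand-in for the same function rewritten in the new language."""
    return {name: regs[name] for name in ("rx", "tx")}


class EntryPoints:
    """Dispatch table whose entries can be migrated one at a time."""

    def __init__(self, legacy):
        self._impl = dict(legacy)
        self._migrated = set()

    def migrate(self, name, new_impl):
        if name not in self._impl:
            raise KeyError(name)
        self._impl[name] = new_impl    # callers are unaffected by the swap
        self._migrated.add(name)

    def call(self, name, *args):
        return self._impl[name](*args)

    def progress(self):
        # (functions migrated so far, total functions)
        return len(self._migrated), len(self._impl)
```

Because callers go through `call()`, a driver can run with some functions migrated and others not, which is what makes a gradual conversion practical.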

Nooks

Nooks is a reliability subsystem that seeks to greatly enhance OS reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, our goal is to prevent the vast majority of driver-caused crashes with little or no change to existing driver and system code. To achieve this, Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them from corrupting the kernel. Nooks also tracks a driver's use of kernel resources to hasten automatic clean-up during recovery.
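The resource-tracking idea can be sketched as a table that records, for each driver, every kernel resource it currently holds together with the function that releases it. This is a simplified model under stated assumptions, not the Nooks implementation: `ResourceTracker`, its method names, and the string resource identifiers used in the usage note are hypothetical.

```python
class ResourceTracker:
    """Records resources held by each driver so they can be reclaimed on failure."""

    def __init__(self):
        self._held = {}   # driver name -> list of (resource, release function)

    def acquire(self, driver, resource, release):
        # Called whenever a driver obtains a kernel resource.
        self._held.setdefault(driver, []).append((resource, release))
        return resource

    def release(self, driver, resource):
        # Normal path: the driver frees the resource itself.
        entries = self._held.get(driver, [])
        for i, (held, release) in enumerate(entries):
            if held == resource:
                release(held)
                del entries[i]
                return
        raise KeyError(resource)

    def cleanup(self, driver):
        # Recovery path: free everything the failed driver still holds,
        # most recently acquired first. Returns the number reclaimed.
        entries = self._held.pop(driver, [])
        for held, release in reversed(entries):
            release(held)
        return len(entries)
```

For example, if a driver acquires `"irq-11"` and `"dma-buf-0"`, releases the IRQ normally, and then fails, `cleanup()` reclaims only the DMA buffer. Keeping this table up to date during normal operation is what lets recovery avoid searching kernel state for leaked resources.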

Shadow Drivers

We extended Nooks with shadow drivers to recover from driver failures. A shadow driver is a kernel agent that (1) conceals a driver failure from its clients, including the operating system and applications, and (2) transparently restores the driver back to a functioning state. In this way, applications and the operating system are unaware that the driver failed, and hence continue executing correctly themselves.
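The two roles of a shadow driver, concealing the failure and restoring the driver, can be illustrated with a proxy that logs configuration requests, and on a failure restarts the driver, replays the log, and retries the interrupted request. This is a minimal sketch under assumptions, not the actual shadow driver design: `SimpleDriver`, `ShadowDriver`, `DriverFailure`, and the simulated fault are all hypothetical.

```python
class DriverFailure(Exception):
    """Raised by the simulated driver when it fails."""


class SimpleDriver:
    """Hypothetical driver that can be told to fail on its Nth send."""

    def __init__(self, fail_on_send=None):
        self.config = {}
        self.sent = 0
        self._fail_on_send = fail_on_send

    def configure(self, key, value):
        self.config[key] = value

    def send(self, data):
        if self._fail_on_send is not None and self.sent + 1 == self._fail_on_send:
            raise DriverFailure("simulated fault")
        self.sent += 1
        return len(data)


class ShadowDriver:
    """Sits between clients and the driver, hiding failures from clients."""

    def __init__(self, make_driver):
        self._make_driver = make_driver
        self._driver = make_driver()
        self._config_log = []      # state needed to rebuild the driver
        self.recoveries = 0

    def configure(self, key, value):
        self._config_log.append((key, value))
        self._driver.configure(key, value)

    def send(self, data):
        try:
            return self._driver.send(data)
        except DriverFailure:
            self._recover()
            return self._driver.send(data)   # retry; the client sees no error

    def _recover(self):
        self._driver = self._make_driver()   # restart the driver
        for key, value in self._config_log:  # replay logged configuration
            self._driver.configure(key, value)
        self.recoveries += 1
```

The client calls `send()` exactly as it would on the real driver. When the underlying driver raises `DriverFailure`, the client still receives a normal return value, and the replacement driver ends up with the same configuration as the one that failed.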

Publications

Presentations

  • Decaf: Moving Device Drivers to a Modern Language talk at USENIX, June 2009. (pdf)
  • The Design and Implementation of Microdrivers talk at ASPLOS, March 2008. (pdf)
  • Improving the Reliability of Commodity Operating Systems talk given at UIUC ACM Reflections/Projections Conference, October 2006. (pdf)
  • Improving the Reliability of Commodity Operating Systems job talk given at various places in 2005 (pdf)
  • Recovering Device Drivers talk at OSDI 2004, December 2004. (pdf)
  • Recovering Device Drivers, or Cleaning Up Nooks talk in UW class CSE551: Graduate Operating Systems (pdf)
  • Shadow Drivers: Transparent Recovery for Kernel Extensions poster at UW Industrial Affiliates, February 2004 (pdf)
  • Improving the Reliability of Commodity Operating Systems talk at SOSP 2003, October 2003 (pdf)
  • Nooks poster at UW Industrial Affiliates, February 2003 (pdf)
  • Nooks: an architecture for reliable device drivers talk at ACM SIGOPS workshop, September 2002 (ppt)
  • Nooks: an architecture for reliable device drivers talk at UW Networking and Systems Retreat, June 2002 (ppt)