Overview
Device drivers are a major source of complexity, unreliability, and
cost for modern operating systems. As evidence, drivers account for
the majority of system crashes: Microsoft reports that 89% of Windows
XP crashes are caused by device drivers, and Linux driver code had up
to seven times the bug density of other kernel code. The objective
of this research is to improve device drivers by (1) reducing the
complexity and cost of implementing device drivers, (2) improving the
fault tolerance of device drivers, and (3) improving the performance
of device drivers on modern hardware and software architectures.
Microdrivers and DriverSlicer
A major difficulty in writing kernel driver code is the large number of
unenforced rules that the kernel imposes. Some examples from Windows
include:
- Functions that block may not be called at high priority levels;
otherwise, deadlock may occur.
- Locks provide mutual exclusion only up to a certain priority level,
not above it. For example, KeAcquireSpinLockForDpc synchronizes all
callers except interrupt handlers, which run at a higher priority.
- Code executing at high priority may not access pageable memory,
because page faults cannot be satisfied.
- Addresses passed from applications are accessible only when
executing on a thread of the application's process, not in other
contexts such as kernel worker threads or timer callbacks.
Coding at user level simplifies driver development because these rules
do not apply. In addition, many more software engineering tools and
programming languages are available at user level to aid programmers.
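Because the kernel does not check these rules, a violation surfaces only later as a hang or a crash. As a minimal sketch of what checking two of them would involve, the Python model below tracks a Windows-style priority level and rejects blocking calls and pageable-memory accesses made at too high a level. `PASSIVE_LEVEL` and `DISPATCH_LEVEL` are real Windows level names, but `DEVICE_LEVEL`, `KernelContext`, and `RuleViolation` are hypothetical illustrations, not a Windows API; `KeWaitForSingleObject` appears only as a name string.

```python
from enum import IntEnum


class Irql(IntEnum):
    """Simplified Windows-style priority levels (illustrative subset)."""
    PASSIVE_LEVEL = 0   # ordinary thread context: may block and take page faults
    DISPATCH_LEVEL = 2  # DPCs and spinlock holders: may do neither
    DEVICE_LEVEL = 5    # hypothetical stand-in for a device interrupt level


class RuleViolation(Exception):
    """Raised when driver code breaks one of the kernel's calling rules."""


class KernelContext:
    """Tracks the current priority level and checks two of the rules above.

    The real kernel performs none of these checks; a violation there
    shows up later as a deadlock or an unserviceable page fault.
    """

    def __init__(self):
        self.irql = Irql.PASSIVE_LEVEL

    def raise_irql(self, level):
        """Raise the priority level; return the old level for lower_irql."""
        old = self.irql
        self.irql = max(self.irql, level)
        return old

    def lower_irql(self, old):
        self.irql = old

    def call_blocking(self, name):
        # Rule: blocking above PASSIVE_LEVEL can deadlock, because the
        # scheduler cannot switch to the thread that would wake us.
        if self.irql > Irql.PASSIVE_LEVEL:
            raise RuleViolation(f"{name} would block at {self.irql.name}")
        return f"{name} ok"

    def touch_memory(self, pageable):
        # Rule: page faults cannot be serviced at DISPATCH_LEVEL or above.
        if pageable and self.irql >= Irql.DISPATCH_LEVEL:
            raise RuleViolation(f"pageable access at {self.irql.name}")
        return "ok"


if __name__ == "__main__":
    ctx = KernelContext()
    print(ctx.call_blocking("KeWaitForSingleObject"))  # allowed at PASSIVE_LEVEL
    old = ctx.raise_irql(Irql.DISPATCH_LEVEL)          # e.g. inside a DPC
    try:
        ctx.call_blocking("KeWaitForSingleObject")
    except RuleViolation as err:
        print("violation:", err)
    ctx.lower_irql(old)
```

At user level, a driver would need no such checker: the rules it models simply do not exist there.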
There have been attempts to execute driver code in user mode, as in a
microkernel. However, current user-mode techniques suffer from one of
two flaws. User-level driver frameworks that run unmodified
kernel drivers suffer from poor performance because the existing
kernel interface was written assuming fine-grained sharing, trusted
code, and zero-cost invocations. For example, the kernel calls
network drivers once for each packet, rather than batching a set of
packets into a single call. In contrast, user-mode driver frameworks
with good performance require rewriting drivers to a new interface to
avoid these inefficiencies. Furthermore, some user-level driver
systems limit support to devices that do not require DMA or interrupt
handling. Considering the large base of existing drivers,
these problems limit the usefulness of current user-mode driver
frameworks.
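The cost of the per-packet interface can be estimated with a simple model: each call into a user-level driver pays a fixed domain-crossing cost, and batching spreads that cost over several packets. The Python sketch below assumes exactly that cost model; `interface_cost` and its parameters are hypothetical, and the numbers in the example are illustrative units, not measurements.

```python
import math


def interface_cost(packets, crossing_cost, work_cost, batch_size=1):
    """Total cost of handing `packets` to a user-level driver.

    Each call into the driver pays one user/kernel crossing and carries up
    to `batch_size` packets; every packet also costs `work_cost` to process.
    batch_size=1 models an interface that calls the driver once per packet.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    calls = math.ceil(packets / batch_size)  # number of crossings
    return calls * crossing_cost + packets * work_cost


if __name__ == "__main__":
    # Illustrative units only: a crossing costs 5x the per-packet work.
    per_packet = interface_cost(1000, crossing_cost=5, work_cost=1)
    batched = interface_cost(1000, crossing_cost=5, work_cost=1, batch_size=32)
    print(per_packet, batched)  # 6000 1160
```

With these made-up numbers, per-packet calls cost roughly five times as much as batches of 32, which illustrates why keeping the unmodified per-packet interface hurts a user-level framework.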
We propose a novel hybrid approach to building drivers that provides
both high performance and compatibility. Rather than execute all
driver code at user level, we propose to extract a kernel-level
microdriver from existing driver code. The microdriver contains only
the code required for high performance and to satisfy OS
requirements. We convert the remaining code to a userdriver that
executes in a user-level process. Shared data is marshaled and copied
between the two portions on function calls. To maintain compatibility
with existing code, we will create DriverSlicer, a tool to
semi-automatically partition drivers.
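Because the microdriver and userdriver run in separate address spaces, a call between them cannot simply pass a pointer to shared data; the data must be copied in and any changes copied back. The Python sketch below shows that round trip under stated assumptions: the `NetDevice` structure, its three fields, the little-endian wire format, and `call_userdriver` are all hypothetical, and a real partitioned driver would marshal only the fields the callee actually uses.

```python
import struct
from dataclasses import dataclass


@dataclass
class NetDevice:
    """Hypothetical subset of a device structure shared by both halves."""
    mtu: int
    flags: int
    tx_packets: int


# Assumed wire format: three little-endian unsigned 32-bit integers.
_WIRE_FORMAT = "<III"


def marshal(dev):
    """Flatten the shared fields into bytes that can cross address spaces."""
    return struct.pack(_WIRE_FORMAT, dev.mtu, dev.flags, dev.tx_packets)


def unmarshal(buf):
    """Rebuild a private copy of the structure from its wire form."""
    return NetDevice(*struct.unpack(_WIRE_FORMAT, buf))


def call_userdriver(dev, user_fn):
    """Run user_fn on a copy of dev, then copy its changes back.

    The userdriver never touches the kernel's object directly: it works on
    a copy, and only the marshaled result updates the original.
    """
    user_copy = unmarshal(marshal(dev))      # kernel -> user
    user_fn(user_copy)                       # executes "at user level"
    updated = unmarshal(marshal(user_copy))  # user -> kernel
    # Update in place so other kernel references to dev see the new values.
    dev.mtu = updated.mtu
    dev.flags = updated.flags
    dev.tx_packets = updated.tx_packets
    return dev


if __name__ == "__main__":
    def change_mtu(d):  # hypothetical userdriver configuration routine
        d.mtu = 9000

    dev = NetDevice(mtu=1500, flags=0x1, tx_packets=42)
    call_userdriver(dev, change_mtu)
    print(dev)  # NetDevice(mtu=9000, flags=1, tx_packets=42)
```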
This architecture resembles network routers, in which
dedicated processors perform high-speed switching while complicated
routing and error handling are left to separate control processors. We
leave the code for transmitting data to and from a device in the
microdriver, which ensures high performance. We move the code for
initializing and configuring the device, error handling, and reporting
statistics out of the kernel and into user mode.
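One way to decide which functions stay in the microdriver is to start from the performance-critical entry points (here, transmit and the interrupt handler) and keep everything they can call in the kernel, so the data path never has to cross into user mode. The Python sketch below implements that reachability rule over a call graph. The rule, the `partition` function, and the example function names are simplifying assumptions for illustration, not a description of how DriverSlicer is implemented; in keeping with a semi-automatic tool, the programmer supplies the critical roots.

```python
from collections import deque


def partition(call_graph, critical_roots):
    """Split driver functions into microdriver (kernel) and userdriver sets.

    call_graph maps each function name to the names of functions it calls.
    Every function reachable from a critical root stays in the kernel, so
    the performance-critical path never calls into user mode.
    """
    kernel = set()
    worklist = deque(critical_roots)
    while worklist:
        fn = worklist.popleft()
        if fn in kernel:
            continue
        kernel.add(fn)
        worklist.extend(call_graph.get(fn, ()))
    all_functions = set(call_graph)
    for callees in call_graph.values():
        all_functions.update(callees)
    return kernel, all_functions - kernel


if __name__ == "__main__":
    # Hypothetical network driver: transmit and interrupt paths are critical;
    # initialization, configuration, and statistics are not.
    graph = {
        "xmit_frame": ["map_dma", "ring_doorbell"],
        "intr_handler": ["clean_tx_ring"],
        "probe": ["read_eeprom", "setup_rings"],
        "set_mtu": ["setup_rings"],
        "get_stats": [],
    }
    kernel, user = partition(graph, ["xmit_frame", "intr_handler"])
    print(sorted(kernel))
    # ['clean_tx_ring', 'intr_handler', 'map_dma', 'ring_doorbell', 'xmit_frame']
    print(sorted(user))
    # ['get_stats', 'probe', 'read_eeprom', 'set_mtu', 'setup_rings']
```

In this model, a helper shared by both halves (for example, if `set_mtu` also called `map_dma`) stays in the kernel, and the user-mode caller would have to reach it with a call back into the microdriver.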
Nooks
Nooks is a reliability subsystem that seeks to greatly enhance OS
reliability by isolating the OS from driver failures. The Nooks
approach is practical: rather than guaranteeing complete fault
tolerance through a new (and incompatible) OS or driver architecture,
our goal is to prevent the vast majority of driver-caused crashes with
little or no change to existing driver and system code. To achieve
this, Nooks isolates drivers within lightweight protection domains
inside the kernel address space, where hardware and software prevent
them from corrupting the kernel. Nooks also tracks a driver's use of
kernel resources to hasten automatic clean-up during recovery.
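The resource tracking described above can be sketched as a per-driver table: every kernel resource the driver acquires is recorded along with a way to release it, and recovery releases whatever remains without trusting the failed driver's own cleanup code. This Python model is an illustration only, not the Nooks implementation; `ResourceTracker` and its methods are hypothetical, and the release callbacks merely log Linux-style function names (`free_irq`, `kfree`, `del_timer`) rather than calling them.

```python
class ResourceTracker:
    """Records the kernel resources held by one isolated driver.

    Allocations made through the tracker are remembered until freed, so
    that after a driver failure the remaining ones can be released without
    consulting the (possibly corrupted) driver.
    """

    def __init__(self, driver_name):
        self.driver_name = driver_name
        self._held = {}          # handle -> (kind, release callback)
        self._next_handle = 1

    def acquire(self, kind, release):
        """Record a resource and return a handle the driver uses to free it."""
        handle = self._next_handle
        self._next_handle += 1
        self._held[handle] = (kind, release)
        return handle

    def release(self, handle):
        """Normal path: the driver frees a resource itself."""
        kind, release = self._held.pop(handle)
        release()

    def cleanup(self):
        """Recovery path: free everything still held, newest first."""
        freed = []
        for handle in sorted(self._held, reverse=True):
            kind, release = self._held.pop(handle)
            release()
            freed.append(kind)
        return freed


if __name__ == "__main__":
    log = []
    tracker = ResourceTracker("nic0")
    tracker.acquire("irq", lambda: log.append("free_irq"))
    buf = tracker.acquire("memory", lambda: log.append("kfree"))
    tracker.acquire("timer", lambda: log.append("del_timer"))
    tracker.release(buf)       # driver frees its buffer normally
    print(tracker.cleanup())   # driver then fails: ['timer', 'irq']
    print(log)                 # ['kfree', 'del_timer', 'free_irq']
```

Releasing in reverse order of acquisition is a design choice of this sketch, so that later resources that may depend on earlier ones are torn down first.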
Shadow Drivers
We extended Nooks with shadow drivers to recover from driver
failures. A shadow driver is a kernel agent that (1) conceals a driver
failure from its clients, including the operating system and
applications, and (2) transparently restores the driver back to a
functioning state. In this way, applications and the operating system
are unaware that the driver failed, and hence continue executing
correctly themselves.
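A shadow driver's two jobs can be illustrated with a toy Python model: while the driver runs, the shadow records the configuration calls that shaped its state; after a failure it queues incoming requests instead of failing them, restarts the driver, replays the recorded configuration, and then delivers the queued requests. `ShadowDriver`, `FakeNic`, and the split into `configure` and `request` calls are assumptions made for this sketch, not the interface of the actual implementation.

```python
class ShadowDriver:
    """Toy model of a shadow driver wrapped around one driver instance."""

    def __init__(self, driver_factory):
        self._factory = driver_factory
        self.driver = driver_factory()
        self._config_log = []   # (method, args) of state-changing calls
        self._pending = []      # requests received while recovering
        self.recovering = False

    def configure(self, method, *args):
        """Forward a state-changing call and remember it for replay."""
        self._config_log.append((method, args))
        if self.recovering:
            return None  # applied when the log is replayed
        return getattr(self.driver, method)(*args)

    def request(self, method, *args):
        """Forward an ordinary request, or queue it during recovery."""
        if self.recovering:
            self._pending.append((method, args))
            return None  # conceal the failure: defer instead of erroring
        return getattr(self.driver, method)(*args)

    def on_failure(self):
        """Called when the driver is detected to have failed."""
        self.recovering = True
        self.driver = None

    def recover(self):
        """Restart the driver, restore its state, then drain queued requests."""
        self.driver = self._factory()
        for method, args in self._config_log:
            getattr(self.driver, method)(*args)
        self.recovering = False
        pending, self._pending = self._pending, []
        return [getattr(self.driver, m)(*a) for m, a in pending]


class FakeNic:
    """Hypothetical stand-in for a network driver."""

    def __init__(self):
        self.mac = None
        self.sent = []

    def set_mac(self, mac):
        self.mac = mac

    def send(self, pkt):
        self.sent.append(pkt)
        return len(self.sent)


if __name__ == "__main__":
    shadow = ShadowDriver(FakeNic)
    shadow.configure("set_mac", "02:00:00:00:00:01")
    shadow.request("send", b"a")
    shadow.on_failure()
    shadow.request("send", b"b")   # queued; the caller sees no error
    shadow.recover()
    print(shadow.driver.mac, shadow.driver.sent)  # 02:00:00:00:00:01 [b'b']
```

Deferring requests rather than returning errors is what keeps the failure invisible to clients in this model.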
Publications
Microdrivers
- Vinod Ganapathy, Matthew Renzelmann, Arini Balakrishnan, Michael
Swift and Somesh Jha.
The Design and Implementation of Microdrivers, to appear in
Proceedings of the 13th International Conference on Architectural
Support for Programming Languages and Operating Systems, Seattle,
WA, March 2008.
- Vinod Ganapathy, Arini Balakrishnan, Michael M. Swift, and Somesh
Jha.
Microdrivers: A New Architecture for Device Drivers, in
Proceedings of the 11th Workshop on Hot Topics in Operating
Systems, San Diego, CA, May 2007.
Shadow Drivers
- Michael M. Swift, Damien Martin-Guillerez, Muthukaruppan
Annamalai, Brian N. Bershad and Henry M. Levy. Live Update for Device Drivers,
Univ. of Wisconsin Computer Sciences Technical Report CS-TR-2008-1634,
Mar. 2008.
- Michael Swift, Muthukaruppan Annamalai, Brian N. Bershad, Henry M.
Levy. Recovering Device Drivers, in ACM
Transactions on Computer Systems, 24(4), Nov. 2006.
- Michael Swift, Muthukaruppan Annamalai, Brian N. Bershad, Henry M.
Levy. Recovering Device Drivers,
in Proceedings of the 6th ACM/USENIX
Symposium on Operating Systems Design and Implementation, San
Francisco, CA, Dec. 2004.
Nooks
- Michael Swift. Improving
the Reliability of Commodity Operating Systems, Ph.D. Dissertation, Oct. 2005.
- Michael Swift, Brian N. Bershad, and Henry M. Levy. Improving the Reliability of Commodity
Operating Systems, in ACM Transactions on Computer
Systems, 23(1), Feb. 2005.
- Michael Swift, Brian N. Bershad, and Henry M. Levy. Improving the Reliability of Commodity
Operating Systems, in Proceedings of the 19th ACM Symposium
on Operating Systems Principles, Bolton Landing, NY,
Oct. 2003. Best paper award.
- Michael Swift, Steven Martin, Henry M. Levy, and Susan J.
Eggers. Nooks: an architecture for reliable device drivers, in Proceedings
of the Tenth ACM SIGOPS European Workshop, Saint-Emilion, France,
Sept. 2002.
Presentations
- Improving the Reliability of Commodity Operating
Systems talk given at UIUC ACM Reflections/Projections
Conference, October 2006 (pdf)
- Improving the Reliability of Commodity Operating
Systems job talk given at various places in 2005 (pdf)
- Recovering Device Drivers talk at OSDI 2004, December
2004 (pdf)
- Recovering Device Drivers, or Cleaning Up Nooks talk in UW class
CSE551: Graduate Operating Systems (pdf)
- Shadow Drivers: Transparent Recovery for Kernel Extensions poster
at UW Industrial Affiliates, February 2004 (pdf)
- Improving the Reliability of Commodity Operating Systems talk at
SOSP 2003, October 2003 (pdf)
- Nooks poster at UW Industrial Affiliates, February 2003 (pdf)
- Nooks: an architecture for reliable device drivers talk at ACM
SIGOPS workshop, September 2002 (ppt)
- Nooks: an architecture for reliable device
drivers talk at UW Networking and Systems Retreat, June
2002 (ppt)