Index: sections/related.tex =================================================================== --- sections/related.tex (revision 434) +++ sections/related.tex (revision 435) @@ -47,7 +47,7 @@ code binaries to provide inline software guards and stack protection. In contrast, \toolshort\ operates on source code and allows drivers to operate on a copy of shared data. \toolshort\ marshals the minimum -required data and uses range hash to provide spatial safety. +required data and uses a range hash to provide spatial safety. \paragraph{Transactional kernels.} @@ -58,7 +58,7 @@ kernel changes. However, VINO applied to an entire extension and did not address recovering device state. In addition, it terminated faulty extensions, while most users want to continue using devices following -a failure. \toolshort\ is complementary to other transactional systems such +a failure. \toolshort\ is complementary to other transactional systems, such as TxLinux~\cite{rossbach:tx-linux:sosp:2007}, that provide transactional semantics for system calls. These techniques could be applied to driver calls into the kernel instead of using a kernel undo Index: sections/evaluation.tex =================================================================== --- sections/evaluation.tex (revision 434) +++ sections/evaluation.tex (revision 435) @@ -126,7 +126,7 @@ During each experiment, we run applications that use the driver to detect whether a driver failure causes the application to fail. For -network, we use \codesm{ssh}, and \codesm{netperf}, whereas for sound we use +network, we use \codesm{ssh}, and \codesm{netperf}, and for sound we use \codesm{aplay}, \codesm{arecord} from the ALSA suite. We tested the mouse by scrolling the device manually as we performed the fault injection experiments. After each injection experiment, we determine @@ -156,10 +156,10 @@ values. 
Finally, we verify that changes to drivers made using non-class -interfaces, such as the {\tt proc} and {\tt sys} file systems, -before any failures persist. In contrast, shadow drivers cannot -replay these actions since they cannot capture non-class -driver interactions. +interfaces, such as the {\tt proc} and {\tt sys} file systems, before +injecting failures and present following recovery. In contrast, shadow +drivers cannot replay these actions since they cannot capture +non-class driver interactions. @@ -365,7 +365,7 @@ reset the mouse. % measured using the high precision TSC processor register %%% -\subsection{Usefulness of being fine-grained} +\subsection{Utility of Fine Granularity} We evaluate whether selectively isolating specific entry points is useful by looking for evidence that driver bugs are confined to one or a few entry points. If the functions with bugs are reachable through Index: sections/design.tex =================================================================== --- sections/design.tex (revision 434) +++ sections/design.tex (revision 435) @@ -6,20 +6,20 @@ recovery. This system protects code from faults at the granularity of a single thread executing a single entry point. \toolshort\ recovers from any failures that occur during the function. This can greatly -reduce the cost of isolating and tolerating faults, because far less +reduce the cost of isolating and tolerating faults because far less code is affected. We list four goals of providing fine-grained fault tolerance: \begin{enumerate} -\item {\em Class Independent.} Isolation and recovery should be +\item {\em Class independent.} Isolation and recovery should be independent of the driver-kernel interface and should be able to recover driver actions from proprietary commands. % \item {\em Low infrastructure.} Little new code should be added to the kernel in support of \toolshort. 
% -\item {\em Pay-as-you-go.} \toolshort\ should not have a permanent +\item {\em Pay-as-you-go.} \toolshort\ should not have a fixed minimum overhead of isolation or monitoring driver behavior. Furthermore, programmer effort should only be required only when fault tolerance is desired. @@ -43,7 +43,7 @@ \subsection{Fault Model} A driver entry point is a driver function invoked by the kernel or applications to access -specific driver functionality. Each driver registers a set of functions as entry points +specific driver functionality. Each driver registers a set of functions as entry points, such as to initialize the device or transmit a packet. Driver entry points can be invoked by applications multiple times in arbitrary order. Hence, drivers should not make assumptions about the order or past history of these invocations. \toolshort\ provides fault tolerance @@ -51,8 +51,8 @@ the entire driver as a component with internal state. As the driver executes, the \toolshort\ isolation mechanism enforces fine-grained memory -safety. It ensures that the driver entry point is only allowed to access its stack and -data passed to the driver; access to anything else will be treated as a fault. +safety. It ensures that the driver entry point is only allowed to access +data passed to the driver and its stack; access to anything else will be treated as a fault. \toolshort\ detects faults in driver entry points in three ways. First, \toolshort\ detects memory failures (such as null pointer dereferences) and reading/writing unintended kernel and driver structures. Second, \toolshort\ uses marshaling to copy data in and out @@ -60,11 +60,11 @@ be detected, although errors with compatible types (such as treating an array of bytes as an array of longs), will not be. \toolshort\ on its own does not provide any semantic checks to enforce driver invariants. Hence, driver faults must be detected within the -entry point where they occur. 
Otherwise, failures that begin with one entry point -improperly setting a flag that is read by another cannot be tolerated. Third, \toolshort\ -also catches processor exceptions which includes NULL pointer exception, general +entry point where they occur. Otherwise, failures that are triggered when one entry point +improperly sets a flag that another reads and faults cannot be tolerated. Third, \toolshort\ +catches processor exceptions such as NULL pointer exception, general protection fault, alignment fault, divide error (divide by zero), missing segments, and -stack faults and triggers recovery if they arise out of isolated driver entry points. +stack faults. It triggers recovery if an exception arises within an isolated driver entry point. %\fixme{ %I gather that you are assuming every driver invocation is state-less, i.e., one invocation does not affect the next. However, consider the following rather-common scenario where you have a server and muliple worker threads have been spwaned to handle multiple client requests. @@ -86,27 +86,26 @@ \begin{enumerate} \item {\em Untested code}: Device drivers often contain untested code - such as chipset specific code or recovery code that can be invoked + such as chipset-specific code or recovery code that can be invoked safely using \toolshort. % \item {\em Statically found bugs}: Often static analysis tools identify hard to find/trigger driver bugs with substantial false positive rates. \toolshort\ can be integrated with existing static analysis tools until a fix is issued, which often - takes considerable time. This approach limits failures when such - code is triggered under buggy situations, while limiting the - overhead at other times. + takes considerable time. This approach limits the + overhead to just the buggy code, just when it contains known bugs. 
% \item {\em Runtime monitoring tools}: Runtime monitoring tools flag incoming requests based on their parameters, such as a specific {\tt - ioctl} command code, or enabled at run time through module - parameters using run-time monitoring~\cite{liblit:2005} or + ioctl} command code, or are enabled at run time through module + parameters~\cite{liblit:2005} or security tools~\cite{paxson:1998}. \toolshort\ can dynamically - decide whether to execute code in isolation or at full speeds. + decide whether to execute code in isolation or unsafely at full speed. \end{enumerate} -Furthermore, in our evaluation we analyze a list of bugs and +In our evaluation we analyze a list of bugs and find that they only affect 14\% of all driver entry points. Hence, limiting the cost of fault tolerance to affected entry points can be useful. We now describe the two major components of \toolshort: isolation and @@ -137,7 +136,7 @@ driver and kernel state when suspicious entry-points are invoked. \toolshort\ uses static analysis and code-generation to generate another kernel module that contains suspect entry points -instrumented for memory safety. Furthermore, \toolshort\ generates +instrumented for memory safety. \toolshort\ also generates communication code containing marshaling routines to copy driver and kernel state necessary for executing these entry points in isolation. Since the static analysis to marshal the data structures @@ -205,9 +204,9 @@ checkpoint/restore: power management. The functionality provided by power management, to suspend a device -before entering the low power mode and restoring it when transitioning -to high power mode, is similar to what is required to support device -checkpoints. We reuse the suspend/resume code by identifying code +before entering a low power mode and restoring it when transitioning +to high power mode is similar to what is required to support device +checkpoints. 
We reuse the suspend/resume code by separating code that supports saving state to memory from the code that actually suspends the device. Similarly, we identify code required for restoring this state. In Section~\ref{sec:device}, we describe in detail how power @@ -233,12 +232,12 @@ subsystem and writing and maintaining wrappers around the driver-kernel interface. -There is also no recovery overhead of monitoring the correctly +There is also no recovery overhead of monitoring correctly executing requests at {\em all} times since driver recovery is based on checkpoints. Finally, \toolshort\ provides fast recovery since it does not restart the driver and re-execute the complicated device probe -routines. Since the device state is restored from a checkpoint, the recovery -times are an order of magnitude shorter as we demonstrate in our +routines. The device state is restored from a checkpoint, so recovery +is an order of magnitude faster as we demonstrate in our evaluation in Section~\ref{sec:evaluation}. Index: sections/intro.tex =================================================================== --- sections/intro.tex (revision 434) +++ sections/intro.tex (revision 435) @@ -41,7 +41,7 @@ % \item {\em Not enough:} Shadow drivers must encode the semantics of the kernel/driver interface. However, many drivers have proprietary - commands that cannot be captured by a generic shadow driver, leading + commands that cannot be captured by a shadow driver common to an entire class, leading to incomplete recovery. Recent work showed that up to 44\% of drivers have non-class behavior~\cite{kadavasplos12}. % @@ -66,11 +66,11 @@ shortcomings called {\em Fine-grained Fault Tolerance} (\toolshort). Rather than isolating and recovering from the failure of an entire driver, \toolshort\ executes a driver {\em entry point} as a -transaction and uses software-fault isolation to prevent corruption -and detect failures. 
If the call faults, \toolshort\ rolls back driver -state and fails the call. On entry to a driver, a stub copies +transaction and uses software fault isolation to prevent corruption +and detect failures. On entry to a driver, a stub copies parameters to the driver code. Only if the driver executes correctly -are the results copied back; otherwise, the copy is destroyed. +are the results copied back. If the call faults, \toolshort\ destroys the copy to roll back driver +state and fails the call. In order to restore device state modified by a driver before faulting, we developed a novel {\em device state checkpointing} mechanism that @@ -99,7 +99,7 @@ \begin{itemize} -\item We build fine-grained fault tolerance, a system consisting of a +\item We describe Fine-Grained Fault Tolerance, a system consisting of a static analysis and code generation tool that provides isolation by executing each driver request on a minimal copy of required driver state. Our system can be used to isolate specific requests and we Index: sections/conclusion.tex =================================================================== --- sections/conclusion.tex (revision 434) +++ sections/conclusion.tex (revision 435) @@ -15,10 +15,7 @@ operating systems that should be explored. \section*{Acknowledgements} -TBD -%\noindent This work is supported in part by the National Science -%Foundation (NSF) grants CCF 0621487 and CNS 0745517, and by the -%Wisconsin Alumni Research Foundation. We would like to thank -%Liblit for helpful discussions during the initial stages of the -%project and our shepherd Miguel Castro for his useful advice. Swift -%has a financial interest in Microsoft Corp. +\noindent This work is supported by National Science Foundation (NSF) +grants CNS 0745517 and CNS 0915363 and by a gift from Google. We would +like to thank our shepherd Emmett Witchel for useful feedback. Swift +has a financial interest in Microsoft Corp. 
Index: sections/odft.tex =================================================================== --- sections/odft.tex (revision 434) +++ sections/odft.tex (revision 435) @@ -1,7 +1,7 @@ \section{Fine-Grained Isolation} \label{sec:odft} -Isolation ensures that the driver and kernel changes made by a request are not +Isolation ensures that the driver and kernel state changes made by a request are not propagated if the request fails. We need the following properties from an isolation mechanism: @@ -27,9 +27,9 @@ of the driver and kernel, which is a copy of data referenced from an entry point but not entire structures. For example, when a network driver issues an ioctl to update its transmit ring parameters, FGFT -uses points-to analysis and pre-determines the fields touched, +uses points-to analysis and pre-determines the fields an entry point can access, such as {\tt netdev$\rightarrow$priv$\rightarrow$tx\_ring} -and {\tt netdev$\rightarrow$priv$\rightarrow$rx\_ring} and will only +and {\tt netdev$\rightarrow$priv$\rightarrow$rx\_ring}, and will only generate marshaling code to copy in/out only these fields to reduce the generated code and the unnecessary copying of unused fields. If the entry point does not @@ -128,7 +128,7 @@ We do not add all local variables to the range table because we trust the compiler to generate correct code for moving variables between registers and the stack. However, if the driver ever takes the address -of a local variable, or it creates an array as a local variable, then +of a local variable, or creates an array as a local variable, then Isolator adds a call in the instrumented SFI driver to add the variable's address and length to the range table and remove it from the range table when the variable goes out of scope. Similarly, we @@ -140,7 +140,7 @@ \toolshort\ Isolator generates {\em stub code} to invoke suspect entry -points that copies into and out of the driver. 
Similar to RPC stubs, +points that copies data into and out of the driver. Similar to RPC stubs, these stubs create a copy of the parameters passed to the suspect code, but also copy any driver or kernel global variables it uses. When the suspect entry point completes, stub code copies @@ -152,7 +152,7 @@ Isolator automatically identifies the minimal data needed for an entry -point through static analysis. This includes the structure fields from +point through static analysis. This data includes the structure fields from parameters referenced by the entry point or functions it calls plus fields of global variables referenced. As they copy data, stubs update the range table with the address and length of each object. For @@ -160,7 +160,7 @@ stubs fill in the existing address of the field, its length, and whether the entry point needs read, write, or read/write access. -If suspect code callbacks invoke the kernel, Isolator generates stubs +If suspect code invokes the kernel, Isolator generates stubs for kernel functions that copy parameters to the kernel and copies kernel return values back to suspect code. The SFI driver may pass in fields from its parameters to the kernel as arguments. To avoid @@ -219,28 +219,30 @@ the rest of the kernel, the lock stubs defer releasing locks until after the entry point returns to the kernel. -The above mechanism protects shared structures across different driver threads. -However, the suspicious thread can also block waiting for data to arrive -on shared structures that have been copied over from other driver threads. -This re-synchronization across driver threads is uncommon and we measure using static -analysis the driver entry points, where the driver thread waits for another thread using the -Linux's {\tt completion} family of functions or by sleeping in a loop waiting for -kernel data to arrive. +The above mechanism protects shared structures across different driver +threads. 
However, the suspicious thread can also block waiting for +data to arrive on shared structures that have been copied over from +other driver threads. Fortunately, resynchronization across driver +threads is uncommon. Using static analysis, we measured how often one +driver thread waits for another thread using Linux's {\tt + completion} family of functions or by polling in a loop waiting for +data to arrive. -Overall, we find driver resynchronization occurs in 2.7\% of drivers and 1.4\% of -all entry points. We now characterize the source of these re-synchronizations: Most -re-synchronizations occur during communication with the device; drivers wait for a device -operation to finish and a device callback sets the completion structure. In most cases, -only the completion structure responsible for device notifications needs to be annotated. -However, complex drivers that communicate with devices using a layered interface (such as -SCSI, WIFI) may wait for lower layers to communicate with device and also update the -appropriate drivers structures with the result of the operation. In such cases, -annotations required for completion structures and shared device structures for the driver -to work correctly. Finally, driver threads also sleep inside loops waiting for other threads to -finish by polling reference counts or driver structures before invoking device operations -such as disconnecting the device. If these threads modify -state across threads, then FGFT will not recover correctly for this fraction of -drivers/entrypoints. +Overall, we find driver resynchronization occurs in 2.7\% of drivers +and 1.4\% of all entry points. Most re-synchronizations occur during +communication with the device: drivers wait for a device operation to +finish and a device callback sets the completion structure. In most +cases, only the completion structure responsible for device +notifications needs to be annotated. 
However, complex drivers that +communicate with devices using a layered interface, such as SCSI or +WIFI, may wait for lower layers to communicate with the device and also +update the appropriate driver structures with the result of the +operation. In such cases, annotations are required for completion +structures and shared device structures for the driver to work +correctly. Finally, driver threads also sleep inside loops waiting for +other threads to finish by polling reference counts or driver +structures. If these threads modify state across threads, then FGFT +will not recover correctly for this fraction of drivers/entrypoints. %\fixme{add a paragraph in the paper describing details of FGFT's handling of resynchronization: @@ -276,7 +278,7 @@ generate an exception. We also modified the Linux kernel exception handlers to detect unexpected traps from the SFI driver as failures. If one occurs, the trap handler sets the instruction pointer to the recovery routine. This is -the only change to the Linux kernel, and required only 38 lines of +the only change to the Linux kernel, and required 38 lines of code. The detection mechanisms may miss several categories of Index: sections/device.tex =================================================================== --- sections/device.tex (revision 434) +++ sections/device.tex (revision 435) @@ -225,7 +225,7 @@ efficient because only irretrievable state is saved. Unlike suspend-resume, it may be useful to use device state -checkpointing from interrupt contexts, where sleeping is not +checkpointing from interrupt contexts` where sleeping is not allowed. As a result, checkpoint and restore code must convert sleeps to busy waits (\codesm{udelay} in Linux) and use memory allocation flags safe for interrupt context (\codesm{GFP\_ATOMIC} in Linux). @@ -341,7 +341,7 @@ the driver fails. 
\paragraph{Recovery steps.} -In case a failure is detected by SFI or processor exceptions originating from +In case a failure is detected by SFI or processor exceptions originating from a suspect module, the recovery routine restores driver operation through a sequence of steps as shown in Figure \ref{fig:howitworks}: