We first address the common concerns and then answer reviewer-specific questions.

------------------------------------------------------------------------------------------

We thank the reviewers for their detailed feedback, and take encouragement from their words: "The idea of slightly tweaking power management code in drivers to save and restore device is so clever that we should accept the paper for that alone." (Review 1), "The paper presents a fresh new-look to driver and device recovery" (Review 4), and "The authors of this paper have done an impressive amount of engineering" (Review 3).

Our primary contribution is twofold. First, we show how to reuse existing power management code to provide device checkpoints, and we have designed a fault tolerance system based on it (as identified by Reviews 1, 2, 4, and 5). Second, we demonstrate a transactional isolation model that can improve performance by selectively isolating specific functions.

Review 1 asks why the cost of checkpoints is so low compared to device restart. We briefly mention this in Section 4.2: device initialization performs a full probe sequence to determine the type, capabilities, and environment of the device, while FGFT merely reloads an existing configuration. We will describe this in detail in subsequent revisions of the paper.

Reviews 1, 2, and 3 ask what types of faults we handle and whether they are more limited than in related work (Nooks). FGFT traps on all processor exceptions (NULL pointer exception, general protection fault, alignment fault, divide error (divide by zero), missing segments, and stack faults) in addition to memory errors. It also detects malformed data structures during marshaling. Compared to Nooks, it may not detect corruption that occurs in one call and is accessed in another, although the marshaling may detect such corruption. In addition, because its marshaling code is generated automatically, FGFT does not include explicit parameter checks.
Recent work on statically determining kernel entry point pre- and post-conditions [1] could be used to detect more faults automatically. Our fault injection tests (Section 5.1) used different bug types (Table 2), which manifest as memory violations or as one of the above processor exceptions.

Review 3: "This reviewer believes the paper should be rejected because it is long on engineering and short on science." We politely disagree. Our novel contributions are device checkpointing, which can be used for a variety of purposes apart from fault tolerance (Table 1), and the idea of transactional isolation. Within fault tolerance, the availability of checkpointing introduces a fundamentally different way to think about driver isolation. Furthermore, to clearly demonstrate its value and overheads, we implemented both a driver isolation and a driver recovery solution, which has made the paper heavy on implementation and engineering. In subsequent revisions of the paper, we will better describe our research contributions of device checkpoints and in-kernel SFI using marshaling. However, we also believe that rigorous engineering is one of our important contributions.

Reviews 3 and 4 ask whether device checkpointing can work if there are bugs in the power management code. The answer is no. However, FGFT represents a new point in the tradeoff between implementation complexity and fault tolerance: Nooks and other full isolation systems require much more hand-written code in the OS, but isolate the entire driver. Microdrivers isolate only non-critical path code. FGFT isolates all code except for checkpoint/restore, which is typically less than 5% of the driver code.

Reviews 1, 3, 4, and 5 discuss selective isolation and where it is useful. FGFT is most useful if it can be determined that specific entry points are likely to contain faults, for example because they were recently patched or were flagged by static bug-finding tools.
Past work on Microdrivers showed that bugs do not have a higher density in the I/O path code and that such code is a small fraction of total driver code. In subsequent revisions of the paper, we will demonstrate this with an example (Review 1).

Reviews 2, 3, and 4 discuss our synchronization policy. FGFT uses lazy version management and holds locks until the entry point completes successfully. This isolates threads from each other and ensures that conflicts between concurrent threads are impossible. It is possible that some locking patterns could lead to deadlocks, but we have not seen those patterns in any of the drivers we have examined. Furthermore, deadlock could be detected at lock acquire time and handled by aborting one of the threads involved.

Review 2 compares us with the past transactional systems TxOS (SOSP 09), TxLinux (SOSP 07), and xCalls (EuroSys 09). These systems do not perform device I/O transactionally and either rely on higher-level atomicity techniques (TxOS and xCalls) or serialize transactions with a lock (TxLinux).

We believe device checkpointing is a useful contribution and would be excited to see it applied to other applications.

[1] Diagnosys: Automatic generation of a debugging interface to the Linux kernel. In ASE 2012.

------------------------------------------------------------------------------------------

We now discuss important concerns raised by reviewers that were not discussed above. Questions are given in quotation marks, and answers are marked with an arrow (==>):

Review 1:
=========

The questions in this review were discussed above. We will apply the helpful suggestions mentioned in this review.

Review 2:
=========

"Also, there appear to be limitations in supporting disks that the authors gloss over (with a reference to Membrane [39]."
"Where is the USB mass storage device? Is its absence due to the problem of persistent storage(4.1.4)? Or performance overhead of copyin/out (sec 3)?"
==> Drivers managing persistent state will not recover that state via FGFT. For example, a faulty disk driver could write to the wrong block, and neither FGFT nor Membrane would solve that problem. For failures that do not write to the wrong block, FGFT provides at-most-once failure semantics. The overhead of FGFT with storage devices is likely to be lower than with network devices because the data itself need not be copied; only the descriptors pointing to the data are copied. In addition, storage devices typically generate fewer requests per second: the network driver described in the paper handled 70,000 packets/second, each of which took a separate driver invocation.

"Finally, it is a bit suspicious that the authors used a 3 year old kernel (2.6.29 was released 3/09) for their evaluation. "
==> We used an older kernel only because we were already using it for other projects; our tools have no dependency on a particular kernel version.

"I am most concerned with how FGFT must take exclusive access to the device to take a checkpoint....The actual locking that must take place, especially during device callbacks is ad-hoc often difficult to determine. "
"Can you please analyze the drivers (even classes you didn't evaluate) to convince me that this isn't a hopeless task for entire classes of drivers?"
"You have just modified the kernel locking convention, and how to you guarantee you won't deadlock? "
"I also don't understand how copyin/out can work if there are multiple threads ever let into the driver, even after configuration. "
==> In general, FGFT does not let other driver threads execute while one thread is executing in isolation. Hence, there can be no conflicts between concurrent threads. FGFT achieves this by reusing the locks already present in the driver and, for isolated calls, expanding their scope to cover the entire call. In the next version of the paper we will add an analysis of more drivers to demonstrate that resynchronization is not a widespread problem.
For code that could deadlock if it holds locks across a kernel call, it may not be possible to use FGFT; however, FGFT can still be applied to other entry points in the driver.

"Comparisons with Mondrix (SOSP '05)"
==> Mondrix offers cheap memory protection using specialized hardware. However, it does not create a copy of the data being accessed and cannot provide rollback. We will compare our SFI to systems like Mondrix in the paper.

Review 3:
=========

"Assumptions in the paper/system that are not addressed or validated in the paper (in priority order): 1) Memory safety violations are the primary cause of driver failures. This neglects other causes of driver failure including race conditions, lock inversions, state machine errors, errors in logic, etc. Key unanswered question: what fraction of driver failures are cause by memory safety violations?"
==> Unfortunately, no data is available on the causes of driver crashes; in our preliminary analysis of available kernel crash dumps, almost all driver-related crash failures are due to memory safety problems. FGFT can address lock inversions by detecting deadlock and aborting a call, but cannot automatically address state machine errors or logic errors. Then again, no other automatic fault-tolerance system can handle these.

"2) SFI isolation can automatically separate locking and ordering operations for memory accesses. Key unanswered question: how does isolator identify and reactor locking operations."
==> We use static analyses to detect locking operations in suspect code, based on the common Linux kernel locking functions.

"is refactoring power management code into checkpoint and restore code automatically possible? if it can't be done automatically, how much domain and device expertise is required to do it manually?"
==> We show in the evaluation how few changes were required to refactor driver code, and we have no special knowledge of these devices.
While automating the conversion for all drivers would likely be difficult or impossible, it could be done for drivers with simple suspend/resume routines (which are the majority). We will analyze more drivers to evaluate this question.

"if the system is so easy to apply, why wasn't it applied to 60 or 600 drivers instead of just 6? what about more complex drivers like queuing storage drivers or graphics drivers?"
==> Evaluating a driver requires having the device present, and it would be expensive to purchase 60 or 600 devices. We evaluate with a number of drivers comparable to past work on driver fault tolerance, and we augment that with statistics about all drivers. We agree that the technique may not apply to all drivers, particularly complex ones such as graphics drivers. However, that does not reduce the value it offers to all other drivers. No prior driver fault tolerance paper has been able to address graphics drivers.

"Does refactoring of power management code for checkpointing violate any ordering assumptions in the code?"
==> Existing suspend/resume code does make ordering assumptions, which we address by acquiring driver locks before suspend. In Section 4.1.3 we discuss the changes needed to perform checkpoints in interrupt or atomic contexts.

Review 4:
=========

"From my understanding of paragraph 4 in section 2.1, I gather that you are assuming every driver invocation is state-less, i.e., one invocation does not affect the next. "
==> We will clarify the text, as that is not our assumption: we acquire locks during isolated driver calls so that the state from one call is available to the next call.

"While the driver-state touched is explicitly annotated by the user, it is unclear how the kernel-state touched is identified."
==> We currently do this manually. However, Isolator can statically identify which fields have been modified by the driver prior to a kernel call, and which fields are accessed afterwards.
These are the fields that must be synchronized with the kernel.

"Can the authors provide an example of the kinds of structures that were touched and which fields were copied, as opposed to the entire structure?"
==> We will add an example to the next version of the paper. For example, consider a driver ioctl that updates the driver's internal private structure (usually pointed to by netdev->priv, where netdev is the kernel's netdevice structure). In such cases, FGFT uses points-to analysis to pre-determine the fields touched, such as netdev->priv->tx_ring and netdev->priv->rx_ring, and generates marshaling code that copies in/out only these fields (rather than the complete netdev or netdev->priv). This reduces marshaling code and unnecessary copying.

"How were the time-related measurements in section 5.3 done? What is the error margin of the measurement?"
==> We read the TSC processor register (via rdtscll calls) to obtain timestamp values; it provides very high precision over short intervals. We report the average of 5 runs.

Review 5:
=========

"cost of protection: 20+ us. In other words, the approach adds 60,000 cycles to each driver entry point that needs to be protected."
==> This approach explicitly trades a higher per-use latency cost for reduced use, by isolating only selected entry points. If the majority of entry points require isolation, then whole-driver techniques are more useful. However, if it is possible to identify one or two suspect entry points, then FGFT can have much lower cost.

"knowing what to protect (the method is probably too expensive to protect everything)."
==> In the paper we suggest using static bug-finding tools and applying isolation to recently patched code. In addition, crash dump stack traces could be used to identify candidates for isolation.

"which class of bugs does it actually help against (e.g., would a restarted driver after checkpoint resumption just fail again the same way, in the common case?)."
==> If the bug lies on the common-case path, the system can recover only when it is a heisenbug; for other bugs, it can still prevent crashes by failing the call instead of re-invoking it. This allows higher-level recovery techniques that are more likely to resolve persistent faults, such as unloading and reloading the driver, to be applied. In addition, the system can keep buggy code on an uncommon path from affecting the common case.

"do the performance data presented account for the possibility that the method forces more serialization than would otherwise be needed for the driver (e.g., multiqueue nics, etc)."
==> Yes, FGFT imposes more serialization while invoking isolated calls. However, if FGFT is applied to non-critical path code, this is unlikely to reduce performance.

"what assumptions are made about drivers for stateful devices like disks where a checkpoint can't include the data being written to disk?"
==> We assume the driver will not write to the wrong block on disk and that it either writes the correct data or no data at all. This is similar to past work on file-system recovery such as Membrane.
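
As a rough illustration of the behavior described in our responses above (checkpoint before an isolated call, copy in/out only the touched fields, and fail the call rather than retry on a fault), the following is a minimal user-space C sketch. It is not FGFT code: every name in it (dev_regs, drv_priv, checkpoint, restore, suspect_entry, isolated_call) is hypothetical, the device is modeled as a plain struct, and a fault is modeled by a return value rather than by a trapped processor exception.

```c
#include <assert.h>

/* Hypothetical device registers and driver-private state (illustration only). */
struct dev_regs { unsigned ctrl; unsigned irq_mask; };
struct drv_priv { int tx_ring_head; int rx_ring_head; int stats; };

static struct dev_regs device = { 1u, 0xffu };

/* Stand-ins for checkpoint/restore routines derived from suspend/resume code. */
static void checkpoint(struct dev_regs *saved) { *saved = device; }
static void restore(const struct dev_regs *saved) { device = *saved; }

/* A suspect entry point; a nonzero return models a detected fault. */
static int suspect_entry(struct drv_priv *p, int inject_fault)
{
    device.ctrl = 0u;              /* device is left half-configured on a fault */
    p->tx_ring_head += 1;
    if (inject_fault)
        return -1;
    device.ctrl = 3u;
    p->rx_ring_head += 1;
    return 0;
}

/* Run the entry point on a copy of the touched fields; commit only on success. */
int isolated_call(struct drv_priv *live, int inject_fault)
{
    struct dev_regs saved;
    struct drv_priv copy = { 0, 0, 0 };

    checkpoint(&saved);
    /* Copy in only the fields the call is assumed to touch. */
    copy.tx_ring_head = live->tx_ring_head;
    copy.rx_ring_head = live->rx_ring_head;

    if (suspect_entry(&copy, inject_fault) != 0) {
        restore(&saved);           /* roll the device back to the checkpoint */
        return -1;                 /* fail the call; do not re-invoke it */
    }

    /* Copy out the same fields on success. */
    live->tx_ring_head = copy.tx_ring_head;
    live->rx_ring_head = copy.rx_ring_head;
    return 0;
}
```

In this sketch, a call such as isolated_call(&live, 1) leaves both the live driver structure and the modeled device registers exactly as they were before the call, while a successful call commits the updated ring fields. The untouched stats field is never copied in either direction.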