===========================================================================
OSDI '12 Review #4A
Updated Saturday 12 May 2012 11:34:19pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 2. Weak reject
Reviewer expertise: 3. Knowledgeable

===== Paper summary =====

Scaffolding for doing deeper or broader testing of device drivers using
symbolic execution.

===== Paper strengths =====

A tool worth having. Real bugs (some undiscovered before) found. Well
written paper.

===== Paper weaknesses =====

The depth of the research contribution isn't evident. Although useful, the
tool appears to be an engineering exercise that extends prior software
with a better user interface (driver source code rather than binary), a
better debugging interface (checkers written in C running in the kernel
context), and generic symbolic execution speedups (new path selection
heuristics).

===== Comments for author =====

Although this is a very well written paper, it left me unsatisfied in the
end. It's clear that SymDrive does something. But the magnitude of this
something is hard to evaluate. For one, as the authors honestly admit, the
tool is neither sound nor complete. It takes liberties that prune usable
paths (e.g., by concretizing when executing through the kernel), it
deprioritizes paths that may hide unwanted behaviors (e.g., in long
loops), etc. But in the process, it does find some unwanted behaviors, as
the bugs found demonstrate.

So all in all, the tool (and paper) don't close the book on device driver
testing (admittedly a hard job), nor does it open the book (as shown by
the related work). Does it close a chapter of the book? Is there a class
of device driver bugs or testing that it does handle completely? That
doesn't appear to be the case. And without such a meaningful definition of
accomplishment, the paper doesn't seem to make quantifiable progress. To
be somewhat ungenerous and glib, we can keep writing papers like this
forever, and we still won't have solved the problem of establishing the
correctness of a device driver.

I would recommend that the authors focus on some aspect of testing for
which they can show "complete" progress. It might be that with mild
"refactoring" the paper can argue that all driver behavior induced by an
arbitrary device is tested against the developer-written checkers. That
would be great, because then we can focus on driver-kernel interactions
and we'll know how to handle driver-device interactions. Or perhaps the
focus could be on a single kernel transaction? There are several choices,
but I would feel much more comfortable with the paper if it made a
statement of contribution that establishes at least one problem solved.

Some smaller issues:

- Several of the contributions of this paper are rather generic: path
  prioritization policies for exiting loops or for staying within a
  function designated to be important, source-to-source rewriters to
  instrument module-boundary crossings, execution-trace comparison, etc.
  Those sound neat, and seem to help with the task at hand, but are they
  all that challenging? If they are, I'd love to know what major
  challenges the authors had to face.

- I'd like to see an explanation of why discarding all device writes does
  not make testing trivial. It's not obvious to me, not being an expert.

- Is execution tracing in section 3.4 significant?
  If it is, I'd like to see a more detailed description. What exactly is
  compared across executions on the same OS, or across OSes? Is there
  canonicalization required? If not, what records are comparable?

- There are various assertions throughout the paper about how X is
  difficult with technique A (e.g., static analysis) while it is easier
  with SymDrive. Although I believe some of these assertions, I'd like to
  see either explanations or citations for them. For example (at the
  bottom left of page 9), why does the fact that 41% of bugs occurred on a
  unique path through a driver make it difficult to find those bugs with
  static analysis?

Editorial

- Please give a brief definition of "collateral evolution" in this paper.
- Page 5, top left paragraph. If writes have no effect, how are writes
  seen by subsequent reads? This is confusing.
- Page 6. a user-mode tests -> a user-mode test
- Page 7. "SymGen also instruments ... with opcodes"? They look like
  functions to me in Figure 2. You could have said "testing functions
  compiled to unique S2E opcodes".
- Page 8. can it be it used -> can it be used. Figure 2 -> Table 2.
- Page 9. What are the asterisks in the LoC column? The text only explains
  driver-name asterisks.
- Page 10. this results shows -> this result shows. Also, why is parallel
  time higher than serial time for 8139too in Table 5? A typo, hopefully.

===========================================================================
OSDI '12 Review #4B
Updated Tuesday 10 Jul 2012 9:41:03am PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 3. Weak accept
Reviewer expertise: 2. Some familiarity

===== Paper summary =====

This paper describes a driver testing tool that uses symbolic execution to
enable testing drivers without a physical device. The contributions of the
paper are innovations that make symbolic execution more effective in the
specialized context of testing device drivers. The tool relies on static
analysis and source-to-source translation to minimize the developer effort
of instrumenting driver code for testing, enables kernel-mode execution of
checker routines, and provides an execution tracing system that can help
identify changes to I/O behavior across multiple implementations.

===== Paper strengths =====

Despite considerable attention from the research community, improving
driver reliability remains an important problem, and symbolic execution's
conceptual promise of exploring all possible inputs is appealing in this
context. This paper takes on several problems that limit the applicability
of symbolic execution to drivers: namely, path explosion, the need for
additional software artifacts like device and OS models, and the lack of a
specification of the correct behavior of a driver. The evaluation makes a
compelling case that the authors' efforts to address these barriers make
symbolic execution a practical tool for device drivers. Anecdotal evidence
about developer effort and execution trace comparisons is compelling, and
many of the bugs the tool found were significant enough to be fixed in
subsequent kernel releases.

===== Paper weaknesses =====

While the system the paper describes appears to be effective, the paper
relies on a lot of previous work, and the innovations required to make
that work applicable in this environment are not earth-shattering.
For example, the system provides static analysis and translation that
identifies and instruments entry points into driver code. Such automation
certainly reduces developer effort, but given that drivers typically must
implement a known interface to the kernel, finding and instrumenting those
entry points is perhaps not a contribution-worthy problem. Enabling
kernel-mode checkers (when user-mode checkers are available) to address
the lack of correctness specifications, and policy tweaks that improve
coverage while limiting path explosion in the symbolic execution, have a
similar flavor.

===== Comments for author =====

* Re: interrupts/DMA: You invoke interrupt handlers on every transition
  from the driver into the kernel. Do you do this because it's the only
  way to drive the system to model interrupts raised by the device? It
  seems reasonable to be skeptical that this can capture the interrupt
  behavior of all devices. It would help the reader if you either
  articulated the argument for why this models most devices well, or
  explained why the loss of fidelity in the model is tolerable if it
  doesn't.

* Re: loop elision: you claim that loop elision prioritizes paths that
  exit loops early, obviating the need for an EdgeKiller plugin. However,
  when SymDrive finds a loop that can't be terminated, the programmer is
  expected to #ifdef that code out, which strikes me as a level of effort
  similar to that required to provide an offset for an EdgeKiller plugin.
  Are you arguing that the number of cases requiring such ifdefs is
  significantly fewer than those requiring EdgeKiller interventions? Or
  that an #ifdef is easier than finding the code offset?

* Re: the list of exceptions SymGen keeps to deal with kernel functions
  that return non-standard values: it would be great to know (at least
  approximately) how many such exceptions were required; assuming that
  number is small, it would help buttress your argument that inferring
  success through return codes, without knowledge of the API returning the
  code, is a reasonably general strategy.

* Being able to test all ioctl handling code in one ioctl invocation is
  pretty cool.

* Your anecdote about finding a bug in 8139too is interesting, and very
  cool, but it also makes me wonder how much effort was required to find
  that bug that way. The idea of finding hardware-specific bugs by
  comparing traces seems error prone: diffs in the traces don't
  necessarily indicate a bug, even if they indicate different I/O
  behavior. How much time/effort was invested in inspecting diffs in the
  traces? How did you determine that 8139too's differing behavior was in
  fact a bug? Did you compare traces for different implementations of
  other drivers, or did you do this only for 8139too, and you were lucky
  to choose one that happened to have a bug?

* The result of the comparison to S2E is important. It might be worth your
  while to provide the reader with an earlier taste of just how
  insufficient S2E is for this job (without the modifications you
  provide).

===========================================================================
OSDI '12 Review #4C
Updated Friday 1 Jun 2012 3:14:45pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 4. Accept
Reviewer expertise: 2. Some familiarity
===== Paper summary =====

Testing drivers is hard, especially if you don't have access to the
hardware, or you can't get it to exhibit all of its behaviors, especially
the unusual error conditions. SymDrive is a framework for using symbolic
execution to allow testing of drivers without actual devices. It builds on
previous work (mostly S2E) that provides a symbolic execution platform,
but SymDrive specializes and extends S2E so as to make it much easier and
faster to test kernel drivers. (SymDrive exploits specific features of the
driver domain; it's not clear if any of the improvements to S2E generalize
beyond that.) SymDrive works with both Linux and FreeBSD, and has found
bugs in a bunch of drivers.

===== Paper strengths =====

It's really useful to have a way to test drivers without having to find
ways to get the devices to display all of their possible behaviors.
SymDrive seems to be a useful extension to S2E. And this extension is
highly specific (perhaps too specific) to OS code, so it's on-topic for
OSDI. The paper is pretty easy to read.

===== Paper weaknesses =====

In a few places, I found it hard to understand exactly what was happening.
It's not clear whether any of the improvements to S2E can be generalized
beyond drivers.

===== Comments for author =====

page 5, loop elision, especially the second paragraph -- I had a hard time
understanding how well this works. For example, if you have to run through
an entire checksum loop to see if it gets the right value, does
prioritizing loop-exit paths work? And if you just #ifdef out this entire
loop, doesn't that undermine the purpose of improved test coverage? I
would like to think that SymDrive could be used to test the code that
executes on checksum failure. (A sketch illustrating this concern follows
this review.)

fig 2: it's kind of hard for a reader to tell which lines were added and
modified by SymDrive. I had to go find 8139cp.c to figure this out. I
think it would be worth the extra space to include the original code,
which is pretty short:

    while (work--) {
        if (!(cpr8(Cmd) & CmdReset))
            return;
        schedule_timeout_uninterruptible(10);
    }

maybe you can include the original code as a pair of comments?

5.1: "SymDrive does not presently work ... with multicore support." Does
that mean that this is just a small matter of programming, or is it
fundamentally difficult to support (and check) parallelism?

5.1, step 1: I think what you mean is that you replace the original
driver.o with driver_as_modified_by_sym_drive.o, right?

5.2: you found 39 bugs in 26 drivers. Have you looked at bug repositories
to figure out whether you found 39 of 39 bugs? or 39 of 1000? In other
words, given all of the effort that you put into SymDrive, is it finding
enough bugs to be worth it?

page 11, execution tracing: this sort of answered the question I had when
I read section 3.4: "how well does this actually work?" but not entirely.
Earlier you quantified how long it took to apply SymDrive to the phantom
driver, but here it's not clear whether the trace-comparison process took
a few minutes or a few days, and how tedious it was to actually find the
bugs in the 8139too and rl drivers.

section 6, DDT and S2E: can you apply your improvements to S2E to
non-driver code? Perhaps some other domain, such as network services,
where it's hard to get the other end to exhibit error states? Or is driver
code really the only domain where your work applies?
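To make review 4C's checksum concern concrete, here is a minimal,
hypothetical driver fragment (all names are assumed for illustration; this
is not code from the paper or from 8139cp.c). The post-loop "success" code
is reachable only after all BUF_LEN iterations complete, so a heuristic
that favors early loop exits tends to explore only the short failure path:

    #include <stdint.h>
    #include <stddef.h>

    #define BUF_LEN 256

    static int verify_firmware(const uint8_t *buf, uint8_t expected)
    {
        uint8_t sum = 0;
        size_t i;

        for (i = 0; i < BUF_LEN; i++)   /* must run to completion */
            sum += buf[i];

        if (sum != expected)
            return -1;                  /* short early-exit path: failure */

        return 0;                       /* reachable only after the full loop */
    }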
===========================================================================
OSDI '12 Review #4D
Updated Monday 18 Jun 2012 5:59:04pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 4. Accept
Reviewer expertise: 3. Knowledgeable

===== Paper summary =====

SymDrive specializes the idea of symbolic-execution-driven testing to the
special case of testing device drivers.

===== Paper strengths =====

It's a clever fit: it obviates the need for a device to be present, it
uses the kernel as its own model, and loops (which choke symbolic
execution) are rare. The evaluation is successful, finding a handful of
new bugs, and discovering 17 known, verified bugs.

===== Paper weaknesses =====

I wonder if the inability to handle loops will put an important class of
failures (around actual data handling) out of reach.

===== Comments for author =====

p1 The abstract mentions 39 bugs, but not all are verified. I suggest
instead trumpeting in particular the 5 new bugs, those the devs agree are
real and were not previously known.

p1 That "two dozen" drivers mention "compile tested only" isn't very
compelling; there must be thousands and thousands of driver checkins in
the denominator. The real question (and hence motivation) is how many
patches don't get committed -- or even written -- because of the inability
to test beyond compile.

p3 "The design applies to any OS supported by S2E" -- how broad is that
set?

p4 Tell us about the 1,954 "Changes to S2E". What kinds of changes are
required?

p4 What is the "Platform" bus? How much work is involved in adding support
for a new bus? How hard would USB be?

p5 "the test framework may inject additional symbolic data to increase
coverage" -- please provide a clearer preview and maybe even a forward ref
to a section number.

p5, end of "favor-success sched" section: am I correct in inferring that
forking cannot cross beyond one function call into the driver? (perhaps
say that explicitly.)

p5, loop elision: why can't you prioritize such that you prune paths that
*don't* loop? That would seem to direct the system towards discovering
bugs along paths that require loops.

p6 "nonstandard values ... SymGen has a list of exceptions" -- please
provide an example or two to clarify.

p8 An inline or footnote definition of collateral evolution will help some
readers (like this one, who had to look it up :v)

p8 "we have not experienced any [false positives]" -- Can you contrive a
case that would at least illustrate where they might come from?

p9 "we tested phantom ... in 1h:45m" -- I see from tbl 2 that phantom is
one of the smallest drivers, and it required zero annotations, so it seems
pretty likely that this is an ideal case, not a representative case --
call that out.

p10 "the only changes were to update a few checkers" -- how many LoC was
that, so I can fit it into the context of tbl 1?

p10 "econet requires additional software" -- do you mean that it comes
with some userland software that can push the driver into states that
SymDrive can't find on its own? Or something else? Please clarify.

p10 some "required kernel-mode tests" -- please give an example to
clarify.
p10 "this results" -- extra s

p10 "this result shows that SymDrive can provide coverage similar to or
better than that of running the driver on real hardware" -- this sentence
is a little too generous to SymDrive; the effort to generate full coverage
in this case wasn't very enthusiastic compared with a driver dev trying to
push up gcov's numbers.

p12 "SymDrive automatically detects and annotates loops" -- and punts them
out of the evaluation, of course.

Review by Jon Howell, intentionally unblinded.

===========================================================================
OSDI '12 Review #4E
Updated Saturday 30 Jun 2012 3:41:37pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 3. Weak accept
Reviewer expertise: 4. Expert

===== Paper summary =====

The paper describes SymDrive, a symbolic execution framework for testing
device drivers. SymDrive uses static analysis to identify and instrument
(1) the driver-kernel interface and (2) loops in driver code. This
instrumentation is later used by symbolic execution to guide the search of
paths, run checkers, and mark additional data as symbolic. Two search
heuristics are described: one favors successful paths to execute driver
code past initialization, and another favors loop-exit paths to terminate
loops early. The checkers are written in C and run within the checked
code, instead of in the symbolic execution engine. SymDrive also provides
a tool to log hardware I/O traces and compare the traces. SymDrive's
symbolic execution uses KLEE; its static analysis uses CIL. Evaluation on
26 drivers yielded 39 bugs and good statement coverage.

===== Paper strengths =====

The paper is well written, providing a detailed and honest account of what
was done in the design, implementation, and evaluation of SymDrive, likely
sufficient for someone interested in reproducing the SymDrive system. The
paper also gives fair credit to prior work. SymDrive is a well engineered
system. Its error detection and statement coverage results are also good.

===== Paper weaknesses =====

Although SymDrive leverages a few good existing ideas, I find it a bit
thin on key new ideas. For instance, SymDrive's static analysis to
discover the driver-kernel interface and loops is relatively
straightforward. Most rules SymDrive checks are already checked by prior
work.

===== Comments for author =====

Once source code or a high-level intermediate representation of the code
is available, automatically discovering loops seems easy: simply find
cycles in the control flow graph or, even simpler, iterate through
llvm::Loop structures. (A sketch of the control-flow-graph approach
follows review 4F below.) Discovering the kernel-driver interface also
appears easy. I'd like to see the hard problems solved by your static
analysis to better understand the challenges.

How hard is it to come up with the rules you check? Can any of the rules
be automatically inferred from the paths you explore? For instance, if 99
out of 100 paths call foo()->bar(), and the other path doesn't, then this
path may contain a bug. If rules can be automatically inferred, users of
SymDrive don't have to have domain knowledge or write the checkers. There
seem to be some interesting research challenges here.

I'm curious how much the QEMU + KLEE approach buys you. If you start off
with compiling the driver and kernel code to LLVM, and then simply use
KLEE, would it give you more bugs and coverage etc?
Going from source code to x86 assembly back to LLVM bitcode seems rather
convoluted, and potentially makes constraints unnecessarily complex.

Sec 4.1 says you don't check for overflows. KLEE can check overflows by
solving constraints to make a symbolic expression overflow, except it may
find too many benign overflows. Is this the reason you didn't check for
overflows?

Sec 5.3 What are the annotations you added? If the number is small, you
may describe all the annotations, possibly in a table.

===========================================================================
OSDI '12 Review #4F
Updated Friday 6 Jul 2012 11:22:50pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 3. Weak accept
Reviewer expertise: 2. Some familiarity

===== Paper summary =====

This paper presents SymDrive, a system for testing Linux and FreeBSD
drivers using symbolic execution. In addition, SymDrive uses static
analysis to reduce the testing effort, enables the developer to write
checkers in C, and provides an execution-tracing tool to efficiently test
patches. Using SymDrive, the authors found 39 bugs across 21 Linux drivers
and 5 FreeBSD drivers.

===== Paper strengths =====

Despite recent work, debugging drivers remains a significant problem. This
paper advances the state of the art in addressing this problem. The
solution is very well crafted, and the evaluation is comprehensive.

===== Paper weaknesses =====

While effective, the solution seems a rather incremental improvement over
previous work, in particular over S2E.

===== Comments for author =====

I enjoyed reading this paper. The paper is well written, the solution is
well thought out, and the evaluation is impressive. All in all, a very
well executed paper! I have only a few comments.

First, while I understand the difficulty of the problem and the fact that
this paper is advancing the state of the art, I still feel a bit
disappointed about the limitations of the solution. In particular,
SymDrive is neither provably sound nor complete. SymDrive does not detect
bugs due to race conditions (between threads), which are among the hardest
and most time-consuming bugs to find.

Second, SymDrive leverages a collection of existing techniques, rather
than proposing a new technique or idea. In fact, one could argue that the
improvements over S2E are rather incremental.

Third, sometimes the paper overwhelms the reader with details, at the
expense of providing a higher-level picture and emphasizing the main ideas
and tradeoffs. For example, Section 3 is a long list of low-level
technical details which I found a bit hard to follow. Also, it would have
been nice to evaluate the impact of the various techniques you use, i.e.,
static analysis, the ability to write checkers in C, and the
execution-tracing tool. Finally, it would have been nice to evaluate the
impact of the various prioritization techniques for path exploration.

Other comments & questions:

- sec 5.1: "If SymDrive reports warnings about too many paths, we annotate
  the driver code and repeat the operations." - How often did you have to
  do such annotations?

- sec 5.5: I like the patch-testing application. You should emphasize it
  earlier in the paper.

- sec 5.6: "We ran it until it started thrashing the page file, with a
  commensurate drop in CPU utilization, for a total of 23 minutes." -
  Would increasing the memory size have made a significant difference?
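Regarding review 4E's remark above that loop discovery is straightforward
once a control flow graph is available: a minimal, self-contained sketch
of that idea follows. The CFG representation here is hypothetical (neither
SymGen's nor LLVM's; with llvm::Loop this reduces to iterating an analysis
result). A depth-first search reports back edges, each of which identifies
a loop header:

    #include <stdio.h>

    #define MAX_NODES 64
    #define MAX_SUCCS 4

    enum color { WHITE, GRAY, BLACK };

    struct cfg {
        int nsucc[MAX_NODES];
        int succ[MAX_NODES][MAX_SUCCS];  /* successor lists */
        enum color state[MAX_NODES];
    };

    /* DFS: an edge into a GRAY node (still on the DFS stack) is a
     * back edge, and its target is a loop header. */
    static void find_loops(struct cfg *g, int n)
    {
        int i;

        g->state[n] = GRAY;
        for (i = 0; i < g->nsucc[n]; i++) {
            int s = g->succ[n][i];
            if (g->state[s] == GRAY)
                printf("back edge %d -> %d: node %d heads a loop\n",
                       n, s, s);
            else if (g->state[s] == WHITE)
                find_loops(g, s);
        }
        g->state[n] = BLACK;
    }

    int main(void)
    {
        struct cfg g = { {0} };

        /* 0 -> 1, 1 -> 2, 2 -> 1 (loop), 2 -> 3 (exit) */
        g.succ[0][g.nsucc[0]++] = 1;
        g.succ[1][g.nsucc[1]++] = 2;
        g.succ[2][g.nsucc[2]++] = 1;
        g.succ[2][g.nsucc[2]++] = 3;

        find_loops(&g, 0);
        return 0;
    }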
===========================================================================
OSDI '12 Review #4G
Updated Wednesday 11 Jul 2012 2:45:11am PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 4. Accept
Reviewer expertise: 3. Knowledgeable

===== Paper summary =====

Applying symbolic execution to debugging device drivers, without having a
device. Naturally, if you symbolically execute the driver with all
possible results from device registers, DMA operations, and interrupts,
there's no need for real hardware - and you'll get more code coverage.

===== Paper strengths =====

Very nice insight (no hardware!), and something I for one might find
extremely useful. A pretty significant chunk of tool code as well, even
taking into account the previous work (S2E) on which it relies.

===== Paper weaknesses =====

Yet another paper on applying symbolic execution to operating systems.
Unclear from the paper which of the bugs are actually important. Perhaps
too incremental?

===== Comments for author =====

I really liked reading this paper, and I think it is well above threshold
for OSDI.

The paper is a little glib about where device drivers execute - not all
drivers execute in the kernel on Linux and BSD (indeed, there is a trend
to move them out if possible), and not all operating systems execute them
in the kernel. The paper also makes a few unspoken assumptions about who
writes drivers - it's aimed squarely at Linux developers, who may not have
access to the device itself, let alone any documentation. However, many
driver programmers work for hardware companies, and so presumably have
complete access to the hardware. Others work for organizations (like
Microsoft) with very close relationships with hardware vendors. It would
be good to make explicit who SymDrive targets, and how it might (or might
not) benefit these other players.

That aside, the introduction does a good job of motivating the problem and
outlining the existing work on which it builds. One obvious point of
reference with regard to execution tracing is the Tralfamadore system from
UBC. It's different, but a comparison of that approach with SymDrive's
tracing functionality would be constructive. "To the best of our
knowledge" is always a good way to disavow knowledge of related work, but
you might want to ask some folks at, say, Microsoft or Apple if they do
something similar to this. Even if they do, you're still the first to
publish it, which is fine in my book, but it would at least put this into
broader context. Of course, I'm assuming you work for neither.

Sec.2: To be useful, SymDrive must not only find bugs that are hard to
find with other tools; these bugs must also matter to the correct
operation of the system. More generally, the paper needs to say more about
the kinds of bugs that SymDrive finds. For example, if many of the bugs
are lack of defensive programming in the event that the device does
strange things, that reduces the value of the tool somewhat - a buggy
device can always DMA stuff all over the place unless there's a good
IOMMU, and you don't seem to model IOMMUs right now. Devices are notorious
for doing strange things, but it's not all that frequent, so how serious
are these kinds of bugs?

There's some confusion about what precisely SymDrive is for. In Sec.1, you
design SymDrive for three purposes. In Sec.2, you have two use cases.
I think this could be clearer.

Kudos for section 2.3 ("Why not Symbolic Execution?"), but you don't say
whether you actually address/mitigate any of these problems in the paper.
Try to relate this section more to the work at hand (and prior art).

Otherwise, the scoping is pretty clear. Any reason you didn't do USB? It
seems an obvious choice (more than I2C, for instance), but was there
something about it that made it more work?

Like many papers that combine automatic analysis with manual annotation or
coding, this paper is sometimes a little sketchy on the engineering effort
required. For example, Sec.3.2: "we created a small kernel module..." -
please quantify this: how big is it? How many lines of code? Is it doing
anything tricky, or just simple translation/interposition? Similarly, in
Sec.3.5: "for complex code that slows testing..." - how frequent is this
code, in your experience? How often are the programmer annotations needed?

You're very quiet about multiprocessors. This is pretty suspicious. If you
can't handle multicore machines properly, be honest and say so - it's
still a good paper. If you can, explain how and why it works. Right now,
you look like you're sweeping concurrency under the rug. You only admit in
Sec.5.1 ("Methodology"!) that it's single-threaded, but again, what does
this mean? That you can't handle more than one thread, or that SymDrive
can't make use of more than one core?

Also, say something about the possible dangers of only (symbolically)
executing the interrupt handler on a transition from driver to kernel.
What potentially important sequences might this strategy miss? Can you
relate this heuristic to the usual complexity explosions that dog
model-checking efforts in the presence of concurrency? Similarly, when
does your loop elision heuristic fail? Other than checksums, can you
relate these scenarios to examples of real driver code?

Which coverage mode did you use to find all your bugs? More generally, the
two "modes" seem like arbitrary choices in a much richer space of options.
Can you lay out this space a bit more?

Sec.4.1: I'm not too bothered that SymDrive is neither sound nor complete,
but it would be good if you gathered together in one place all the reasons
it's not sound, and all the reasons it's not complete. You could also say
this a lot earlier in the paper (even though it's already implicit in some
earlier sections).

Which of the bugs you found with Linux 2.6 were subsequently found
independently by Linux folks and fixed prior to 3.3? You need to state
this if you wish to back up your claim that SymDrive finds bugs that other
approaches miss.

===========================================================================
OSDI '12 Review #4H
Updated Wednesday 11 Jul 2012 6:01:06pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 2. Weak reject
Reviewer expertise: 2. Some familiarity

===== Paper summary =====

This paper describes a system for finding crash, hang, or API-misuse bugs
in kernel device drivers. The more specific contributions are two
extensions of the previous S2E system that help apply symbolic execution
to device drivers. The first is a tool that helps the tester add calls to
(tester-written) checking functions at the beginning and end of a given
existing kernel function.
The second is a heuristic for choosing which execution paths to check next
(namely, choosing paths that exit a loop or finish initialization code),
which seems to do a good job of guiding the symbolic execution of a driver
towards more code coverage.

===== Paper strengths =====

The tool seems to work well, both for Linux and FreeBSD, and finds both
previously-known and newly-discovered bugs.

===== Paper weaknesses =====

The bulk of the benefit comes from S2E, and the additional techniques
introduced by this paper -- while important for actually running S2E on
many device drivers -- don't seem particularly surprising or novel, and
don't enable finding any new classes of bugs. (The trick for calling
f_check() at the beginning and end of f() seems like a very useful one,
but hardly a breakthrough idea that would carry an OSDI paper; a sketch of
the pattern follows review 4I at the end of this document. The scheduling
heuristic for breaking out of loops and making progress in initialization
seems more interesting, but that's about it.)

===== Comments for author =====

The first few pages of the paper somewhat over-sell the tool. For
instance, the title, "SymDrive: Who Needs a Device to Test a Driver?" begs
the answer "Anyone who actually wants their driver to work correctly",
and in this vein, the paper doesn't state what properties the system tests
for until the first paragraph of section 3 (namely, crash / hang /
API-misuse bugs). Even in section 2.3 ("Why not Symbolic Execution"), you
mention under "Specification" that it's hard to check whether the driver
actually works correctly, but it's unclear what you're saying here -- is
this an example of a limitation of DDT/S2E that you are going to fix in
this paper (which seems to be the case for some of the other bullets in
this section, like "Simplicity" or maybe even "Efficiency"), or is this a
problem inherent to the entire approach (which seems to apply to this
point on "Specification")?

It would be interesting to understand what kinds of bugs your tool will
miss altogether. In the eval, you mention you didn't encounter false
negatives, but how would you know? For instance, your story for symbolic
interrupts and symbolic DMA seems a bit weak: you only deliver interrupts
at the end of a kernel->driver callback, and you only "perform" symbolic
DMA between the driver's calls to allocate and free the DMA region. What
if the driver gets the calls to allocate/free DMA regions wrong? What if
the driver has a race condition with an interrupt handler? These seem like
obvious examples of false negatives (i.e., things your tool will never
catch).

When you talk about path prioritization (e.g., in the favor-success
heuristic), what happens to other paths with low priority? It seems that,
since you are unlikely to ever complete all paths anyway, you are
effectively discarding all paths that have low priority. Is that true? Or
do you ever come back and still try out some of those paths that were
low-priority from different branches of the initialization code? It would
be good to say either way, and if it does discard the paths, calling it
prioritization is a bit misleading.

Anecdotally, I hear that Red Hat's kernel engineers are careful to make
sure the kernel crashes at the first possible sign of something having
gone wrong, because they would rather their customers have fail-stop
behavior than risk the possibility of data corruption.
To the extent that the rest of the Linux kernel community might share the
same world view (I'm not sure whether that's accurate or not), does this
mean you might get lots of false positives in your tool if drivers keep
panicking when the symbolic device returns garbage?

In your evaluation of load time in table 2, what does it mean for loading
to finish? Does this mean you've explored all of the paths (including
low-priority ones) through the initialization code, or does it mean you've
finished exactly one path through the initialization code and got back out
to userspace?

Minor: in related work, paragraph 2, you say your system improves S2E in
four ways, but I only see three examples listed in the following text.

===========================================================================
OSDI '12 Review #4I
Updated Sunday 15 Jul 2012 11:12:59am PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 3. Weak accept
Reviewer expertise: 2. Some familiarity

===== Paper summary =====

This paper represents a non-trivial improvement to S^2E for the purpose of
symbolic checking of device driver implementations. The improvements both
reduce programmer effort (automatically instrumenting the driver/kernel
interface; SymGen) and add heuristics for improving performance (and hence
viability). They demonstrate their tool on both FreeBSD and Linux across a
range of I/O busses and device types (incidentally finding a bunch of
bugs).

===== Paper strengths =====

This paper represents a real step towards making the promise of symbolic
evaluation practical.

===== Paper weaknesses =====

This is decidedly incremental work and there is no "magic" technique here,
just solid engineering (I say this somewhat ironically since I'm a fan of
good engineering).

===== Comments for author =====

I think your exposition could be crisper about the set of tasks left
undone by S^2E and how those omissions drove the design of SymDrive. For
example, section 3.5 is all detail about what SymGen does and doesn't
spare a few sentences about why you need it. Indeed, I'd put all of this
up front where you talk about the contributions. I think the paper would
also be stronger if you used the different applications to motivate the
tool instead of the current structure, which makes it sound like you're
just looking for as many uses for the framework as you can.

It would be nice if you could be more clear about the classes of checkers
that can be written and those that can't. For example, you say that you
can't do integer checking or rate detection. Can you explain why more
clearly and help the reader to think about the classes of checkers that
they could and couldn't write with SymDrive?

Given that the goal of this paper is to reduce developer effort, I'm
wondering if you might not be able to quantify some of that by looking at
bugs reported in the bug DB for Linux/FreeBSD, seeing how long it took to
identify the source of such problems once someone was actively working on
them, and seeing how hard the same kind of analysis might be using your
tool (assuming that the users of the tool didn't know precisely what
problem they were looking for).

Most of these are either old/unpopular devices or brand new ones for
Android phones.
This is not to say that they should not be checked too, but perhaps the
class of bugs you checked for is already well handled in drivers for
devices in which there is significant investment?
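Since reviews 4H and 4I both refer to SymGen's pre/post checker
instrumentation, here is a minimal sketch of that wrapper pattern as the
reviews describe it. All names (my_probe, probe_check, report_bug) are
hypothetical illustrations, not SymGen's actual generated code:

    struct pci_dev;                        /* opaque here; kernel type in practice */
    extern void report_bug(const char *);  /* hypothetical bug-reporting hook */

    enum check_when { PRE_CALL, POST_CALL };

    /* tester-written checker for the probe entry point */
    static void probe_check(enum check_when when, int retval)
    {
        if (when == POST_CALL && retval > 0)
            report_bug("probe returned a positive value");
    }

    /* original driver function, renamed by the generator */
    static int my_probe_orig(struct pci_dev *dev)
    {
        (void)dev;
        return 0;                          /* stub; real driver logic elsewhere */
    }

    /* generated wrapper installed in place of the original entry point:
     * runs the checker before and after the real function */
    static int my_probe(struct pci_dev *dev)
    {
        int ret;

        probe_check(PRE_CALL, 0);
        ret = my_probe_orig(dev);
        probe_check(POST_CALL, ret);
        return ret;
    }

The appeal of the trick, as review 4H notes, is that checkers written in
plain C see both arguments and return values in kernel context without the
tester editing the driver by hand.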