===========================================================================
OSDI '12 Review #4A
Updated Saturday 12 May 2012 11:34:19pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 2. Weak reject
Reviewer expertise: 3. Knowledgeable

===== Paper summary =====

Scaffolding for doing deeper or broader testing of device drivers using
symbolic execution.

===== Paper strengths =====

A tool worth having. Real bugs (some undiscovered before) found. Well
written paper.

===== Paper weaknesses =====

The depth of the research contribution isn't evident. Although useful, the
tool appears to be an engineering exercise that extends prior software
with a better user interface (driver source code rather than binary), a
better debugging interface (checkers written in C running in the kernel
context), and generic symbolic execution speedups (new path selection
heuristics).

===== Comments for author =====

Although this is a very well written paper, it left me unsatisfied in the
end. It's clear that SymDrive does something. But the magnitude of this
something is hard to evaluate. For one, as the authors honestly admit, the
tool is neither sound nor complete. It takes liberties that prune usable
paths (e.g., by concretizing when executing through the kernel), it
deprioritizes paths that may hide unwanted behaviors (e.g., in long
loops), etc. But in the process, it does find some unwanted behaviors, as
the bugs found demonstrate.

So all in all, the tool (and paper) don't close the book on device driver
testing (admittedly a hard job), nor does it open the book (as shown by
the related work). Does it close a chapter of the book? Is there a class
of device driver bugs or testing that it does handle completely? That
doesn't appear to be the case. And without such a meaningful definition of
accomplishment, the paper doesn't seem to make quantifiable progress. To
be somewhat ungenerous and glib, we can keep writing papers like this
forever, and we still won't have solved the problem of establishing the
correctness of a device driver.

I would recommend that the authors focus on some aspect of testing for
which they can show "complete" progress. It might be that with mild
"refactoring" the paper can argue that all driver behavior induced by an
arbitrary device is tested against the developer-written checkers. That
would be great, because then we can focus on driver-kernel interactions
and we'll know how to handle driver-device interactions. Or perhaps the
focus could be on a single kernel transaction? There are several choices,
but I would feel much more comfortable with the paper if it made a
statement of contribution that establishes at least one problem solved.

Some smaller issues:

- Several of the contributions of this paper are rather generic: path
  prioritization policies for exiting loops or for staying within a
  function designated to be important, source-to-source rewriters to
  instrument module-boundary crossings, execution-trace comparison, etc.
  Those sound neat, and seem to help with the task at hand, but are they
  all that challenging? If they are, I'd love to know what major
  challenges the authors had to face.

- I'd like to see an explanation of why discarding all device writes does
  not make testing trivial. It's not obvious to me, not being an expert.

- Is execution tracing in section 3.4 significant?
  If it is, I'd like to see a more detailed description. What exactly is
  compared across executions on the same OS, or across OSes? Is there
  canonicalization required? If not, what records are comparable?

- There are various assertions throughout the paper about how X is
  difficult with technique A (e.g., static analysis) while it is easier
  with SymDrive. Although I believe some of these assertions, I'd like to
  see either explanations or citations for them. For example (at the
  bottom left of page 9), why does the fact that 41% of bugs occurred on a
  unique path through a driver make it difficult to find those bugs with
  static analysis?

Editorial

- Please give a brief definition of "collateral evolution" in this paper.
- Page 5, top left paragraph. If writes have no effect, how are writes
  seen by subsequent reads? This is confusing.
- Page 6. a user-mode tests -> a user-mode test
- Page 7. "SymGen also instruments ... with opcodes"? They look like
  functions to me in Figure 2. You could have said "testing functions
  compiled to unique S2E opcodes".
- Page 8. can it be it used -> can it be used. Figure 2 -> Table 2.
- Page 9. What are the asterisks in the LoC column? The text only explains
  driver-name asterisks.
- Page 10. this results shows -> this result shows. Also, why is parallel
  time higher than serial time for 8139too in Table 5? A typo, hopefully.

===========================================================================
OSDI '12 Review #4B
Updated Tuesday 10 Jul 2012 9:41:03am PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 3. Weak accept
Reviewer expertise: 2. Some familiarity

===== Paper summary =====

This paper describes a driver testing tool that uses symbolic execution to
enable testing drivers without a physical device. The contributions of the
paper are innovations that make symbolic execution more effective in the
specialized context of testing device drivers. The tool relies on static
analysis and source-to-source translation to minimize the developer effort
of instrumenting driver code for testing, enables kernel-mode execution of
checker routines, and provides an execution tracing system that can help
identify changes to I/O behavior across multiple implementations.

===== Paper strengths =====

Despite considerable attention from the research community, improving
driver reliability remains an important problem, and symbolic execution's
conceptual promise of exploring all possible inputs is appealing in this
context. This paper takes on several problems that limit the applicability
of symbolic execution to drivers: namely, path explosion, the need for
additional software artifacts like device and OS models, and the lack of a
specification of the correct behavior of a driver. The evaluation makes a
compelling case that the authors' efforts to address these barriers make
symbolic execution a practical tool for device drivers. Anecdotal evidence
about developer effort and execution trace comparisons is compelling, and
many of the bugs the tool found were significant enough to be fixed in
subsequent kernel releases.

===== Paper weaknesses =====

While the system the paper describes appears to be effective, the paper
relies on a lot of previous work, and the innovations required to make
that work applicable in this environment are not earth-shattering.
For example, the system provides static analysis and translation that
identifies and instruments entry points into driver code. Such automation
certainly reduces developer effort, but given that drivers typically must
implement a known interface to the kernel, finding and instrumenting those
entry points is perhaps not a contribution-worthy problem. Enabling
kernel-mode checkers (when user-mode checkers are available) to address
the lack of correctness specifications, and policy tweaks that improve
coverage while limiting path explosion in the symbolic execution, have a
similar flavor.

===== Comments for author =====

* Re: interrupts/DMA: You invoke interrupt handlers on every transition
  from the driver into the kernel. Do you do this because it's the only
  way to drive the system to model interrupts raised by the device? It
  seems reasonable to be skeptical that this can capture the interrupt
  behavior of all devices. It would help the reader if you either
  articulated the argument for why this models most devices well, or
  explained why the loss of fidelity in the model is tolerable if it
  doesn't.

* Re: loop elision: you claim that loop elision prioritizes paths that
  exit loops early, obviating the need for an EdgeKiller plugin. However,
  when SymDrive finds a loop that can't be terminated, the programmer is
  expected to #ifdef that code out, which strikes me as a level of effort
  similar to that required to provide an offset for an EdgeKiller plugin.
  Are you arguing that the number of cases requiring such ifdefs is
  significantly fewer than those requiring EdgeKiller interventions? Or
  that an #ifdef is easier than finding the code offset?

* Re: the list of exceptions SymGen keeps to deal with kernel functions
  that return non-standard values: it would be great to know (at least
  approximately) how many such exceptions were required; assuming that
  number is small, it would help buttress your argument that inferring
  success through return codes, without knowledge of the API returning the
  code, is a reasonably general strategy.

* Being able to test all ioctl handling code in one ioctl invocation is
  pretty cool.

* Your anecdote about finding a bug in 8139too is interesting, and very
  cool, but it also makes me wonder how much effort was required to find
  that bug that way. The idea of finding hardware-specific bugs by
  comparing traces seems error prone: diffs in the traces don't
  necessarily indicate a bug, even if they indicate different I/O
  behavior. How much time/effort was invested in inspecting diffs in the
  traces? How did you determine that 8139too's differing behavior was in
  fact a bug? Did you compare traces for different implementations of
  other drivers, or did you do this only for 8139too, and you were lucky
  to choose one that happened to have a bug?

* The result of the comparison to S2E is important. It might be worth your
  while to provide the reader with an earlier taste of just how
  insufficient S2E is for this job (without the modifications you
  provide).

===========================================================================
OSDI '12 Review #4C
Updated Friday 1 Jun 2012 3:14:45pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 4. Accept
Reviewer expertise: 2. Some familiarity
===== Paper summary =====

Testing drivers is hard, especially if you don't have access to the
hardware, or you can't get it to exhibit all of its behaviors, especially
the unusual error conditions. SymDrive is a framework for using symbolic
execution to allow testing of drivers without actual devices. It builds on
previous work (mostly S2E) that provides a symbolic execution platform,
but SymDrive specializes and extends S2E so as to make it much easier and
faster to test kernel drivers. (SymDrive exploits specific features of the
driver domain; it's not clear if any of the improvements to S2E generalize
beyond that.) SymDrive works with both Linux and FreeBSD, and has found
bugs in a bunch of drivers.

===== Paper strengths =====

It's really useful to have a way to test drivers without having to find
ways to get the devices to display all of their possible behaviors.
SymDrive seems to be a useful extension to S2E. And this extension is
highly specific (perhaps too specific) to OS code, so it's on-topic for
OSDI. The paper is pretty easy to read.

===== Paper weaknesses =====

In a few places, I found it hard to understand exactly what was happening.
It's not clear whether any of the improvements to S2E can be generalized
beyond drivers.

===== Comments for author =====

page 5, loop elision, especially the second paragraph -- I had a hard time
understanding how well this works. For example, if you have to run through
an entire checksum loop to see if it gets the right value, does
prioritizing loop-exit paths work? And if you just #ifdef out this entire
loop, doesn't that undermine the purpose of improved test coverage? I
would like to think that SymDrive could be used to test the code that
executes on checksum failure. (A sketch illustrating this concern follows
this review.)

fig 2: it's kind of hard for a reader to tell which lines were added and
modified by SymDrive. I had to go find 8139cp.c to figure this out. I
think it would be worth the extra space to include the original code,
which is pretty short:

    while (work--) {
        if (!(cpr8(Cmd) & CmdReset))
            return;
        schedule_timeout_uninterruptible(10);
    }

maybe you can include the original code as a pair of comments?

5.1: "SymDrive does not presently work ... with multicore support." Does
that mean that this is just a small matter of programming, or is it
fundamentally difficult to support (and check) parallelism?

5.1, step 1: I think what you mean is that you replace the original
driver.o with driver_as_modified_by_sym_drive.o, right?

5.2: you found 39 bugs in 26 drivers. Have you looked at bug repositories
to figure out whether you found 39 of 39 bugs? or 39 of 1000? In other
words, given all of the effort that you put into SymDrive, is it finding
enough bugs to be worth it?

page 11, execution tracing: this sort of answered the question I had when
I read section 3.4: "how well does this actually work?" but not entirely.
Earlier you quantified how long it took to apply SymDrive to the phantom
driver, but here it's not clear whether the trace-comparison process took
a few minutes or a few days, and how tedious it was to actually find the
bugs in the 8139too and rl drivers.

section 6, DDT and S2E: can you apply your improvements to S2E to
non-driver code? Perhaps some other domain, such as network services,
where it's hard to get the other end to exhibit error states? Or is driver
code really the only domain where your work applies?
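To make review 4C's checksum concern concrete, here is a minimal,
hypothetical driver fragment (all names are assumed for illustration; this
is not code from the paper or from 8139cp.c). The post-loop "success" code
is reachable only after all BUF_LEN iterations complete, so a heuristic
that favors early loop exits tends to explore only the short failure path:

    #include <stdint.h>
    #include <stddef.h>

    #define BUF_LEN 256

    static int verify_firmware(const uint8_t *buf, uint8_t expected)
    {
        uint8_t sum = 0;
        size_t i;

        for (i = 0; i < BUF_LEN; i++)   /* must run to completion */
            sum += buf[i];

        if (sum != expected)
            return -1;                  /* short early-exit path: failure */

        return 0;                       /* reachable only after the full loop */
    }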
===========================================================================
OSDI '12 Review #4D
Updated Monday 18 Jun 2012 5:59:04pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 4. Accept
Reviewer expertise: 3. Knowledgeable

===== Paper summary =====

SymDrive specializes the idea of symbolic-execution-driven testing to the
special case of testing device drivers.

===== Paper strengths =====

It's a clever fit: it obviates the need for a device to be present, it
uses the kernel as its own model, and loops (which choke symbolic
execution) are rare. The evaluation is successful, finding a handful of
new bugs, and discovering 17 known, verified bugs.

===== Paper weaknesses =====

I wonder if the inability to handle loops will put an important class of
failures (around actual data handling) out of reach.

===== Comments for author =====

p1 The abstract mentions 39 bugs, but not all are verified. I suggest
instead trumpeting in particular the 5 new bugs, those the devs agree are
real and were not previously known.

p1 That "two dozen" drivers mention "compile tested only" isn't very
compelling; there must be thousands and thousands of driver checkins in
the denominator. The real question (and hence motivation) is how many
patches don't get committed -- or even written -- because of the inability
to test beyond compile.

p3 "The design applies to any OS supported by S2E" -- how broad is that
set?

p4 Tell us about the 1,954 "Changes to S2E". What kinds of changes are
required?

p4 What is the "Platform" bus? How much work is involved in adding support
for a new bus? How hard would USB be?

p5 "the test framework may inject additional symbolic data to increase
coverage" -- please provide a clearer preview and maybe even a forward ref
to a section number.

p5, end of "favor-success sched" section: am I correct in inferring that
forking cannot cross beyond one function call into the driver? (perhaps
say that explicitly.)

p5, loop elision: why can't you prioritize such that you prune paths that
*don't* loop? That would seem to direct the system towards discovering
bugs along paths that require loops.

p6 "nonstandard values ... SymGen has a list of exceptions" -- please
provide an example or two to clarify.

p8 An inline or footnote definition of collateral evolution will help some
readers (like this one, who had to look it up :v)

p8 "we have not experienced any [false positives]" -- Can you contrive a
case that would at least illustrate where they might come from?

p9 "we tested phantom ... in 1h:45m" -- I see from tbl 2 that phantom is
one of the smallest drivers, and it required zero annotations, so it seems
pretty likely that this is an ideal case, not a representative case --
call that out.

p10 "the only changes were to update a few checkers" -- how many LoC was
that, so I can fit it into the context of tbl 1?

p10 "econet requires additional software" -- do you mean that it comes
with some userland software that can push the driver into states that
SymDrive can't find on its own? Or something else? Please clarify.

p10 some "required kernel-mode tests" -- please give an example to
clarify.
p10 "this results" -- extra s

p10 "this result shows that SymDrive can provide coverage similar to or
better than that of running the driver on real hardware" -- this sentence
is a little too generous to SymDrive; the effort to generate full coverage
in this case wasn't very enthusiastic compared with a driver dev trying to
push up gcov's numbers.

p12 "SymDrive automatically detects and annotates loops" -- and punts them
out of the evaluation, of course.

Review by Jon Howell, intentionally unblinded.

===========================================================================
OSDI '12 Review #4E
Updated Saturday 30 Jun 2012 3:41:37pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 3. Weak accept
Reviewer expertise: 4. Expert

===== Paper summary =====

The paper describes SymDrive, a symbolic execution framework for testing
device drivers. SymDrive uses static analysis to identify and instrument
(1) the driver-kernel interface and (2) loops in driver code. This
instrumentation is later used by symbolic execution to guide the search of
paths, run checkers, and mark additional data as symbolic. Two search
heuristics are described: one favors successful paths to execute driver
code past initialization, and another favors loop-exit paths to terminate
loops early. The checkers are written in C and run within the checked
code, instead of in the symbolic execution engine. SymDrive also provides
a tool to log hardware I/O traces and compare the traces. SymDrive's
symbolic execution uses KLEE; its static analysis uses CIL. Evaluation on
26 drivers yielded 39 bugs and good statement coverage.

===== Paper strengths =====

The paper is well written, providing a detailed and honest account of what
was done in the design, implementation, and evaluation of SymDrive, likely
sufficient for someone interested in reproducing the SymDrive system. The
paper also gives fair credit to prior work. SymDrive is a well engineered
system. Its error detection and statement coverage results are also good.

===== Paper weaknesses =====

Although SymDrive leverages a few good existing ideas, I find it a bit
thin on key new ideas. For instance, SymDrive's static analysis to
discover the driver-kernel interface and loops is relatively
straightforward. Most rules SymDrive checks are already checked by prior
work.

===== Comments for author =====

Once source code or a high-level intermediate representation of the code
is available, automatically discovering loops seems easy: simply find
cycles in the control flow graph or, even simpler, iterate through
llvm::Loop structures. (A sketch of the control-flow-graph approach
follows review 4F below.) Discovering the kernel-driver interface also
appears easy. I'd like to see the hard problems solved by your static
analysis to better understand the challenges.

How hard is it to come up with the rules you check? Can any of the rules
be automatically inferred from the paths you explore? For instance, if 99
out of 100 paths call foo()->bar(), and the other path doesn't, then this
path may contain a bug. If rules can be automatically inferred, users of
SymDrive don't have to have domain knowledge or write the checkers. There
seem to be some interesting research challenges here.

I'm curious how much the QEMU + KLEE approach buys you. If you start off
with compiling the driver and kernel code to LLVM, and then simply use
KLEE, would it give you more bugs and coverage etc?
Going from source code to x86 assembly back to LLVM bitcode seems rather
convoluted, and potentially makes constraints unnecessarily complex.

Sec 4.1 says you don't check for overflows. KLEE can check overflows by
solving constraints to make a symbolic expression overflow, except it may
find too many benign overflows. Is this the reason you didn't check for
overflows?

Sec 5.3 What are the annotations you added? If the number is small, you
may describe all the annotations, possibly in a table.

===========================================================================
OSDI '12 Review #4F
Updated Friday 6 Jul 2012 11:22:50pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 3. Weak accept
Reviewer expertise: 2. Some familiarity

===== Paper summary =====

This paper presents SymDrive, a system for testing Linux and FreeBSD
drivers using symbolic execution. In addition, SymDrive uses static
analysis to reduce the testing effort, enables the developer to write
checkers in C, and provides an execution-tracing tool to efficiently test
patches. Using SymDrive, the authors found 39 bugs across 21 Linux drivers
and 5 FreeBSD drivers.

===== Paper strengths =====

Despite recent work, debugging drivers remains a significant problem. This
paper advances the state of the art in addressing this problem. The
solution is very well crafted, and the evaluation is comprehensive.

===== Paper weaknesses =====

While effective, the solution seems a rather incremental improvement over
previous work, in particular over S2E.

===== Comments for author =====

I enjoyed reading this paper. The paper is well written, the solution is
well thought out, and the evaluation is impressive. All in all, a very
well executed paper! I have only a few comments.

First, while I understand the difficulty of the problem and the fact that
this paper is advancing the state of the art, I still feel a bit
disappointed about the limitations of the solution. In particular,
SymDrive is neither provably sound nor complete. SymDrive does not detect
bugs due to race conditions (between threads), which are among the hardest
and most time-consuming bugs to find.

Second, SymDrive leverages a collection of existing techniques, rather
than proposing a new technique or idea. In fact, one could argue that the
improvements over S2E are rather incremental.

Third, sometimes the paper overwhelms the reader with details, at the
expense of providing a higher-level picture and emphasizing the main ideas
and tradeoffs. For example, Section 3 is a long list of low-level
technical details which I found a bit hard to follow. Also, it would have
been nice to evaluate the impact of the various techniques you use, i.e.,
static analysis, the ability to write checkers in C, and the
execution-tracing tool. Finally, it would have been nice to evaluate the
impact of the various prioritization techniques for path exploration.

Other comments & questions:

- sec 5.1: "If SymDrive reports warnings about too many paths, we annotate
  the driver code and repeat the operations." - How often did you have to
  do such annotations?

- sec 5.5: I like the patch-testing application. You should emphasize it
  earlier in the paper.

- sec 5.6: "We ran it until it started thrashing the page file, with a
  commensurate drop in CPU utilization, for a total of 23 minutes." -
  Would increasing the memory size have made a significant difference?
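Regarding review 4E's remark above that loop discovery is straightforward
once a control flow graph is available: a minimal, self-contained sketch
of that idea follows. The CFG representation here is hypothetical (neither
SymGen's nor LLVM's; with llvm::Loop this reduces to iterating an analysis
result). A depth-first search reports back edges, each of which identifies
a loop header:

    #include <stdio.h>

    #define MAX_NODES 64
    #define MAX_SUCCS 4

    enum color { WHITE, GRAY, BLACK };

    struct cfg {
        int nsucc[MAX_NODES];
        int succ[MAX_NODES][MAX_SUCCS];  /* successor lists */
        enum color state[MAX_NODES];
    };

    /* DFS: an edge into a GRAY node (still on the DFS stack) is a
     * back edge, and its target is a loop header. */
    static void find_loops(struct cfg *g, int n)
    {
        int i;

        g->state[n] = GRAY;
        for (i = 0; i < g->nsucc[n]; i++) {
            int s = g->succ[n][i];
            if (g->state[s] == GRAY)
                printf("back edge %d -> %d: node %d heads a loop\n",
                       n, s, s);
            else if (g->state[s] == WHITE)
                find_loops(g, s);
        }
        g->state[n] = BLACK;
    }

    int main(void)
    {
        struct cfg g = { {0} };

        /* 0 -> 1, 1 -> 2, 2 -> 1 (loop), 2 -> 3 (exit) */
        g.succ[0][g.nsucc[0]++] = 1;
        g.succ[1][g.nsucc[1]++] = 2;
        g.succ[2][g.nsucc[2]++] = 1;
        g.succ[2][g.nsucc[2]++] = 3;

        find_loops(&g, 0);
        return 0;
    }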
===========================================================================
OSDI '12 Review #4G
Updated Wednesday 11 Jul 2012 2:45:11am PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 4. Accept
Reviewer expertise: 3. Knowledgeable

===== Paper summary =====

Applying symbolic execution to debugging device drivers, without having a
device. Naturally, if you symbolically execute the driver with all
possible results from device registers, DMA operations, and interrupts,
there's no need for real hardware - and you'll get more code coverage.

===== Paper strengths =====

Very nice insight (no hardware!), and something I for one might find
extremely useful. A pretty significant chunk of tool code as well, even
taking into account the previous work (S2E) on which it relies.

===== Paper weaknesses =====

Yet another paper on applying symbolic execution to operating systems.
Unclear from the paper which of the bugs are actually important. Perhaps
too incremental?

===== Comments for author =====

I really liked reading this paper, and I think it is well above threshold
for OSDI.

The paper is a little glib about where device drivers execute - not all
drivers execute in the kernel on Linux and BSD (indeed, there is a trend
to move them out if possible), and not all operating systems execute them
in the kernel. The paper also makes a few unspoken assumptions about who
writes drivers - it's aimed squarely at Linux developers, who may not have
access to the device itself, let alone any documentation. However, many
driver programmers work for hardware companies, and so presumably have
complete access to the hardware. Others work for organizations (like
Microsoft) with very close relationships with hardware vendors. It would
be good to make explicit who SymDrive targets, and how it might (or might
not) benefit these other players.

That aside, the introduction does a good job of motivating the problem and
outlining the existing work on which it builds. One obvious point of
reference with regard to execution tracing is the Tralfamadore system from
UBC. It's different, but a comparison of that approach with SymDrive's
tracing functionality would be constructive. "To the best of our
knowledge" is always a good way to disavow knowledge of related work, but
you might want to ask some folks at, say, Microsoft or Apple if they do
something similar to this. Even if they do, you're still the first to
publish it, which is fine in my book, but it would at least put this into
broader context. Of course, I'm assuming you work for neither.

Sec.2: To be useful, SymDrive must not only find bugs that are hard to
find with other tools; these bugs must also matter to the correct
operation of the system. More generally, the paper needs to say more about
the kinds of bugs that SymDrive finds. For example, if many of the bugs
are lack of defensive programming in the event that the device does
strange things, that reduces the value of the tool somewhat - a buggy
device can always DMA stuff all over the place unless there's a good
IOMMU, and you don't seem to model IOMMUs right now. Devices are notorious
for doing strange things, but it's not all that frequent, so how serious
are these kinds of bugs?

There's some confusion about what precisely SymDrive is for. In Sec.1, you
design SymDrive for three purposes. In Sec.2, you have two use cases.
I think this could be clearer.

Kudos for section 2.3 ("Why not Symbolic Execution?"), but you don't say
whether you actually address/mitigate any of these problems in the paper.
Try to relate this section more to the work at hand (and prior art).

Otherwise, the scoping is pretty clear. Any reason you didn't do USB? It
seems an obvious choice (more than I2C, for instance), but was there
something about it that made it more work?

Like many papers that combine automatic analysis with manual annotation or
coding, this paper is sometimes a little sketchy on the engineering effort
required. For example, Sec.3.2: "we created a small kernel module..." -
please quantify this: how big is it? How many lines of code? Is it doing
anything tricky, or just simple translation/interposition? Similarly, in
Sec.3.5: "for complex code that slows testing..." - how frequent is this
code, in your experience? How often are the programmer annotations needed?

You're very quiet about multiprocessors. This is pretty suspicious. If you
can't handle multicore machines properly, be honest and say so - it's
still a good paper. If you can, explain how and why it works. Right now,
you look like you're sweeping concurrency under the rug. You only admit in
Sec.5.1 ("Methodology"!) that it's single-threaded, but again, what does
this mean? That you can't handle more than one thread, or that SymDrive
can't make use of more than one core?

Also, say something about the possible dangers of only (symbolically)
executing the interrupt handler on a transition from driver to kernel.
What potentially important sequences might this strategy miss? Can you
relate this heuristic to the usual complexity explosions that dog
model-checking efforts in the presence of concurrency? Similarly, when
does your loop elision heuristic fail? Other than checksums, can you
relate these scenarios to examples of real driver code?

Which coverage mode did you use to find all your bugs? More generally, the
two "modes" seem like arbitrary choices in a much richer space of options.
Can you lay out this space a bit more?

Sec.4.1: I'm not too bothered that SymDrive is neither sound nor complete,
but it would be good if you gathered together in one place all the reasons
it's not sound, and all the reasons it's not complete. You could also say
this a lot earlier in the paper (even though it's already implicit in some
earlier sections).

Which of the bugs you found with Linux 2.6 were subsequently found
independently by Linux folks and fixed prior to 3.3? You need to state
this if you wish to back up your claim that SymDrive finds bugs that other
approaches miss.

===========================================================================
OSDI '12 Review #4H
Updated Wednesday 11 Jul 2012 6:01:06pm PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 2. Weak reject
Reviewer expertise: 2. Some familiarity

===== Paper summary =====

This paper describes a system for finding crash, hang, or API-misuse bugs
in kernel device drivers. The more specific contributions are two
extensions of the previous S2E system that help apply symbolic execution
to device drivers. The first is a tool that helps the tester add calls to
(tester-written) checking functions at the beginning and end of a given
existing kernel function.
The second is a heuristic for choosing which execution paths to check next
(namely, choosing paths that exit a loop or finish initialization code),
which seems to do a good job of guiding the symbolic execution of a driver
towards more code coverage.

===== Paper strengths =====

The tool seems to work well, both for Linux and FreeBSD, and finds both
previously-known and newly-discovered bugs.

===== Paper weaknesses =====

The bulk of the benefit comes from S2E, and the additional techniques
introduced by this paper -- while important for actually running S2E on
many device drivers -- don't seem particularly surprising or novel, and
don't enable finding any new classes of bugs. (The trick for calling
f_check() at the beginning and end of f() seems like a very useful one,
but hardly a breakthrough idea that would carry an OSDI paper; a sketch of
the pattern follows review 4I at the end of this document. The scheduling
heuristic for breaking out of loops and making progress in initialization
seems more interesting, but that's about it.)

===== Comments for author =====

The first few pages of the paper somewhat over-sell the tool. For
instance, the title, "SymDrive: Who Needs a Device to Test a Driver?" begs
the answer "Anyone who actually wants their driver to work correctly",
and in this vein, the paper doesn't state what properties the system tests
for until the first paragraph of section 3 (namely, crash / hang /
API-misuse bugs). Even in section 2.3 ("Why not Symbolic Execution"), you
mention under "Specification" that it's hard to check whether the driver
actually works correctly, but it's unclear what you're saying here -- is
this an example of a limitation of DDT/S2E that you are going to fix in
this paper (which seems to be the case for some of the other bullets in
this section, like "Simplicity" or maybe even "Efficiency"), or is this a
problem inherent to the entire approach (which seems to apply to this
point on "Specification")?

It would be interesting to understand what kinds of bugs your tool will
miss altogether. In the eval, you mention you didn't encounter false
negatives, but how would you know? For instance, your story for symbolic
interrupts and symbolic DMA seems a bit weak: you only deliver interrupts
at the end of a kernel->driver callback, and you only "perform" symbolic
DMA between the driver's calls to allocate and free the DMA region. What
if the driver gets the calls to allocate/free DMA regions wrong? What if
the driver has a race condition with an interrupt handler? These seem like
obvious examples of false negatives (i.e., things your tool will never
catch).

When you talk about path prioritization (e.g., in the favor-success
heuristic), what happens to other paths with low priority? It seems that,
since you are unlikely to ever complete all paths anyway, you are
effectively discarding all paths that have low priority. Is that true? Or
do you ever come back and still try out some of those paths that were
low-priority from different branches of the initialization code? It would
be good to say either way, and if it does discard the paths, calling it
prioritization is a bit misleading.

Anecdotally, I hear that Red Hat's kernel engineers are careful to make
sure the kernel crashes at the first possible sign of something having
gone wrong, because they would rather their customers have fail-stop
behavior than risk the possibility of data corruption.
To the extent that the rest of the Linux kernel community might share the
same world view (I'm not sure whether that's accurate or not), does this
mean you might get lots of false positives in your tool if drivers keep
panicking when the symbolic device returns garbage?

In your evaluation of load time in table 2, what does it mean for loading
to finish? Does this mean you've explored all of the paths (including
low-priority ones) through the initialization code, or does it mean you've
finished exactly one path through the initialization code and got back out
to userspace?

Minor: in related work, paragraph 2, you say your system improves S2E in
four ways, but I only see three examples listed in the following text.

===========================================================================
OSDI '12 Review #4I
Updated Sunday 15 Jul 2012 11:12:59am PDT
---------------------------------------------------------------------------
Paper #4: SymDrive: Who Needs a Device to Test a Driver?
---------------------------------------------------------------------------

Overall merit: 3. Weak accept
Reviewer expertise: 2. Some familiarity

===== Paper summary =====

This paper represents a non-trivial improvement to S^2E for the purpose of
symbolic checking of device driver implementations. The improvements both
reduce programmer effort (automatically instrumenting the driver/kernel
interface; SymGen) and add heuristics for improving performance (and hence
viability). They demonstrate their tool on both FreeBSD and Linux across a
range of I/O busses and device types (incidentally finding a bunch of
bugs).

===== Paper strengths =====

This paper represents a real step towards making the promise of symbolic
evaluation practical.

===== Paper weaknesses =====

This is decidedly incremental work and there is no "magic" technique here,
just solid engineering (I say this somewhat ironically since I'm a fan of
good engineering).

===== Comments for author =====

I think your exposition could be crisper about the set of tasks left
undone by S^2E and how those omissions drove the design of SymDrive. For
example, section 3.5 is all detail about what SymGen does and doesn't
spare a few sentences about why you need it. Indeed, I'd put all of this
up front where you talk about the contributions. I think the paper would
also be stronger if you used the different applications to motivate the
tool instead of the current structure, which makes it sound like you're
just looking for as many uses for the framework as you can.

It would be nice if you could be more clear about the classes of checkers
that can be written and those that can't. For example, you say that you
can't do integer checking or rate detection. Can you explain why more
clearly and help the reader to think about the classes of checkers that
they could and couldn't write with SymDrive?

Given that the goal of this paper is to reduce developer effort, I'm
wondering if you might not be able to quantify some of that by looking at
bugs reported in the bug DB for Linux/FreeBSD, seeing how long it took to
identify the source of such problems once someone was actively working on
them, and seeing how hard the same kind of analysis might be using your
tool (assuming that the users of the tool didn't know precisely what
problem they were looking for).

Most of these are either old/unpopular devices or brand new ones for
Android phones.
This is not to say that they should not be checked too, but perhaps the
class of bugs you checked for is already well handled in drivers for
devices in which there is significant investment?
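Since reviews 4H and 4I both refer to SymGen's pre/post checker
instrumentation, here is a minimal sketch of that wrapper pattern as the
reviews describe it. All names (my_probe, probe_check, report_bug) are
hypothetical illustrations, not SymGen's actual generated code:

    struct pci_dev;                        /* opaque here; kernel type in practice */
    extern void report_bug(const char *);  /* hypothetical bug-reporting hook */

    enum check_when { PRE_CALL, POST_CALL };

    /* tester-written checker for the probe entry point */
    static void probe_check(enum check_when when, int retval)
    {
        if (when == POST_CALL && retval > 0)
            report_bug("probe returned a positive value");
    }

    /* original driver function, renamed by the generator */
    static int my_probe_orig(struct pci_dev *dev)
    {
        (void)dev;
        return 0;                          /* stub; real driver logic elsewhere */
    }

    /* generated wrapper installed in place of the original entry point:
     * runs the checker before and after the real function */
    static int my_probe(struct pci_dev *dev)
    {
        int ret;

        probe_check(PRE_CALL, 0);
        ret = my_probe_orig(dev);
        probe_check(POST_CALL, ret);
        return ret;
    }

The appeal of the trick, as review 4H notes, is that checkers written in
plain C see both arguments and return values in kernel context without the
tester editing the driver by hand.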