Project List

  • Device drivers
    • People don't really understand device driver code. Analyze a set of device drivers (a few network, disk, etc. drivers) to figure out what the code does:
      • How much code runs at interrupt level?
      • How much code runs in response to I/O requests?
      • How much code runs in response to configuration requests?
      • How much code runs in response to initialization / shutdown / environment change (e.g. power management) events?
      • Make recommendations on how drivers could be changed as a result of this analysis
      • Build tools for automating this analysis across a large number of drivers

      reading: Linux device drivers
      Solaris device drivers
      Windows device drivers
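
      One way to begin the automated analysis: a minimal Python sketch that walks a tree of Linux driver source and attributes lines of code to execution contexts using name-based heuristics. The category patterns and the brace matching are assumptions for illustration, not a validated classifier.

        #!/usr/bin/env python3
        """Rough, heuristic breakdown of driver code by execution context."""
        import collections, pathlib, re, sys

        # Assumed name patterns marking each context; refine per driver family.
        CATEGORIES = {
            "interrupt":     re.compile(r"_(isr|irq|intr|interrupt)\b"),
            "io":            re.compile(r"_(read|write|xmit|rx|tx|request)\b"),
            "config":        re.compile(r"_(ioctl|ethtool|set_|get_)"),
            "init_shutdown": re.compile(r"_(probe|remove|init|exit|suspend|resume|shutdown)\b"),
        }

        # Crude match for a C function definition header followed by "{".
        FUNC_DEF = re.compile(r"^\s*(?:static\s+)?\w[\w\s\*]*?\b(\w+)\s*\([^;]*\)\s*\{", re.M)

        def classify(name):
            for cat, pat in CATEGORIES.items():
                if pat.search(name):
                    return cat
            return "other"

        def function_sizes(text):
            """Yield (name, line_count) per function via a naive brace counter."""
            for m in FUNC_DEF.finditer(text):
                depth, i = 1, m.end()
                while i < len(text) and depth:
                    depth += {"{": 1, "}": -1}.get(text[i], 0)
                    i += 1
                yield m.group(1), text.count("\n", m.start(), i) + 1

        def main(driver_dir):
            totals = collections.Counter()
            for path in pathlib.Path(driver_dir).rglob("*.c"):
                for name, lines in function_sizes(path.read_text(errors="ignore")):
                    totals[classify(name)] += lines
            grand = sum(totals.values()) or 1
            for cat, lines in totals.most_common():
                print(f"{cat:14s} {lines:7d} lines  ({100 * lines / grand:4.1f}%)")

        if __name__ == "__main__":
            main(sys.argv[1])   # e.g. a copy of drivers/net/ethernet from the kernel tree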

    • Drivers run in kernel mode, where any bug can crash the system. Nooks solves this problem by running the entire driver in a protection domain. Another approach is to split the driver apart and only run critical pieces in the kernel. Come up with a factorization of the code that should be in the kernel (e.g. interrupt handlers, performance-sensitive I/O code) and the code that need not be in the kernel.

      reading: privtrans (where code was factored for security purposes)
      Mach I/O model
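
      One concrete shape for the non-critical half is sketched below: a user-space service that handles forwarded slow-path requests (initialization, configuration, power events) over a Unix socket, while interrupt handling and the data path would remain in the kernel. The socket path, the JSON message format, and the kernel-side forwarding stub are all assumptions.

        import json, os, socket

        SOCK_PATH = "/tmp/splitdriver.sock"   # hypothetical control socket

        # Slow-path operations that need not run in the kernel; a crash here
        # cannot take the whole system down. Replies are placeholders.
        HANDLERS = {
            "init":      lambda req: {"status": "ok", "dma_buffers": 4},
            "configure": lambda req: {"status": "ok", "applied": req.get("params", {})},
            "suspend":   lambda req: {"status": "ok"},
        }

        def serve():
            if os.path.exists(SOCK_PATH):
                os.unlink(SOCK_PATH)
            srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            srv.bind(SOCK_PATH)
            srv.listen(1)
            while True:                       # one request per connection, for brevity
                conn, _ = srv.accept()
                with conn:
                    req = json.loads(conn.recv(65536).decode())
                    handler = HANDLERS.get(req.get("op"), lambda r: {"status": "unknown op"})
                    conn.sendall(json.dumps(handler(req)).encode())

        if __name__ == "__main__":
            serve()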

    • A large number of driver failures are caused by device failures. CSL has a number of pieces of faulty hardware. Take a few of these devices, figure out how each one fails, and build a driver that works with the failing device.

      reading: iron file systems
      Solaris hardened drivers
      Intel on hardened drivers
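
      Hardened drivers share a few defensive patterns: bounded waits, retries with backoff, and never trusting what the device returns. The Python sketch below shows those patterns against a device file from user space (the device path, error set, and validation hook are assumptions; a real driver would apply the same ideas in the kernel).

        import errno, os, select, time

        def hardened_read(fd, nbytes, retries=3, timeout=1.0, validate=lambda b: True):
            """Read from a possibly faulty device with bounded waits and sanity checks."""
            for attempt in range(retries):
                ready, _, _ = select.select([fd], [], [], timeout)
                if not ready:                                # device did not respond in time
                    continue
                try:
                    data = os.read(fd, nbytes)
                except OSError as e:
                    if e.errno in (errno.EIO, errno.EAGAIN):  # treat as transient, back off
                        time.sleep(0.1 * (attempt + 1))
                        continue
                    raise
                if validate(data):                           # never trust device output blindly
                    return data
            raise IOError(f"no valid data after {retries} attempts")

        # Hypothetical usage:
        #   fd = os.open("/dev/faulty0", os.O_RDONLY | os.O_NONBLOCK)
        #   block = hardened_read(fd, 512, validate=lambda b: len(b) == 512)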

    • Operating systems currently assume that you have a small number of devices attached. In a world of ubiquitous computing, there may be thousands of devices you could potentially use (every mouse, keyboard, and monitor in the building!). How would the OS structures for managing device drivers change? How could applications change to take advantage of these devices, for example having a separate mouse and keyboard per window for collaboration?

      reading: Remote I/O
      Device ensembles

  • File Systems
    • Currently, the only way to scale a file system is to buy a bigger computer and add disks. A nicer solution would be to add another computer and then distribute the files over the two computers. You might want to investigate Samba and NFS as possible protocols for building such a system.

      reading: AFS
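
      A minimal sketch of the placement layer, assuming both servers are already visible as local mounts (e.g. NFS or Samba mounts at invented paths). Directory listings, rebalancing when a server is added, and coping with a server failure are the real problems this leaves open.

        import hashlib, pathlib, shutil

        # Hypothetical mount points for the two file servers.
        SERVERS = [pathlib.Path("/mnt/server0"), pathlib.Path("/mnt/server1")]

        def placement(relpath: str) -> pathlib.Path:
            """Deterministically map a file's path to one of the servers."""
            h = int(hashlib.sha1(relpath.encode()).hexdigest(), 16)
            return SERVERS[h % len(SERVERS)] / relpath

        def store(local: str, relpath: str) -> None:
            dest = placement(relpath)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(local, dest)          # each file lives on exactly one server

        def fetch(relpath: str, local: str) -> None:
            shutil.copy2(placement(relpath), local)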

  • Reliability
    • Lots of people have multiple computers at home. However, they do not provide increased reliability: if one computer fails, lots of work or files are typically lost. Find a way to use the multiple computers in a home / small office to build a reliable system. For example, you could mirror the file systems to each other and provide a way to boot up the other OS should a computer fail.

      reading: Symantec LiveState
      Virtual Machine Migration
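
      A minimal sketch of the mirroring half, assuming rsync and ssh access between the two machines (the host name and paths are invented). Being able to boot from, or fail over to, the mirrored state is the harder part of the project.

        import datetime, subprocess, sys

        PEER = "backup-host"                   # hypothetical second home machine
        SRC, DEST = "/home/", f"{PEER}:/mirrors/desktop/home/"

        def mirror():
            """Push a mirror of the home directories to the peer machine."""
            result = subprocess.run(["rsync", "-a", "--delete", SRC, DEST])
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            print(f"[{stamp}] mirror {'ok' if result.returncode == 0 else 'FAILED'}")
            return result.returncode

        if __name__ == "__main__":
            sys.exit(mirror())                 # run periodically, e.g. from cron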

    • Most people have a single disk in their computer, making them vulnerable to disk failures. Create a file system that provides high reliability on a single disk, for example by storing data in multiple places.

      reading: RAID
      Solaris ZFS integrity
      Dell Poweredge integrity
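
      A minimal sketch of the core mechanism, assuming the file system can address a large file or raw partition directly: every block is checksummed and written twice at widely separated offsets, and a read falls back to the replica when the checksum fails. The block size, replica offset, and on-disk layout here are arbitrary illustrative choices.

        import hashlib, os

        BLOCK = 4096
        FRAME = BLOCK + 32                        # payload plus SHA-256 checksum
        REPLICA_OFFSET = 512 * 1024 * 1024        # keep the two copies far apart

        def write_block(f, blockno, data):
            assert len(data) <= BLOCK
            data = data.ljust(BLOCK, b"\0")
            frame = hashlib.sha256(data).digest() + data
            for base in (0, REPLICA_OFFSET):      # two copies of every block
                f.seek(base + blockno * FRAME)
                f.write(frame)
            f.flush()
            os.fsync(f.fileno())

        def read_block(f, blockno):
            for base in (0, REPLICA_OFFSET):      # try primary, then replica
                f.seek(base + blockno * FRAME)
                frame = f.read(FRAME)
                digest, data = frame[:32], frame[32:]
                if hashlib.sha256(data).digest() == digest:   # catch latent corruption
                    return data
            raise IOError(f"block {blockno}: both copies corrupt")

        # Hypothetical usage:  f = open("store.img", "r+b")   # pre-created backing file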

    • Configuration errors are a major source of system downtime and management cost. However, fairly little is known about the nature of configuration data. Survey the configuration data on an operating system to determine its characteristics, such as:
      1. How much configuration data is there?
      2. Is it per-user or per-machine?
      3. How is it specified? As a script, as key/value pairs, as XML, or as a database?
      Based on this study, measure how critical the configuration data is to the operating system's behavior. If configuration data is deliberately corrupted or removed, do the system and its applications still function? Is some configuration state more important than others?

      reading: Chronus
      Strider
      Configuration Validation
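
      A starting point for the survey: a Python walk over /etc that guesses each file's format and totals its size. The format heuristics are assumptions, and a second pass over per-user state (dotfiles, ~/.config, the per-user registry hive on Windows) would complement it.

        import pathlib, re

        def classify(text):
            """Very rough guess at a configuration file's format."""
            head = text.lstrip()[:200]
            if head.startswith("<?xml") or head.startswith("<"):
                return "xml"
            if head.startswith("#!"):
                return "script"
            if re.search(r"^\s*[\w.-]+\s*[=:]\s*\S", text, re.M):
                return "key/value"
            return "other"

        def survey(root="/etc"):
            stats = {}
            for path in pathlib.Path(root).rglob("*"):
                if not path.is_file():
                    continue
                try:
                    text = path.read_text(errors="ignore")
                except OSError:
                    continue
                kind = classify(text)
                count, size = stats.get(kind, (0, 0))
                stats[kind] = (count + 1, size + len(text))
            for kind, (count, size) in sorted(stats.items()):
                print(f"{kind:10s} {count:5d} files  {size / 1024:8.1f} KB")

        if __name__ == "__main__":
            survey()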

    • It is very hard to tell if one system is more reliable than another. Fuzz testing (throwing random data at the input functions) is one approach. Fault injection below the system is another. Find some interesting systems and compare them using a few different metrics.

      reading: fuzz testing
      Ballista
      fault injection
      dependability benchmarking
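
      A minimal fuzz harness in the spirit of the fuzz papers: feed random bytes to a program's standard input and count how many inputs make it crash (i.e. get killed by a signal). The trial count, timeout, and input sizes are arbitrary, and the same harness plus a fault injector underneath would let you compare systems on an equal footing.

        import random, subprocess, sys

        def fuzz(cmd, trials=100, max_len=4096, seed=0):
            rng = random.Random(seed)                 # fixed seed so runs are repeatable
            crashes = 0
            for i in range(trials):
                data = bytes(rng.getrandbits(8) for _ in range(rng.randint(1, max_len)))
                try:
                    proc = subprocess.run(cmd, input=data, capture_output=True, timeout=5)
                except subprocess.TimeoutExpired:
                    print(f"trial {i}: hang")
                    continue
                if proc.returncode < 0:               # negative => killed by a signal
                    crashes += 1
                    print(f"trial {i}: crashed with signal {-proc.returncode}")
            print(f"{crashes}/{trials} random inputs caused a crash")

        if __name__ == "__main__":
            fuzz(sys.argv[1:])                        # e.g.  python fuzz.py /usr/bin/some_parser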

  • Management
    • Feedback control loops are a promising mechanism for automated performance tuning. Try to automatically control an application, such as Samba or Apache, using this technique.

      reading: feedback control
      Controllable systems
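
      A sketch of such a control loop, assuming request latency is the measured output and the server's worker limit is the actuator. The URL, the gains, and the assumption that cutting concurrency lowers latency under overload are illustrative; apply_setting only prints what a real controller would write into the server's configuration (e.g. an Apache MaxClients-style knob) before reloading it.

        import time
        import urllib.request

        TARGET = 0.050                  # target response time in seconds (assumed goal)
        KP, KI = 200.0, 50.0            # proportional / integral gains; need tuning

        def measure_latency(url="http://localhost/"):
            """Time one request to the server being controlled."""
            start = time.time()
            urllib.request.urlopen(url, timeout=5).read()
            return time.time() - start

        def apply_setting(workers):
            """Stand-in actuator: a real controller would rewrite the worker limit
            in the server's configuration and trigger a graceful reload."""
            print(f"set worker limit to {workers}")

        def control_loop(initial=50):
            workers, integral = initial, 0.0
            while True:
                error = measure_latency() - TARGET    # positive error => too slow
                integral += error
                workers = max(1, int(initial - KP * error - KI * integral))
                apply_setting(workers)
                time.sleep(10)                        # let the system settle before re-measuring

        if __name__ == "__main__":
            control_loop()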

    • There are many different approaches to isolating applications so that they can't interfere with each other, including standard user-mode processes, VServers, BSD Jails, Solaris Containers, the Xen hypervisor, and VMware. Experiment with these to see how they differ in the level of isolation, sharing, and performance.

      reading: Xen
      VServers 1, VServers 2
      Jails
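
      One simple way to start the comparison is to run an identical micro-benchmark inside each environment (native process, jail/container, Xen or VMware guest) and compare the numbers. The sketch below times fork+exit, one kernel-heavy operation among the several (system calls, context switches, network and disk I/O) you would want to measure.

        import os, time

        def bench_fork(n=2000):
            """Average cost of fork + exit + wait, in seconds per iteration."""
            start = time.time()
            for _ in range(n):
                pid = os.fork()
                if pid == 0:
                    os._exit(0)          # child exits immediately
                os.waitpid(pid, 0)
            return (time.time() - start) / n

        if __name__ == "__main__":
            print(f"fork+exit: {bench_fork() * 1e6:.1f} microseconds")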

  • Security
    • A common problem in security is policy. While a system may have many ways to enforce protection or security boundaries, deciding what should be inside or outside those boundaries is difficult. Find a way to generate interesting and useful security policies automatically, for example by monitoring what files are accessed during installation, or by automatically granting access to resources provided by the user (via common dialogs or on the command line).

      reading: Janus
      MAPbox
      Polaris (Alan Karp)
      Polgen for SELinux
      MSR Strider
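
      A sketch of the monitoring approach: run the installer under strace, collect the set of files it opens, and emit that set as a first-cut allow list. The emitted policy syntax is invented; mapping the list onto a real policy language (e.g. SELinux) and deciding which accesses generalize are the interesting parts.

        import re, subprocess, sys, tempfile

        def generate_policy(cmd):
            """Trace which files a command opens and print an allow-list skeleton."""
            with tempfile.NamedTemporaryFile(suffix=".trace") as log:
                subprocess.run(["strace", "-f", "-e", "trace=open,openat",
                                "-o", log.name] + cmd)
                trace = open(log.name).read()
            for path in sorted(set(re.findall(r'open(?:at)?\([^"]*"([^"]+)"', trace))):
                print(f"allow read {path}")        # invented policy syntax

        if __name__ == "__main__":
            generate_policy(sys.argv[1:])          # e.g.  python genpolicy.py ./install.sh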

    • Virtual machines provide an opportunity to do things like virus scanning and spyware detection (and removal) offline, avoiding interference from the virus or spyware itself. Create a tool using VMware or Xen to provide this service.

      reading: ReVirt
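
      A sketch of the scanning half, assuming the guest's disk image has already been attached and mounted read-only on the host (for example via a loopback mount). The mount point and the toy signature table are assumptions; the point is that the scanner runs where malware inside the guest cannot see or tamper with it.

        import pathlib

        # Toy signature table; the string below is a fragment of the standard EICAR test file.
        SIGNATURES = {"eicar-test": b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE"}

        def scan(mountpoint="/mnt/guest"):        # hypothetical read-only mount of the guest image
            for path in pathlib.Path(mountpoint).rglob("*"):
                if not path.is_file():
                    continue
                try:
                    data = path.read_bytes()
                except OSError:
                    continue
                for name, sig in SIGNATURES.items():
                    if sig in data:
                        print(f"{path}: matches signature '{name}'")

        if __name__ == "__main__":
            scan()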

Previous project lists: