Project List

  • Device drivers
    • People don't really understand device driver code. Analyze a set of device drivers (a few network, disk, etc. drivers) to figure out what the code does:
      • How much code runs at interrupt level?
      • How much code runs in response to I/O requests?
      • How much code runs in response to configuration requests?
      • How much code runs in response to initialization / shutdown / environment change (e.g. power management) events?
      • Make recommendations on how drivers could be changed as a result of this analysis
      • Build tools for automating this analysis across a large number of drivers

      reading: Linux device drivers
      Solaris device drivers
      Windows device drivers
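
      One way to begin the automated analysis: a minimal Python sketch that walks a tree of Linux driver source and attributes lines of code to execution contexts using name-based heuristics. The category patterns and the brace matching are assumptions for illustration, not a validated classifier.

        #!/usr/bin/env python3
        """Rough, heuristic breakdown of driver code by execution context."""
        import collections, pathlib, re, sys

        # Assumed name patterns marking each context; refine per driver family.
        CATEGORIES = {
            "interrupt":     re.compile(r"_(isr|irq|intr|interrupt)\b"),
            "io":            re.compile(r"_(read|write|xmit|rx|tx|request)\b"),
            "config":        re.compile(r"_(ioctl|ethtool|set_|get_)"),
            "init_shutdown": re.compile(r"_(probe|remove|init|exit|suspend|resume|shutdown)\b"),
        }

        # Crude match for a C function definition header followed by "{".
        FUNC_DEF = re.compile(r"^\s*(?:static\s+)?\w[\w\s\*]*?\b(\w+)\s*\([^;]*\)\s*\{", re.M)

        def classify(name):
            for cat, pat in CATEGORIES.items():
                if pat.search(name):
                    return cat
            return "other"

        def function_sizes(text):
            """Yield (name, line_count) per function via a naive brace counter."""
            for m in FUNC_DEF.finditer(text):
                depth, i = 1, m.end()
                while i < len(text) and depth:
                    depth += {"{": 1, "}": -1}.get(text[i], 0)
                    i += 1
                yield m.group(1), text.count("\n", m.start(), i) + 1

        def main(driver_dir):
            totals = collections.Counter()
            for path in pathlib.Path(driver_dir).rglob("*.c"):
                for name, lines in function_sizes(path.read_text(errors="ignore")):
                    totals[classify(name)] += lines
            grand = sum(totals.values()) or 1
            for cat, lines in totals.most_common():
                print(f"{cat:14s} {lines:7d} lines  ({100 * lines / grand:4.1f}%)")

        if __name__ == "__main__":
            main(sys.argv[1])   # e.g. a copy of drivers/net/ethernet from the kernel tree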

    • Drivers run in kernel mode, where any bug can crash the system. Nooks solves this problem by running the entire driver in a protection domain. Another approach is to split the driver apart and only run critical pieces in the kernel. Come up with a factorization of the code that should be in the kernel (e.g. interrupt handlers, performance-sensitive I/O code) and the code that need not be in the kernel.

      reading: privtrans (where code was factored for security purposes)
      Mach I/O model
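
      One concrete shape for the non-critical half is sketched below: a user-space service that handles forwarded slow-path requests (initialization, configuration, power events) over a Unix socket, while interrupt handling and the data path would remain in the kernel. The socket path, the JSON message format, and the kernel-side forwarding stub are all assumptions.

        import json, os, socket

        SOCK_PATH = "/tmp/splitdriver.sock"   # hypothetical control socket

        # Slow-path operations that need not run in the kernel; a crash here
        # cannot take the whole system down. Replies are placeholders.
        HANDLERS = {
            "init":      lambda req: {"status": "ok", "dma_buffers": 4},
            "configure": lambda req: {"status": "ok", "applied": req.get("params", {})},
            "suspend":   lambda req: {"status": "ok"},
        }

        def serve():
            if os.path.exists(SOCK_PATH):
                os.unlink(SOCK_PATH)
            srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            srv.bind(SOCK_PATH)
            srv.listen(1)
            while True:                       # one request per connection, for brevity
                conn, _ = srv.accept()
                with conn:
                    req = json.loads(conn.recv(65536).decode())
                    handler = HANDLERS.get(req.get("op"), lambda r: {"status": "unknown op"})
                    conn.sendall(json.dumps(handler(req)).encode())

        if __name__ == "__main__":
            serve()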

    • A large number of driver failures are caused by device failures. CSL has a number of pieces of faulty hardware. Take a few of these devices, figure out how each one fails, and build a driver that works with the failing device.

      reading: iron file systems
      Solaris hardened drivers
      Intel on hardened drivers
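
      Hardened drivers share a few defensive patterns: bounded waits, retries with backoff, and never trusting what the device returns. The Python sketch below shows those patterns against a device file from user space (the device path, error set, and validation hook are assumptions; a real driver would apply the same ideas in the kernel).

        import errno, os, select, time

        def hardened_read(fd, nbytes, retries=3, timeout=1.0, validate=lambda b: True):
            """Read from a possibly faulty device with bounded waits and sanity checks."""
            for attempt in range(retries):
                ready, _, _ = select.select([fd], [], [], timeout)
                if not ready:                                # device did not respond in time
                    continue
                try:
                    data = os.read(fd, nbytes)
                except OSError as e:
                    if e.errno in (errno.EIO, errno.EAGAIN):  # treat as transient, back off
                        time.sleep(0.1 * (attempt + 1))
                        continue
                    raise
                if validate(data):                           # never trust device output blindly
                    return data
            raise IOError(f"no valid data after {retries} attempts")

        # Hypothetical usage:
        #   fd = os.open("/dev/faulty0", os.O_RDONLY | os.O_NONBLOCK)
        #   block = hardened_read(fd, 512, validate=lambda b: len(b) == 512)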

    • Operating systems currently assume that you have a small number of devices attached. In a world of ubiquitous computing, there may be thousands of devices you could potentially use (every mouse, keyboard, and monitor in the building!). How would the OS structures for managing device drivers change? How could applications change to take advantage of these devices, for example having a separate mouse and keyboard per window for collaboration?

      reading: Remote I/O
      Device ensembles

  • File Systems
    • Currently, the only way to scale a file system is to buy a bigger computer and add disks. A nicer solution would be to add another computer and then distribute the files over the two computers. You might want to investigate Samba and NFS as possible protocols for building such a system.

      reading: AFS
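
      A minimal sketch of the placement layer, assuming both servers are already visible as local mounts (e.g. NFS or Samba mounts at invented paths). Directory listings, rebalancing when a server is added, and coping with a server failure are the real problems this leaves open.

        import hashlib, pathlib, shutil

        # Hypothetical mount points for the two file servers.
        SERVERS = [pathlib.Path("/mnt/server0"), pathlib.Path("/mnt/server1")]

        def placement(relpath: str) -> pathlib.Path:
            """Deterministically map a file's path to one of the servers."""
            h = int(hashlib.sha1(relpath.encode()).hexdigest(), 16)
            return SERVERS[h % len(SERVERS)] / relpath

        def store(local: str, relpath: str) -> None:
            dest = placement(relpath)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(local, dest)          # each file lives on exactly one server

        def fetch(relpath: str, local: str) -> None:
            shutil.copy2(placement(relpath), local)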

  • Reliability
    • Lots of people have multiple computers at home. However, they do not provide increased reliability: if one computer fails, lots of work or files are typically lost. Find a way to use the multiple computers in a home / small office to build a reliable system. For example, you could mirror the file systems to each other and provide a way to boot up the other OS should a computer fail.

      reading: Symantec LiveState
      Virtual Machine Migration
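
      A minimal sketch of the mirroring half, assuming rsync and ssh access between the two machines (the host name and paths are invented). Being able to boot from, or fail over to, the mirrored state is the harder part of the project.

        import datetime, subprocess, sys

        PEER = "backup-host"                   # hypothetical second home machine
        SRC, DEST = "/home/", f"{PEER}:/mirrors/desktop/home/"

        def mirror():
            """Push a mirror of the home directories to the peer machine."""
            result = subprocess.run(["rsync", "-a", "--delete", SRC, DEST])
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            print(f"[{stamp}] mirror {'ok' if result.returncode == 0 else 'FAILED'}")
            return result.returncode

        if __name__ == "__main__":
            sys.exit(mirror())                 # run periodically, e.g. from cron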

    • Most people have a single disk in their computer, making them vulnerable to disk failures. Create a file system that provides high reliability on a single disk, for example by storing data in multiple places.

      reading: RAID
      Solaris ZFS integrity
      Dell Poweredge integrity
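
      A minimal sketch of the core mechanism, assuming the file system can address a large file or raw partition directly: every block is checksummed and written twice at widely separated offsets, and a read falls back to the replica when the checksum fails. The block size, replica offset, and on-disk layout here are arbitrary illustrative choices.

        import hashlib, os

        BLOCK = 4096
        FRAME = BLOCK + 32                        # payload plus SHA-256 checksum
        REPLICA_OFFSET = 512 * 1024 * 1024        # keep the two copies far apart

        def write_block(f, blockno, data):
            assert len(data) <= BLOCK
            data = data.ljust(BLOCK, b"\0")
            frame = hashlib.sha256(data).digest() + data
            for base in (0, REPLICA_OFFSET):      # two copies of every block
                f.seek(base + blockno * FRAME)
                f.write(frame)
            f.flush()
            os.fsync(f.fileno())

        def read_block(f, blockno):
            for base in (0, REPLICA_OFFSET):      # try primary, then replica
                f.seek(base + blockno * FRAME)
                frame = f.read(FRAME)
                digest, data = frame[:32], frame[32:]
                if hashlib.sha256(data).digest() == digest:   # catch latent corruption
                    return data
            raise IOError(f"block {blockno}: both copies corrupt")

        # Hypothetical usage:  f = open("store.img", "r+b")   # pre-created backing file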

    • Configuration errors are a major source of system downtime and management cost. However, fairly little is known about the nature of configuration data. Survey the configuration data on an operating system to determine its characteristics, such as:
      1. How much configuration data is there?
      2. Is it per-user or per-machine?
      3. How is it specified? As a script, as key/value pairs, as XML, or as a database?
      Based on this study, measure how critical the configuration data is to the operating system's behavior. If configuration data is deliberately corrupted or removed, do the system and its applications still function? Is some configuration state more important than others?

      reading: Chronus
      Strider
      Configuration Validation
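
      A starting point for the survey: a Python walk over /etc that guesses each file's format and totals its size. The format heuristics are assumptions, and a second pass over per-user state (dotfiles, ~/.config, the per-user registry hive on Windows) would complement it.

        import pathlib, re

        def classify(text):
            """Very rough guess at a configuration file's format."""
            head = text.lstrip()[:200]
            if head.startswith("<?xml") or head.startswith("<"):
                return "xml"
            if head.startswith("#!"):
                return "script"
            if re.search(r"^\s*[\w.-]+\s*[=:]\s*\S", text, re.M):
                return "key/value"
            return "other"

        def survey(root="/etc"):
            stats = {}
            for path in pathlib.Path(root).rglob("*"):
                if not path.is_file():
                    continue
                try:
                    text = path.read_text(errors="ignore")
                except OSError:
                    continue
                kind = classify(text)
                count, size = stats.get(kind, (0, 0))
                stats[kind] = (count + 1, size + len(text))
            for kind, (count, size) in sorted(stats.items()):
                print(f"{kind:10s} {count:5d} files  {size / 1024:8.1f} KB")

        if __name__ == "__main__":
            survey()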

    • It is very hard to tell if one system is more reliable than another. Fuzz testing (throwing random data at the input functions) is one approach. Fault injection below the system is another. Find some interesting systems and compare them using a few different metrics.

      reading: fuzz testing
      Ballista
      fault injection
      dependability benchmarking
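
      A minimal fuzz harness in the spirit of the fuzz papers: feed random bytes to a program's standard input and count how many inputs make it crash (i.e. get killed by a signal). The trial count, timeout, and input sizes are arbitrary, and the same harness plus a fault injector underneath would let you compare systems on an equal footing.

        import random, subprocess, sys

        def fuzz(cmd, trials=100, max_len=4096, seed=0):
            rng = random.Random(seed)                 # fixed seed so runs are repeatable
            crashes = 0
            for i in range(trials):
                data = bytes(rng.getrandbits(8) for _ in range(rng.randint(1, max_len)))
                try:
                    proc = subprocess.run(cmd, input=data, capture_output=True, timeout=5)
                except subprocess.TimeoutExpired:
                    print(f"trial {i}: hang")
                    continue
                if proc.returncode < 0:               # negative => killed by a signal
                    crashes += 1
                    print(f"trial {i}: crashed with signal {-proc.returncode}")
            print(f"{crashes}/{trials} random inputs caused a crash")

        if __name__ == "__main__":
            fuzz(sys.argv[1:])                        # e.g.  python fuzz.py /usr/bin/some_parser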

  • Management
    • Feedback control loops are a promising mechanism for automated performance tuning. Try to automatically control an application, such as Samba or Apache, using this technique.

      reading: feedback control
      Controllable systems
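
      A sketch of such a control loop, assuming request latency is the measured output and the server's worker limit is the actuator. The URL, the gains, and the assumption that cutting concurrency lowers latency under overload are illustrative; apply_setting only prints what a real controller would write into the server's configuration (e.g. an Apache MaxClients-style knob) before reloading it.

        import time
        import urllib.request

        TARGET = 0.050                  # target response time in seconds (assumed goal)
        KP, KI = 200.0, 50.0            # proportional / integral gains; need tuning

        def measure_latency(url="http://localhost/"):
            """Time one request to the server being controlled."""
            start = time.time()
            urllib.request.urlopen(url, timeout=5).read()
            return time.time() - start

        def apply_setting(workers):
            """Stand-in actuator: a real controller would rewrite the worker limit
            in the server's configuration and trigger a graceful reload."""
            print(f"set worker limit to {workers}")

        def control_loop(initial=50):
            workers, integral = initial, 0.0
            while True:
                error = measure_latency() - TARGET    # positive error => too slow
                integral += error
                workers = max(1, int(initial - KP * error - KI * integral))
                apply_setting(workers)
                time.sleep(10)                        # let the system settle before re-measuring

        if __name__ == "__main__":
            control_loop()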

    • There are many different approaches to isolating applications so that they can't interfere with each other, including standard user-mode processes, VServers, BSD Jails, Solaris Containers, the Xen hypervisor, and VMware. Experiment with these to see how they differ in the level of isolation, sharing, and performance.

      reading: Xen
      VServers 1, VServers 2
      Jails
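
      One simple way to start the comparison is to run an identical micro-benchmark inside each environment (native process, jail/container, Xen or VMware guest) and compare the numbers. The sketch below times fork+exit, one kernel-heavy operation among the several (system calls, context switches, network and disk I/O) you would want to measure.

        import os, time

        def bench_fork(n=2000):
            """Average cost of fork + exit + wait, in seconds per iteration."""
            start = time.time()
            for _ in range(n):
                pid = os.fork()
                if pid == 0:
                    os._exit(0)          # child exits immediately
                os.waitpid(pid, 0)
            return (time.time() - start) / n

        if __name__ == "__main__":
            print(f"fork+exit: {bench_fork() * 1e6:.1f} microseconds")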

  • Security
    • A common problem in security is policy. While a system may have many ways to enforce protection or security boundaries, deciding what should be inside or outside those boundaries is difficult. Find a way to generate interesting and useful security policies automatically, for example by monitoring what files are accessed during installation, or by automatically granting access to resources provided by the user (via common dialogs or on the command line).

      reading: Janus
      MAPbox
      Polaris (Alan Karp)
      Polgen for SELinux
      MSR Strider
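
      A sketch of the monitoring approach: run the installer under strace, collect the set of files it opens, and emit that set as a first-cut allow list. The emitted policy syntax is invented; mapping the list onto a real policy language (e.g. SELinux) and deciding which accesses generalize are the interesting parts.

        import re, subprocess, sys, tempfile

        def generate_policy(cmd):
            """Trace which files a command opens and print an allow-list skeleton."""
            with tempfile.NamedTemporaryFile(suffix=".trace") as log:
                subprocess.run(["strace", "-f", "-e", "trace=open,openat",
                                "-o", log.name] + cmd)
                trace = open(log.name).read()
            for path in sorted(set(re.findall(r'open(?:at)?\([^"]*"([^"]+)"', trace))):
                print(f"allow read {path}")        # invented policy syntax

        if __name__ == "__main__":
            generate_policy(sys.argv[1:])          # e.g.  python genpolicy.py ./install.sh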

    • Virtual machines provide an opportunity to do things like virus scanning and spyware detection (and removal) offline, avoiding interference from the virus or spyware itself. Create a tool using VMware or Xen to provide this service.

      reading: ReVirt
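
      A sketch of the scanning half, assuming the guest's disk image has already been attached and mounted read-only on the host (for example via a loopback mount). The mount point and the toy signature table are assumptions; the point is that the scanner runs where malware inside the guest cannot see or tamper with it.

        import pathlib

        # Toy signature table; the string below is a fragment of the standard EICAR test file.
        SIGNATURES = {"eicar-test": b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE"}

        def scan(mountpoint="/mnt/guest"):        # hypothetical read-only mount of the guest image
            for path in pathlib.Path(mountpoint).rglob("*"):
                if not path.is_file():
                    continue
                try:
                    data = path.read_bytes()
                except OSError:
                    continue
                for name, sig in SIGNATURES.items():
                    if sig in data:
                        print(f"{path}: matches signature '{name}'")

        if __name__ == "__main__":
            scan()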

Previous project lists: