Instructor: Andrea Arpaci-Dusseau
office: 7375 Computer Sciences
days: Tuesday, Thursday
place: Tuesday: 2310
place: Thursday: 7331
Note the room change!
4:00 - 5:00 Tuesday
11:00 - 12:00 Thursday
Defining interfaces is the most important part of system design. Usually it is also the most difficult... -- Butler Lampson
As systems become composed of more and more systems themselves, it becomes increasingly important to define system interfaces well. Unfortunately, complex systems often provide the wrong interface, thus hiding useful internal information and limiting the functionality provided to systems built on top.
In this seminar course, we will investigate how to adapt when the original developers chose the wrong underlying interfaces. Over the semester, we will consider two different scenarios. In the first part of the course, we will assume the common case in which interfaces cannot be changed; thus, we will learn about strategies for inferring information when the desired interfaces do not exist. In the second part of the course, we will "build" a new operating system with new interfaces; this part of the course is more open-ended, but our general goal will be to expose information that was previously hidden (both to applications and to subsystems within the OS).
This semester is likely to be the only time this course is ever offered. This is a once-in-a-lifetime opportunity!
To make this discussion more concrete, consider a developer writing a memory-intensive application. For this type of application, it is important to avoid thrashing the virtual memory system; thus, developers often structure these applications to adapt to the amount of currently available memory. However, two hurdles must be overcome in current systems. First, many operating systems do not provide an interface for accurately reporting the amount of available memory (especially due to interactions with the file cache). Second, even when the OS does provide this information, it does not provide a control interface to ensure that this memory can be allocated atomically (i.e., before another application grabs it). Therefore, a developer who wants this type of information and control over available memory must work around the existing interfaces using covert means.
In the first part of this course, we will study examples in which developers have cleverly worked around limited interfaces. We will begin by understanding some of the techniques that are potentially useful, such as microbenchmarking, fingerprinting, reverse engineering, and self-simulation. We will then explore how these techniques have been applied in a wide selection of case studies including TCP and RED, implicit coscheduling, MS Manners, semantically smart disks, control of server utilization, and breaking cryptosystems.
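As a rough illustration of the microbenchmarking idea, the sketch below times a probe operation and then guesses which file blocks were already cached by splitting the measured latencies at their largest gap. This is not taken from any of the case studies; the function names (`time_probe`, `split_fast_slow`, `infer_cached`) and the largest-gap heuristic are assumptions made for this example.

```python
import time


def time_probe(probe):
    """Return the wall-clock seconds taken by one call to probe().

    `probe` is any zero-argument callable, e.g. one that reads a single
    block of a file (hypothetical; supplied by the caller).
    """
    start = time.perf_counter()
    probe()
    return time.perf_counter() - start


def split_fast_slow(latencies):
    """Pick a threshold at the midpoint of the largest gap in the sorted latencies.

    Assumption: the measurements fall into two clusters (fast and slow).
    If every probe hit the cache, or none did, the split is meaningless.
    """
    ordered = sorted(latencies)
    if len(ordered) < 2:
        raise ValueError("need at least two measurements")
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, i = max(gaps)
    return (ordered[i] + ordered[i + 1]) / 2


def infer_cached(latencies_by_block):
    """Return the set of blocks whose probe latency falls in the fast cluster."""
    threshold = split_fast_slow(list(latencies_by_block.values()))
    return {block for block, t in latencies_by_block.items() if t <= threshold}


# Synthetic measurements in seconds: blocks 0 and 2 look like cache hits.
measured = {0: 0.00002, 1: 0.008, 2: 0.00003, 3: 0.009}
likely_cached = infer_cached(measured)
```

In practice the latencies would come from calling `time_probe` on real reads, and the probes themselves perturb the state being measured, which is one of the difficulties these techniques have to manage.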
However, existing extensible systems still do not expose all useful information or enable all types of control. Consider one of the more extreme examples: Exokernel. Exokernel has the goal of removing all abstractions and of securely exposing the available hardware resources to applications. However, Exokernel still limits information and control. First, Exokernel does not explicitly expose the cost of each operation (e.g., fetching a particular page from disk at this time). Second, Exokernel multiplexes resources across competing applications in a simple and static manner (e.g., processes may choose to run in fixed time-slices on the CPU). The combined result is that applications cannot easily adapt to changing costs (e.g., by prefetching more pages from disk when disk and memory utilization are low or by running for more consecutive time-slices when they have a large working set loaded).
In the second part of this course, we will determine how a new operating system could be built to expose as much information and control as possible. This will be much more open-ended than the first part of the course. We will begin this step by understanding some of the developments in extensible systems and microkernels: Synthesis, SPIN, Exokernel, VINO, and Scout. We will then explore how a new OS could expose all information (i.e., policies, internal state, and the cost of operations) to both applications and other subsystems within the OS. We will then study how applications or the OS itself could use this information to make better decisions (e.g., applications and/or the OS compare the cost and the benefit of performing an operation and perform only those operations with the greatest win). Thus, we will study the research in the area of exposing cost models, performing cost/benefit analysis, and more general economic computations within systems.
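The cost/benefit comparison described above can be sketched as follows. This is only a minimal illustration, assuming that each candidate operation comes with an estimated cost and benefit in the same units and that a fixed budget limits the total cost; the `Operation` type, the `choose_operations` function, and the greedy ordering by net win are all hypothetical, and the greedy choice is not guaranteed to be optimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    cost: float     # estimated cost of performing the operation (assumed known)
    benefit: float  # estimated benefit, in the same units as cost

    @property
    def win(self):
        """Net win of performing the operation."""
        return self.benefit - self.cost


def choose_operations(candidates, budget):
    """Greedily pick operations with the greatest positive win that fit the budget."""
    chosen = []
    remaining = budget
    for op in sorted(candidates, key=lambda o: o.win, reverse=True):
        if op.win <= 0:
            break  # everything after this point costs at least as much as it gains
        if op.cost <= remaining:
            chosen.append(op)
            remaining -= op.cost
    return chosen


# Made-up numbers for three candidate prefetches.
candidates = [
    Operation("prefetch_a", cost=2.0, benefit=10.0),
    Operation("prefetch_b", cost=5.0, benefit=6.0),
    Operation("prefetch_c", cost=4.0, benefit=3.0),
]
selected = [op.name for op in choose_operations(candidates, budget=7.0)]
```

With these numbers, `prefetch_c` is never selected because its cost exceeds its benefit, and whether `prefetch_b` is selected depends on the budget left after `prefetch_a`.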
A tentative reading list is available.
One option is for the course to involve two projects. The first project would be associated with the first half of the course and would involve uncovering new information or control for some existing component (e.g., writing a new fingerprinting routine to extract the layout policy for a RAID system). Students would be free to choose any new information they think would be useful. The second project would involve modifying an OS to directly expose this new information or control and comparing the complexity and performance of the two approaches. Students would be encouraged to work in groups of two.
The second option is to have a single goal that the entire class works toward. For example, we could all work on a version of Linux that exposes as much information/control as possible, both to applications/libraries and to other subsystems within Linux. The steps here could be: remove as many policies as possible from Linux so that we have a base framework to work with; design and implement straightforward policies such that applications/subsystems can choose if or when operations occur (these policies should be simple to express); determine how descriptions of the policy should be exposed to other layers; implement layers that adapt to the exposed information. Different groups would be responsible for different subsystems: for example, CPU scheduling, networking, memory management, and the file system.
Which project style is chosen will depend upon both student and instructor interest.
|Tuesday||Thursday|
|09/03 Introduction||09/05 System Design (1, 2)|
|09/10 Gray-Box Systems (3)||09/12 Case Study: TCP and RED (22, 23)|
|09/17 Microbenchmarks (4, 5)||09/19 Buffer Cache Fingerprinting (13)|
|09/24 Disk Microbenchmarks (7, 8)||09/26 TCP Fingerprinting (9, 10, 11, 12)|
|10/01 Scheduler Fingerprinting (14)||10/03 Reverse Engineering Instructions (18, 19)|
|10/08 SSD (28)||10/10 Implicit Coscheduling + MS Manners (24, 25)|
|10/15 Status||10/17 Cryptosystems (29, 30)|
|10/22 Visual Proxies (26) (and some Status)||10/24 Summary|
|10/29 Summary||10/31 Project Discussion|
|11/05 Project Discussion||11/07 No class|
|11/12 Open Implementation (38)||11/14 No class|
|11/19 Exokernel (33)||11/21 Exokernel (34)|
|11/26 VINO (35)||11/28 Thanksgiving|
|12/03 SPIN (32)||12/05 u-Kernel (37)|
|12/10 Gray-Box layout (39)||12/12 Wrap-Up|