Andrea Arpaci-Dusseau's Research Summary

Very Short Research Statement

Large new software systems are not built from scratch, but instead leverage existing software. One difficulty with using existing software is that it may not always behave as desired -- it may have performance, reliability, or security problems. My research addresses the problem of building layered systems from existing software that cannot be modified.

Building layered systems can be simplified by treating each layer as a gray box. In a gray-box system, one starts with basic knowledge of how a layer is likely to be implemented; one then builds successively refined models of the layer by observing how the layer responds to inputs at run-time. Gray-box knowledge allows one to innovate in systems when one is unable to change either an interface or the implementation of a layer. Specifically, gray-box knowledge allows one both to acquire information about the internal state of an existing layer and to control its behavior in unexpectedly powerful ways.

We have investigated gray-box systems in three important domains: user-level processes interacting with gray-box commodity operating systems, storage systems (e.g., RAIDs or single disks) interacting with gray-box file systems, and virtual machine monitors (VMMs) interacting with gray-box operating systems. Across all of these domains we have developed fingerprinting tools to automatically infer and characterize the behavior of existing layers. We have also shown how gray-box knowledge can be used to improve performance, reliability, and security.

Very Long Research Statement

As systems become more complex, are implemented by more developers, and contain more lines of code, it becomes increasingly unlikely that any single person can understand how a system behaves. Understanding how our computer systems behave is of utmost importance for developers, administrators, and users -- all must be able to identify when a system is not behaving as expected, whether to fix a bug, re-configure a parameter, or switch to an entirely different system.

The challenge for developers is even greater. Large new software systems are not built from scratch, but instead leverage existing software. One difficulty with using existing software is that it may not always behave as desired -- it may have performance, security, or reliability problems in certain environments. When existing software, and even the interfaces to that software, cannot be changed, then developers must figure out how to use the existing code appropriately (e.g., either by adapting to the existing software or by subtly controlling its behavior).

My research at the University of Wisconsin addresses both the problems of understanding complex systems and of building layered systems from existing software. My thesis is that these problems can be simplified by treating each layer of the system as a gray box. In a gray-box system, one starts with basic knowledge of how a layer is likely to be implemented; one then builds successively refined models of the layer by forming hypotheses and testing them with observations of how the layer responds to inputs at run-time. Gray-box knowledge allows one to innovate in systems when one is unable to change either an interface or the implementation of a layer. Specifically, gray-box knowledge allows one both to acquire information about the internal state of an existing layer and to control its behavior in unexpectedly powerful ways.

My research on gray-box systems can be roughly divided into two areas. First, we have developed a range of fingerprinting tools to automatically infer and characterize the behavior of an existing subsystem. Second, we have shown how a system can use gray-box knowledge to adapt its behavior to that of existing layers or to control the behavior of existing layers, and thus improve performance, reliability, and security.

I have been investigating layered systems primarily in two domains. The first domain consists of user-level processes interacting with commodity operating systems; in this case, we have assumed that the user processes view the OS as the gray box. The second domain consists of the file system interacting with the storage system, whether a RAID or a single disk; in this domain, we have investigated both the file system viewing the storage system as a gray box as well as the storage system viewing the file system as a gray box. This last instance is the environment in which we have performed our most in-depth research; we refer to this type of storage system as a semantically-smart disk system (SDS).

In this document, I summarize our research on understanding and building layered systems. I first describe our work developing techniques to automatically infer complex system behavior. I then summarize our results from using gray-box knowledge to build new systems.

Fingerprinting Existing Systems

We have been developing techniques for automatically characterizing, or fingerprinting, the behavior of software systems. The fingerprinting software starts with high-level knowledge of how the system is likely to be implemented, and then constructs probes and observes the resulting outputs from the system under test (e.g., the time required for a particular request). The fingerprinting code can successively refine its hypotheses by performing increasingly specific tests. Fingerprinting helps one to infer why the system is performing as it is and to classify the policies the system is using. For example, fingerprinting could identify that a RAID system is using an LRU replacement policy for its cache. Fingerprinting can also be used to characterize system behavior according to metrics other than performance (e.g., reliability).

We have developed innovative, yet practical, techniques for fingerprinting a variety of systems. These techniques can be roughly divided into three categories of increasing sophistication: those that insert probe operations and measure their completion time, those that make observations from multiple vantage points, and those that also manipulate the behavior of the system. I discuss these three classes of fingerprinting techniques in more detail.

  • Insert probes: Our first set of fingerprinting tools infer properties by generating request probes from a user-level process and then measuring the time for those probes to complete. For example, we first developed Dust, which infers OS buffer cache replacement policies. Specifically, Dust identifies how initial access order, recency of access, frequency of access, and long-term history determine which blocks are replaced from the buffer cache. We have also constructed Shear, which infers key properties of RAIDs, such as the number of disks, chunk size, level of redundancy, and layout scheme. By accessing sets of disk blocks and timing those accesses, Shear can detect which blocks are located on the same disks and thus infer basic properties of block layout; intuitively, sets of reads that are "slow" are assumed to be on the same disk, while sets of reads that are "fast" are assumed to be on different disks (a minimal sketch of this style of timing probe appears after this list).

  • Observe from multiple vantage points: Our second set of fingerprinting tools make observations from multiple vantage points. In these cases, we have fingerprinted different aspects of local file systems by initiating or observing activity both above and below the file system. First, to identify the data structures used by local file systems, we implemented EOF. The EOF process runs at user-level and works in concert with a semantically-smart disk (SDS); while EOF generates workloads that are expected to modify specific fields of data structures (e.g., file size, data pointers, or modification times), the SDS watches to see which blocks are written and which byte ranges change. Thus, the SDS can infer which blocks and byte ranges correspond to which file system structures. Second, we have developed Semantic Block Analysis (SBA) to infer certain policies of file systems, in particular, the events that trigger a journaling file system to flush data to disk (e.g., when a timer expires or when the journal becomes full). In both cases, we are able to infer properties of the layer in the middle (i.e., the file system) by combining observations from the upper layer (i.e., the user workload) and the lower layer (i.e., the storage system).

  • Modify interactions: Our third class of fingerprinting tools not only observe the system under test, but modify its behavior as well. First, we have fingerprinted the communication protocols and policies of the EMC Centera storage cluster. When analyzing this cluster, we not only observe the network and disk traffic, but we delay specific network packets as well; by observing which subsequent packets are also delayed, we are able to definitively infer dependencies across messages. As in our previous work, we are also able to infer its caching, load-balancing, and replication policies. Second, we have fingerprinted the failure policies of local file systems by failing disk read and write operations. Previous work analyzing how file systems handle disk failures has failed disk blocks at random; however, if the fault injector has gray-box knowledge of file system data structures and operations, it can fail specific blocks at specific times, drastically reducing the search space.
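
To make the flavor of the timing-based probes concrete, the sketch below issues pairs of parallel block reads against a storage device and reports how long each pair takes; pairs that serialize (and are therefore slow) are presumed to share a disk. This is only a minimal illustration of the idea behind Shear, not the tool itself: the device path, block size, and offsets are placeholders, and the real tool issues many randomized parallel requests and analyzes the resulting timing distributions statistically.

    /* Minimal sketch of a Shear-style timing probe; constants and the device
     * path are illustrative only. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define BLOCK 4096

    struct probe { int fd; off_t off; };

    static void *read_block(void *arg)
    {
        struct probe *p = arg;
        void *buf;
        posix_memalign(&buf, BLOCK, BLOCK);     /* O_DIRECT requires aligned buffers */
        pread(p->fd, buf, BLOCK, p->off);
        free(buf);
        return NULL;
    }

    /* Wall-clock time to read two blocks concurrently: if they share a disk,
     * the requests serialize and the pair is "slow"; if they sit on different
     * disks, they overlap and the pair is "fast". */
    static double time_pair(int fd, off_t a, off_t b)
    {
        struct probe pa = { fd, a }, pb = { fd, b };
        pthread_t ta, tb;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        pthread_create(&ta, NULL, read_block, &pa);
        pthread_create(&tb, NULL, read_block, &pb);
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(int argc, char **argv)
    {
        const char *dev = argc > 1 ? argv[1] : "/dev/md0";   /* placeholder device */
        int fd = open(dev, O_RDONLY | O_DIRECT);             /* bypass the buffer cache */
        if (fd < 0) { perror("open"); return 1; }

        /* Compare block 0 against blocks at increasing offsets; the offset at
         * which pairs turn "fast" hints at chunk size and disk boundaries. */
        for (off_t off = BLOCK; off <= 256 * BLOCK; off *= 2)
            printf("offset %8ld: %.6f s\n", (long)off, time_pair(fd, 0, off));

        close(fd);
        return 0;
    }

In practice, pair times must be averaged over many trials and interpreted statistically; the common principles discussed below (steady-state operation, automated classification) are what turn such raw timings into reliable inferences.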

Fingerprinting tools are useful and practical because they enable users and administrators to understand the actual systems that they are using. We have fingerprinted a variety of commodity systems, from the buffer replacement policies in NetBSD, Linux, Solaris, and HPUX, to the file systems of Linux ext2, ext3, ReiserFS, JFS, NetBSD FFS, and Windows NTFS. In many cases, our tools have revealed interesting problems within the systems. For example, Shear revealed that the RAID-5 mode of a common hardware controller employs a non-optimal left-asymmetric parity placement, and our failure-policy analysis isolated numerous bugs and illogical inconsistencies in several file systems.

Our research on fingerprinting across these domains has revealed common principles. For example, we have found it useful to ensure that the system under test is operating in its steady-state regime before observing its outputs. We have also found that statistical techniques are needed to deliver automated and reliable detection, yet graphical depictions remain useful for helping users interpret the results.
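
As a small illustration of the kind of automated classification this requires, the sketch below uses a simple two-means split to label a set of probe times as "fast" or "slow" without a hand-chosen threshold. The specific algorithm and the sample timings are assumptions made for this sketch, not the statistical machinery used by any particular tool.

    /* Label 1-D probe timings "fast" or "slow" with a simple two-means split. */
    #include <stdio.h>

    static void two_means(const double *t, int n, int *label)
    {
        double lo = t[0], hi = t[0];
        for (int i = 1; i < n; i++) {              /* seed centers at the extremes */
            if (t[i] < lo) lo = t[i];
            if (t[i] > hi) hi = t[i];
        }
        for (int iter = 0; iter < 20; iter++) {
            double sum_lo = 0, sum_hi = 0;
            int n_lo = 0, n_hi = 0;
            for (int i = 0; i < n; i++) {          /* assign each point to the nearer center */
                label[i] = (t[i] - lo < hi - t[i]) ? 0 : 1;
                if (label[i]) { sum_hi += t[i]; n_hi++; }
                else          { sum_lo += t[i]; n_lo++; }
            }
            if (n_lo) lo = sum_lo / n_lo;          /* recompute the two centers */
            if (n_hi) hi = sum_hi / n_hi;
        }
    }

    int main(void)
    {
        /* Example timings in seconds; in practice these would come from probes. */
        double t[] = { 0.011, 0.012, 0.021, 0.010, 0.022, 0.023, 0.011, 0.020 };
        int n = sizeof(t) / sizeof(t[0]);
        int label[sizeof(t) / sizeof(t[0])];

        two_means(t, n, label);
        for (int i = 0; i < n; i++)
            printf("probe %d: %.3f s -> %s\n", i, t[i], label[i] ? "slow" : "fast");
        return 0;
    }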

Building New Systems

Due to the amount of time, money, and effort required to build a large software system, most systems leverage some amount of existing software (e.g., the OS). Unfortunately, there are complications when one leverages existing code that cannot be modified: the borrowed code may not have the desired behavior in some environments. In this situation, one can use gray-box knowledge to better operate with the existing code. Specifically, one can either adapt the behavior of the new layer to that of the existing layers or one can subtly control the behavior of the existing layers.

Adaptation

When a new layer has gray-box knowledge of how other existing layers behave, the new layer can adapt its own behavior appropriately. In our investigations, we have found that the primary challenge is to infer the current internal state of the existing gray-box layer (e.g., not just the replacement policy of an internal cache, but the specific contents of that cache). The techniques we have developed for inferring internal state fall into three categories, increasing in complexity as one's observations and interactions with the existing gray-box layer decrease. In the first category, one is able to interact with the gray-box layer by sending it probe operations; in the second group, one is not able to probe the gray-box layer, but one can observe all of its inputs; in the final group, one is not able to insert probes or observe all inputs, but one can observe a subset of the outputs from the gray-box layer. I describe these three categories of techniques in more detail.

  • Insert probes: The simplest case for inferring internal state occurs when one is able to probe the gray-box layer by sending it requests. In our first work in gray-box systems, we investigated how a user-level process can use gray-box knowledge of the OS to adapt its own behavior. In these case studies, the user-level process begins with high-level assumptions about how the OS behaves and then measures the amount of time for simple probes into the OS (e.g., reading a particular block from a file or accessing another page of memory). In this work, we demonstrated that an application can infer whether a particular file is currently cached, whether a group of files are located near each other on disk, and the amount of free memory. Armed with this state information, the application can then modify the order of the files or the amount of data that it accesses to improve performance.

  • Observe all inputs: When one is able to observe all of the inputs to a gray-box layer, then one can simulate (or model) the internal behavior of that layer to infer its current internal state. We have developed efficient on-line simulations both to infer current state and to predict how the layer will react to future requests. We have applied on-line simulation in a number of scenarios. First, we have shown how a web server (or other memory-intensive application) can simulate the file cache replacement algorithm of the OS in order to predict the contents of the file cache; the web server can then first service requests that are expected to hit in the file cache, improving both average response time and throughput (a minimal sketch of this kind of cache simulation appears after this list). Second, we have implemented an OS disk scheduler that simulates the disk so that it can better group its requests.

  • Observe partial outputs: The most complex situation occurs when one is able to see only some of the outputs from a gray-box layer. By combining detailed knowledge of how the gray-box layer behaves with these partial observations, one is able to infer the likely state of the gray-box layer. In this context, we have investigated functionality placed in a semantically-smart disk system (SDS). For one case study, we developed X-RAY, an exclusive RAID array caching mechanism for an SDS. X-RAY infers the approximate contents of the file system cache by observing when the file system requests a file from the disk and when the access and update times of files are changed. In a second case study, we developed DGRAID, a gracefully-degrading and quickly-recovering RAID storage array. One way in which DGRAID achieves high availability is by placing logically-related blocks (e.g., the meta-data and data blocks of one file) within a single disk. DGRAID infers which blocks belong to a particular file by observing when specific fields and pointers within blocks are updated on disk. Finally, for a third SDS case study, we showed how the liveness of file system blocks can be inferred by the storage system. The ability to identify which blocks are live allows disks to implement a great deal of functionality, including the ability to securely delete old versions of files so that their contents cannot later be recovered.
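
The following sketch illustrates the on-line simulation idea from the second case above under two simplifying assumptions: the OS cache is pure LRU and holds a fixed, known number of blocks. In the real systems, the modeled policy and cache size are those discovered through fingerprinting; here they are placeholders.

    /* Mirror an assumed LRU file cache by replaying the request stream the OS
     * sees, then predict which blocks are currently resident. */
    #include <stdio.h>

    #define CACHE_SLOTS 4                  /* assumed cache size, in blocks */

    struct slot { long block; long last_used; int valid; };

    static struct slot cache[CACHE_SLOTS];
    static long ticks;

    /* Feed one observed access into the model. */
    static void simulate_access(long block)
    {
        ticks++;

        /* Hit: just refresh the recency stamp. */
        for (int i = 0; i < CACHE_SLOTS; i++)
            if (cache[i].valid && cache[i].block == block) {
                cache[i].last_used = ticks;
                return;
            }

        /* Miss: fill an empty slot if one exists, else evict the least recently used. */
        int victim = 0;
        for (int i = 0; i < CACHE_SLOTS; i++) {
            if (!cache[i].valid) { victim = i; break; }
            if (cache[i].last_used < cache[victim].last_used) victim = i;
        }
        cache[victim].block = block;
        cache[victim].last_used = ticks;
        cache[victim].valid = 1;
    }

    /* Ask the model whether a block is predicted to be resident. */
    static int predicted_resident(long block)
    {
        for (int i = 0; i < CACHE_SLOTS; i++)
            if (cache[i].valid && cache[i].block == block) return 1;
        return 0;
    }

    int main(void)
    {
        long trace[] = { 1, 2, 3, 4, 1, 5 };       /* observed request stream */
        for (int i = 0; i < 6; i++) simulate_access(trace[i]);

        /* A server could schedule requests predicted to hit before the others. */
        for (long b = 1; b <= 5; b++)
            printf("block %ld: %s\n", b,
                   predicted_resident(b) ? "likely cached" : "likely not cached");
        return 0;
    }

The same skeleton generalizes to other inferred policies (e.g., clock-like schemes) by swapping the eviction rule, which is why fingerprinting the policy first matters.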

While performing this research, we have addressed a number of overarching challenges. For example, when inserting probes, a primary challenge is to perform probes that do not change the state of the system. When performing on-line simulation, one of the primary challenges is to develop a model of the gray-box layer that is accurate enough to predict internal state, while being simple enough to be used efficiently on-line. Finally, the major challenge we have addressed is dealing with asynchrony within the gray-box layer; that is, the gray-box layer may buffer or reorder its outputs, such that the outputs do not match the current state of the layer.

Control

When building a layered system, if the existing layers do not exhibit the desired behavior, gray-box knowledge can be used in a more radical way: the system can indirectly modify the behavior of existing layers without changing their implementation. As an example, consider the case where a gray-box layer (such as the OS) implements a page replacement policy that is non-optimal for the user workload (e.g., LRU). The user process can influence the OS replacement policy, without changing the OS code, if it knows the internal state of the OS (i.e., which pages are likely to be evicted next) and if it can then access the pages that it does not want evicted, thereby encouraging the OS to keep them resident.
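
The sketch below illustrates this kind of indirect control under the assumption that the OS uses an LRU-like policy for file-backed pages: a helper maps the file whose pages the application wants to keep resident and periodically reads one byte from each page, so those pages always appear recently used. The file name and refresh interval are placeholders, and whether the nudge succeeds depends entirely on the actual (inferred) replacement policy.

    /* Periodically touch the pages of a file so an (assumed) LRU-like OS policy
     * keeps them resident and evicts colder pages instead. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "hot.dat";   /* file to keep warm (placeholder) */
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        char *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        long page = sysconf(_SC_PAGESIZE);
        volatile char sink;

        /* Every few seconds, read one byte from each page; under LRU-like
         * replacement this marks the pages as recently used. */
        for (;;) {
            for (off_t off = 0; off < st.st_size; off += page)
                sink = map[off];
            (void)sink;
            sleep(5);
        }
    }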

We have implemented a few case studies where one influences the policy of an underlying gray-box layer. For example, we have developed a user-level service that changes how the file system places files and directories on disk by selectively naming, inserting, and deleting files. However, our experience has shown that, given the detailed gray-box knowledge needed to support this type of control, it is useful to expand interfaces slightly to expose more internal information. In our work, we have focused on how to make minimal changes to interfaces, under the theory that one should leverage the existing code base to the fullest extent possible.

One context in which we have explored the benefits of exposing more internal state is within an infokernel, an OS which has been extended to export key abstractions. Services built on either a gray-box OS or an infokernel control the OS in the same manner: by modifying the inputs to the OS and understanding how those new inputs change the internal state and the future behavior of the OS. However, whereas a service built on a gray-box system must perform work to infer the state of the OS, a service on an infokernel can obtain this information directly. On top of an infokernel, we have explored four case studies; we have shown how user-level processes can change the default file cache replacement algorithm, the file layout policy, the disk scheduling algorithm, and the TCP congestion control algorithm.

An additional contribution of the infokernel research is identifying key abstractions that can be exported with little complexity from the OS, yet support a broad range of more precise user-level control. For example, our case studies have shown that a prioritized list is a useful general abstraction for expressing the interesting internal state of the OS and the decisions that the OS will make. We have found that only a few hundred lines of OS code are usually required to export these abstractions and that the abstractions are sufficiently general to capture the policies of different operating systems.
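
Purely as an illustration of what such a prioritized list might look like to user-level code, the sketch below defines a list of file pages in expected-eviction order and a routine that checks whether pages of a particular file have drifted to the front. The structure, the infok_read_eviction_list() call, and its stub implementation are hypothetical inventions for this sketch; they are not the actual infokernel interface.

    /* Hypothetical view of a kernel-exported "prioritized list": cached file
     * pages in the order the OS expects to evict them. */
    #include <stddef.h>
    #include <stdio.h>

    struct evict_entry { unsigned long file_id; unsigned long page_index; };

    /* Stub standing in for the kernel-exported list; a real infokernel would
     * fill these entries itself (the data here is made up for the example). */
    static size_t infok_read_eviction_list(struct evict_entry *list, size_t max)
    {
        static const struct evict_entry fake[] = { { 42, 7 }, { 42, 8 }, { 99, 0 } };
        size_t n = sizeof(fake) / sizeof(fake[0]);
        if (n > max) n = max;
        for (size_t i = 0; i < n; i++) list[i] = fake[i];
        return n;
    }

    /* A user-level service can warn when pages of a file it cares about reach
     * the front of the eviction order, and react (e.g., touch them) in time. */
    static void check_file(unsigned long my_file_id)
    {
        struct evict_entry list[64];
        size_t n = infok_read_eviction_list(list, 64);
        for (size_t i = 0; i < n && i < 8; i++)
            if (list[i].file_id == my_file_id)
                printf("page %lu of file %lu is near eviction\n",
                       list[i].page_index, list[i].file_id);
    }

    int main(void)
    {
        check_file(42);   /* suppose 42 identifies a file this service cares about */
        return 0;
    }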

Exploring how gray-box systems can be constructed with absolutely no changes to existing layers makes it easier to then identify the small changes to interfaces that greatly improve system functionality. We have explored the issue of how interfaces should be changed in a variety of contexts. For example, we have investigated the ability to control TCP further by developing icTCP, which allows users not only to observe the internal state of TCP, but also to set some TCP variables in a safe fashion. With this very small extension to TCP Reno, we can implement numerous variants of TCP at user level, such as TCP Vegas, TCP Nice, the Congestion Manager (CM), robust reordering, efficient fast retransmit, and TCP Eifel. We have also explored minimal changes to the interface between file and storage systems in order to improve performance, reliability, and security.

Finally, our experience with gray-box systems has stressed the need for models of system behavior. Complex systems make assumptions about the behavior of their subsystems, beyond those specified by their interfaces; however, for systems to operate correctly, these assumptions must be explicitly stated. As a starting point, we have developed a logical framework for modeling the interaction of a file system with the storage system. This model defines the assumptions that the storage system can make about the file system and can help ensure that on-disk data structures are kept consistent. I believe the value of such models will increase as systems continue to increase in complexity.
