Citation: V. Pai, P. Druschel, and W. Zwaenepoel, "IO-Lite: A Unified I/O Buffering and Caching System", 3rd Symposium on Operating Systems Design and Implementation, New Orleans, February 1999.

* Summary

IO-Lite is a unified I/O buffering and caching system for general-purpose operating systems. It allows applications, interprocess communication, the filesystem, the file cache, and the network subsystem to share a single physical copy of the data safely and concurrently. IO-Lite eliminates all copying and multiple buffering of I/O data, and enables various cross-subsystem optimizations.

* Why Do We Need IO-Lite?

For many users, the perceived speed of computing is increasingly dependent on the performance of networked server systems. However, general-purpose operating systems do not provide sufficient support for high-performance server applications. A major problem is the lack of integration among the various I/O subsystems and the application, each of which typically uses its own buffering and caching mechanisms. This leads to repeated data copying, multiple buffering, and other performance-degrading anomalies. The primary goal of IO-Lite is to improve the performance of networked server applications and other I/O-intensive applications.

* IO-Lite Design

IO-Lite uses immutable buffers and buffer aggregates. All I/O data buffers are immutable: once initialized, they may not be modified. This implies a read-only sharing model, which eliminates problems of synchronization, protection, consistency, and fault isolation among OS subsystems and applications. The price of immutable buffers is that data cannot be modified in place. To alleviate the impact of this restriction, all data buffers are encapsulated inside buffer aggregates, which are instances of an abstract data type (ADT) that represents I/O data. All OS subsystems access data through this abstraction. Data contained in a buffer aggregate is not necessarily in contiguous storage.
Rather, a buffer aggregate contains an ordered list of pairs, each representing a contiguous section of an immutable buffer. Buffer aggregates support operations for truncating, prepending, appending, concatenating, splitting, and mutating data. Although buffer aggregates are passed by value among subsystems, the underlying immutable buffers are passed by reference. Conventional access control ensures that a process can only access I/O buffers associated with buffer aggregates explicitly passed to that process. A system-wide reference-counting mechanism for I/O buffers allows safe reclamation of unused buffers.

* Interprocess Communication

To support caching in a unified buffer system, an IPC mechanism must allow safe concurrent sharing of buffers among different protection domains. IO-Lite uses an IPC mechanism similar to fbufs to support safe concurrent sharing. It extends the fbuf approach from the network subsystem to the filesystem, including the file data cache, and adapts it to a general-purpose operating system.

IO-Lite IPC combines page remapping and shared memory. When a buffer is first transferred, the VM mappings are updated to grant the receiving process read access. When the buffer is deallocated, the mappings persist and the buffer is added to a free pool for the associated I/O stream, so reusing it for later transfers on that stream requires no further VM map changes.

* Access Control and Allocation

IO-Lite maintains pools of buffers that share the same access control list (ACL). The choice of pool from which a new buffer is allocated determines the ACL of the data stored in that buffer. This access control model therefore requires applications to determine the ACL of an I/O data object before storing it in main memory.

IO-Lite buffers are allocated in a region of the virtual address space called the IO-Lite window, which appears in the address spaces of all applications and the kernel. A buffer always consists of an integral number of virtually contiguous VM pages, and all pages of a buffer share the same access control attributes.
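The buffer-aggregate model described in this summary (immutable, page-granular buffers shared by reference, and aggregates of slices passed by value) can be illustrated with a small user-level model. The following is a minimal Python sketch under stated assumptions, not IO-Lite's kernel implementation: the class names ImmutableBuffer, Slice, and BufferAggregate, the 4 KiB page size, and the string-valued acl field are all hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB VM pages


class ImmutableBuffer:
    """Hypothetical model of an IO-Lite buffer: read-only once initialized."""

    def __init__(self, data: bytes, acl: str):
        # Buffers occupy a whole number of VM pages; round the size up.
        pages = max(1, -(-len(data) // PAGE_SIZE))
        self.capacity = pages * PAGE_SIZE
        self.data = bytes(data)  # bytes objects cannot be modified in place
        self.acl = acl           # every page of the buffer shares this ACL
        self.refcount = 0        # number of slices currently referencing it


@dataclass(frozen=True)
class Slice:
    """A contiguous section of exactly one immutable buffer."""
    buf: ImmutableBuffer
    offset: int
    length: int


class BufferAggregate:
    """Ordered list of slices: the list is copied, the buffers are shared."""

    def __init__(self, slices=()):
        self.slices = list(slices)
        for s in self.slices:
            s.buf.refcount += 1

    def release(self) -> None:
        # Drop this aggregate's references; a buffer whose count reaches
        # zero could then be reclaimed.
        for s in self.slices:
            s.buf.refcount -= 1
        self.slices = []

    def __len__(self) -> int:
        return sum(s.length for s in self.slices)

    def concat(self, other: "BufferAggregate") -> "BufferAggregate":
        # New aggregate over the same underlying buffers: no data is copied.
        return BufferAggregate(self.slices + other.slices)

    def split(self, pos: int):
        # Split at byte position pos; a slice straddling pos is cut in two,
        # but both halves still point into the original buffer.
        left, right = [], []
        for s in self.slices:
            if pos >= s.length:
                left.append(s)
                pos -= s.length
            elif pos <= 0:
                right.append(s)
            else:
                left.append(Slice(s.buf, s.offset, pos))
                right.append(Slice(s.buf, s.offset + pos, s.length - pos))
                pos = 0
        return BufferAggregate(left), BufferAggregate(right)

    def to_bytes(self) -> bytes:
        # Materialize a contiguous copy (only needed for display or testing).
        return b"".join(s.buf.data[s.offset:s.offset + s.length] for s in self.slices)
```

For example, splitting an aggregate built from two buffers at byte 8 yields two aggregates whose slices still point into the original buffers; only the small slice lists are new, and each buffer's reference count rises to reflect the extra references.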
Buffer aggregates contain a list of tuples, each representing a slice. A slice is always fully contained within a single IO-Lite buffer, but slices may overlap. To avoid wasting memory, data objects with the same ACL can be combined in a single IO-Lite buffer, and even on the same page.

* Application Interface

IO-Lite provides application programs with an extended I/O API based on buffer aggregates. IOL_read and IOL_write are the two core calls. IOL_read takes a standard file descriptor and a size, and returns a buffer aggregate containing at most that amount of data. IOL_write takes a file descriptor and a buffer aggregate, and replaces the data in the external data object referred to by the descriptor with the contents of the buffer aggregate.

* Filesystem Interaction

In IO-Lite, buffer aggregates form the basis of the filesystem cache; the rest of the filesystem remains unchanged. The IO-Lite file cache consists of a data structure that maps triples of the form <file-id, offset, length> to buffer aggregates containing the corresponding data extents.

* Network Interaction

The network subsystem uses buffer aggregates to store and manipulate network packets. However, because the ACL of a data object must be determined before the object is stored, network drivers must perform packet filtering to identify the I/O stream associated with each incoming packet, a process known as early demultiplexing.

* Cache Replacement and Paging

Cache replacement in a unified caching/buffering system differs from replacement in conventional file caches, because cached data is potentially accessed concurrently by applications. Replacement must therefore consider both references to a cache entry and virtual memory accesses to the buffers associated with that entry.

IO-Lite uses a simple strategy for selecting cache victims. Cache entries are kept in a list ordered first by whether they are currently referenced, then by time of last access.
When a cache entry must be evicted, the least recently used unreferenced entry is chosen if one exists; otherwise, the least recently used of the currently referenced entries is chosen.

* Impacts of Immutable Buffers

Every modification to a data object stored in a buffer aggregate requires storing the new values in a newly allocated buffer. If every word of the object is modified, the only additional cost is the buffer allocation. If only a subset of the words is modified, the new data is stored in a new buffer, which is logically chained into the object using buffer aggregate operations. When modifications are so widely scattered that the costs of chaining and indexing exceed the cost of a redundant copy of the entire object, contiguous storage and in-place modification become necessary. For this case, IO-Lite supports mmap'ing data objects, which permits in-place modification.

* Cross-Subsystem Optimizations

Unified buffering and caching enables optimizations across applications and OS subsystems that are not possible in conventional I/O systems. Such optimizations rely on the ability to uniquely identify a particular I/O data object throughout the system. One example is the Internet checksum used by TCP and UDP. With IO-Lite, the checksum can be computed for each slice of a buffer aggregate and cached, so that future transmissions of the same slice can reuse the cached value. To support such optimizations, IO-Lite maintains a generation number for each buffer that is incremented every time the buffer is reallocated. Combined with the buffer's address, this number gives a system-wide unique identifier for the buffer's contents.
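The checksum-caching optimization can be sketched in user-level Python. This is an illustrative model, not IO-Lite's TCP/UDP code: Buffer, its addr field, reallocate, and ChecksumCache are hypothetical names, and the cache key (address, generation, offset, length) is an assumption modeled on the description above. Per-slice partial sums are combined using the one's-complement byte-swap rule from RFC 1071 whenever a slice begins at an odd byte offset in the packet.

```python
def fold(total: int) -> int:
    """Fold carries back into the low 16 bits (end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold(total)


def swap16(value: int) -> int:
    return ((value & 0xFF) << 8) | (value >> 8)


def internet_checksum(data: bytes) -> int:
    return ~ones_sum(data) & 0xFFFF


class Buffer:
    """Hypothetical buffer whose contents are identified by (addr, generation)."""
    _next_addr = 0x1000

    def __init__(self):
        self.addr = Buffer._next_addr
        Buffer._next_addr += 0x1000
        self.generation = 0
        self.data = b""

    def reallocate(self, data: bytes) -> None:
        # New contents get a new generation, so stale cache keys stop matching.
        self.generation += 1
        self.data = bytes(data)


class ChecksumCache:
    def __init__(self):
        self._sums = {}
        self.misses = 0

    def slice_sum(self, buf: Buffer, offset: int, length: int) -> int:
        key = (buf.addr, buf.generation, offset, length)
        if key not in self._sums:
            self.misses += 1
            self._sums[key] = ones_sum(buf.data[offset:offset + length])
        return self._sums[key]

    def checksum(self, slices) -> int:
        """Checksum of the concatenation of (buf, offset, length) slices."""
        total, pos = 0, 0
        for buf, offset, length in slices:
            partial = self.slice_sum(buf, offset, length)
            if pos % 2:
                # The slice starts at an odd byte offset in the packet, so its
                # bytes land in the opposite halves of each 16-bit word.
                partial = swap16(partial)
            total = fold(total + partial)
            pos += length
        return ~total & 0xFFFF
```

Because a reallocated buffer gets a new generation number, cache entries for its old contents are never matched again; this sketch never evicts them, which a real cache would have to do.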