Cluster I/O with River: Making the Fast Case Common


Remzi H. Arpaci-Dusseau, Eric Anderson, Noah Treuhaft,
David E. Culler, Joseph M. Hellerstein, David Patterson, and Katherine Yelick

We introduce River, a data-flow programming environment and I/O substrate for clusters of computers. River is designed to provide maximum performance in the common case, even in the face of non-uniformities in hardware, software, and workload. River is based on two simple design features: a high-performance distributed queue and a storage redundancy mechanism called graduated declustering. We have implemented a number of data-intensive applications on River; they validate our design, achieving near-ideal performance in a variety of non-uniform performance scenarios.
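The abstract names a high-performance distributed queue but does not describe its internals. The following minimal Python simulation gives a rough illustration of how such a queue can keep a slow consumer from holding back the rest. It assumes credit-based flow control: each consumer has a bounded buffer, and the producer sends each record to a randomly chosen consumer that still has free space. The `Consumer` class, the `run_queue` function, and the rates and capacities are all hypothetical. This sketches the general idea only and is not River's actual algorithm or API.

```python
import random
from dataclasses import dataclass


@dataclass
class Consumer:
    """A hypothetical consumer with a fixed processing rate and a bounded buffer."""
    name: str
    rate: int           # records processed per tick (must be >= 1); models a fast or slow node
    capacity: int       # buffer slots; free slots act as credits for the producer
    buffered: int = 0   # records waiting in this consumer's buffer
    processed: int = 0  # records this consumer has finished

    def credits(self) -> int:
        return self.capacity - self.buffered


def run_queue(num_records: int, consumers: list[Consumer], seed: int = 0) -> dict[str, int]:
    """Simulate an assumed credit-based distributed queue (not River's code).

    Each tick, the producer sends records one at a time to a consumer chosen
    at random among those with free buffer space. Then every consumer
    processes up to `rate` of its buffered records.
    """
    rng = random.Random(seed)
    remaining = num_records
    while remaining > 0 or any(c.buffered > 0 for c in consumers):
        # Dispatch phase: fill free buffer slots until records or credits run out.
        while remaining > 0:
            eligible = [c for c in consumers if c.credits() > 0]
            if not eligible:
                break
            target = rng.choice(eligible)
            target.buffered += 1
            remaining -= 1
        # Consume phase: faster consumers drain more, so they free more credits.
        for c in consumers:
            done = min(c.rate, c.buffered)
            c.buffered -= done
            c.processed += done
    return {c.name: c.processed for c in consumers}


if __name__ == "__main__":
    counts = run_queue(
        100,
        [Consumer("fast", rate=4, capacity=4), Consumer("slow", rate=1, capacity=4)],
    )
    print(counts)
```

In this run, the fast consumer ends up handling about three quarters of the 100 records. Work flows toward whichever consumer frees buffer space sooner, so no central scheduler needs to know each node's speed in advance.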