UW-Madison
Computer Sciences Dept.

CS739 Spring 2009: Reading List

    Distributed Operating Systems

  1. Survey : Distributed Operating Systems
    Andrew S. Tanenbaum and Robbert Van Renesse
    ACM Computing Surveys, Volume 17, Issue 4 (December 1985)
  2. Sprite vs. Amoeba : A Comparison of Two Distributed Systems: Amoeba and Sprite
    Fred Douglis, M. Frans Kaashoek, John K. Ousterhout, Andrew S. Tanenbaum.
    Computing Systems, Vol. 4, No. 3, pp. 353-384, December 1991.

    Networked File Systems

  3. NFS
  4. Coda : Disconnected Operation in the Coda File System
    James J. Kistler, M. Satyanarayanan
    13th Symposium on Operating Systems Principles, Asilomar, California, pp. 213-225. October 1991.
    • AFS Background : Scale and Performance in a Distributed File System
      Howard, J.H., Kazar, M.L., Menees, S.G., Nichols, D.A., Satyanarayanan, M., Sidebotham, R.N., and West, M.J.
      ACM Transactions on Computer Systems, Vol. 6, No. 1, February 1988, pp. 51-81.
  5. LBFS : A Low-Bandwidth Network File System
    Athicha Muthitacharoen, Benjie Chen (MIT), David Mazieres (NYU), SOSP'01
  6. Speculator : Speculative execution in a distributed file system
    Edmund B. Nightingale, Peter M. Chen, Jason Flinn
    Proceedings of the Twentieth ACM Symposium on Operating Systems Principles (SOSP'05), pp. 191-205.

    Ordering of Events

  7. Theory: Time and Order
  8. Analysis of Distributed Systems

    Programming Environments

  9. MapReduce
  10. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
    Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
    European Conference on Computer Systems (EuroSys), Lisbon, Portugal,
    March 21-23, 2007

    Migration

  11. OS Approaches
  12. VMM-based Migration

    Specialized Distributed Services

  13. Porcupine: Manageability, Availability and Performance in Porcupine: A Highly Scalable Internet Mail Service
    Yasushi Saito, Brian Bershad, and Hank Levy
    17th ACM Symposium on Operating Systems Principles, Dec 1999, Kiawah Island Resort

    Cluster-Based Distributed File Systems

  14. xFS : Serverless Network File Systems
    Tom Anderson, Mike Dahlin, Jeanna Neefe, David Patterson, Drew Roselli, Randy Wang.
    SOSP 15, December 1995.
  15. GoogleFS : The Google File System
    Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
    SOSP'03

    Byzantine Agreement

  16. Byzantine : The Byzantine Generals Problem
    Leslie Lamport, Robert Shostak, and Marshall Pease
    ACM Transactions on Programming Languages and Systems, Vol 4, No. 3, July 1982
  17. FailStop : Byzantine generals in action: Implementing fail-stop processors.
    Fred B. Schneider
    TOCS 2, 2 (May 1984), pp. 145-154
  18. LOCKSS : Preserving Peer Replicas By Rate-Limited Sampled Voting
    Petros Maniatis, Mema Roussopoulos, TJ Giuli, David S. H. Rosenthal, Mary Baker, Yanto Muliadi
    SOSP'03
  19. Practical : Practical Byzantine Fault Tolerance
    Miguel Castro and Barbara Liskov, MIT
    OSDI'99

    P2P Systems

  20. CFS : Wide-Area Cooperative Storage with CFS
    Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris (MIT), Ion Stoica (UC Berkeley), SOSP'01
  21. Dynamo : Dynamo: Amazon's Highly Available Key-Value Store
    Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels
    Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
  22. Pangaea : Taming Aggressive Replication in the Pangaea Wide-Area File System
    Yasushi Saito, Christos Karamanolis, Magnus Karlsson, and Mallik Mahalingam, HP Labs, OSDI'02
  23. SUNDR : Secure Untrusted Data Repository (SUNDR)
    Jinyuan Li, Maxwell Krohn, David Mazières, and Dennis Shasha, New York University, OSDI'04

    Recovery

  24. Microreboot : Microreboot—A Technique for Cheap Recovery
    George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and Armando Fox, Stanford University, OSDI'04

Additional Papers

  1. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
    Yuan Yu, Michael Isard, Dennis Fetterly, and Mihai Budiu, Microsoft Research Silicon Valley; Ulfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey
    OSDI'08
  2. Policy : Exploiting Process Lifetime Distributions for Dynamic Load Balancing
    Mor Harchol-Balter and Allen Downey.
    Proceedings of ACM Sigmetrics '96 Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 96), May 23-26 1996, Philadelphia, PA.
  3. Analysis -- Pip: Pip: Detecting the Unexpected in Distributed Systems
    Patrick Reynolds, Janet L. Wiener, Jeffrey C. Mogul, Mehul A. Shah, Charles Killian, and Amin Vahdat
    Proceedings of the 3rd ACM/USENIX Symposium on Networked Systems Design and Implementation (NSDI), San Jose, CA, May 2006.
  4. Petal + Frangipani
  5. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload
    Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, Steven D. Gribble, Henry M. Levy, John Zahorjan (University of Washington), SOSP'03
  6. Measurement and Analysis of Spyware in a University Environment
    Stefan Saroiu, Steven D. Gribble, and Henry M. Levy, University of Washington, NSDI'04
  7. Mistakes: Understanding and Dealing with Operator Mistakes in Internet Services
    Kiran Nagaraja, Fábio Oliveira, Ricardo Bianchini, Richard P. Martin, and Thu D. Nguyen, Rutgers University, OSDI'04
  8. Boxwood : Boxwood: Abstractions as the Foundation for Storage Infrastructure
    John MacCormick, Nick Murphy, Marc Najork, Chandramohan A. Thekkath, and Lidong Zhou, Microsoft Research Silicon Valley, OSDI'04
  9. Sensor : TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks.
    Samuel Madden, Michael Franklin, Joseph Hellerstein, Wei Hong.
    In Proceedings of OSDI, 2002.
  10. FUSE: Lightweight Guaranteed Distributed Failure Notification
    John Dunagan, Microsoft Research; Nicholas J. A. Harvey, Massachusetts Institute of Technology; Michael B. Jones, Microsoft Research; Dejan Kostic, Duke University; Marvin Theimer and Alec Wolman, Microsoft Research
    OSDI'04
  11. Pastiche: Making Backup Cheap and Easy
    Landon P. Cox, Christopher D. Murray, and Brian D. Noble, University of Michigan, OSDI'02
  12. FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment
    Atul Adya, William J. Bolosky, Miguel Castro, Gerald Cermak, Ronnie Chaiken, John R. Douceur, Jon Howell, Jacob R. Lorch, Marvin Theimer, and Roger P. Wattenhofer, Microsoft Research, OSDI'02
  13. Ivy: A Read/Write Peer-to-Peer File System
    Athicha Muthitacharoen, Robert Morris, Thomer M. Gil, and Benjie Chen, Massachusetts Institute of Technology, OSDI'02
  14. Paxos Made Simple
    Leslie Lamport
    November 2001
  15. The Part-Time Parliament
    Leslie Lamport
    ACM Transactions on Computer Systems, Vol. 16, No. 2, May 1998
  16. Next Century Challenges: Scalable Coordination in Sensor Networks
    Deborah Estrin, Ramesh Govindan, John Heidemann, Satish Kumar
    Mobile Computing and Networking, 1999.
  17. The LOCUS Distributed Operating System
    Bruce Walker, Gerald Popek, Robert English, Charles Kline, Greg Thiel,
    9th Symposium on Operating Systems Principles (SOSP), Bretton Woods, New Hampshire, November 1983, pp. 49-70.
  18. DEMOS/MP: The Development of a Distributed Operating System
    Barton P. Miller, David L. Presotto, Michael L. Powell,
    Software-Practice & Experience 17, 4, April 1987, pp. 277-290.
  19. Plan 9 from Bell Labs
    Rob Pike, David L. Presotto, Sean Dorward, Bob Flandrena, Ken Thompson, Howard Trickey, Phil Winterbottom,
    Computing Systems 8, 3, Summer 1995, pp. 221-254.
  20. The ITC Distributed File System: Principles and Design
    M. Satyanarayanan, John H. Howard, David A. Nichols, Robert N. Sidebotham, Alfred Z. Spector, Michael J. West,
    10th Symposium on Operating Systems Principles (SOSP), Orcas Island, Washington, pp. 35-50. December 1985.
  21. Magpie : Using Magpie for Request Extraction and Workload Modelling
    Paul Barham, Austin Donnelly, Rebecca Isaacs, and Richard Mortier, Microsoft Research, Cambridge, UK, OSDI'04
  22. River : Run-Time Adaptation in River
    Remzi H. Arpaci-Dusseau
    ACM Transactions on Computer Systems (TOCS), February 2003, v. 21:1, pp. 36-86
  23. Linda: The S/Net's Linda Kernel
    N. Carriero and D. Gelernter
    ACM Trans. on Computer Systems 4, 2, May 1986, pp. 110-129.
  24. Survey : Process migration
    Dejan S. Milojicic, Fred Douglis, Yves Paindaveine, Richard Wheeler, and Songnian Zhou
    ACM Comput. Surv. 32, 3, 2000.
  25. Scalable, Distributed Data Structures for Internet Service Construction
    Steven D. Gribble, Eric A. Brewer, Joseph M. Hellerstein, and David Culler, UC Berkeley
    OSDI 2000
  26. RPC : Performance of the Firefly RPC
    M. D. Schroeder and M. Burrows
    ACM Trans. on Computer Systems 8, 1, February 1990, pp. 1-17.
  27. U-Net : U-Net: A User-Level Network Interface for Parallel and Distributed Computing
    Thorsten von Eicken, Anindya Basu, Vineet Buch, Werner Vogels
    Proceedings of the 15th ACM Symposium on Operating Systems Principles, Copper Mountain Resort, Colorado, December 1995, pp. 40-53.
  28. RPC Background: Implementing Remote Procedure Calls
    Andrew D. Birrell, Bruce Jay Nelson,
    ACM Transactions on Computer Systems 2, 1, February 1984, pp. 39-59.

 