UW-Madison
Computer Sciences Dept.

CS739 Spring 2008: Questions

  1. Survey -- Distributed Operating Systems :
    Andrew S. Tanenbaum and Robbert Van Renesse
    ACM Computing Surveys, Volume 17, Issue 4 (December 1985)
    Question: This paper surveys distributed systems as of 1985. What were the goals of these distributed systems? What were the assumptions (in terms of workload and environment) of these systems? Which design issue (i.e., communication, naming and protection, resource management, fault tolerance, and services) seems most challenging (or interesting)? Why?
  2. Sprite vs. Amoeba : A Comparison of Two Distributed Systems: Amoeba and Sprite
    Fred Douglis, M. Frans Kaashoek, John K. Ousterhout, Andrew S. Tanenbaum.
    Computing Systems, Vol. 4, No. 3, pp. 353-384, December 1991.
    Question: Amoeba uses the processor pool model. Which aspects of the system did this decision influence? Do you think they made the right choice(s)?
  3. NFS
    Question: Discuss one of the changes that were made to NFSv3 and to NFSv4. What problem did this change address? Does the change introduce any drawbacks or challenges?
  4. Coda : Disconnected Operation in the Coda File System
    James J. Kistler, M. Satyanarayanan
    13th Symposium on Operating Systems Principles, Asilomar, California, pp. 213-225. October 1991.
    Question: In the normal (hoarding) state, how does a Coda client determine which files and directories should be cached? How does the client ensure these files and directories are actually in the cache? Imagine that Coda is modified such that the client (somehow) knows the probability of disconnection in the near future; how might this knowledge influence what the client caches?
  5. LBFS : A Low-Bandwidth Network File System
    Athicha Muthitacharoen, Benjie Chen (MIT), David Mazieres (NYU), SOSP'01
    Question: How does LBFS make a trade-off between reducing bandwidth requirements and potentially increasing latency (i.e., the number of round-trip requests) for both read and write operations? What aspects of the LBFS protocol help keep the number of round-trips "reasonable"?
  6. Centera : Deconstructing Commodity Storage Clusters
    Haryadi Gunawi, Nitin Agrawal, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
    ISCA'05
    Question: This paper shows that one can actively delay a packet to determine whether or not a subsequently sent packet depends on it. What are the strengths and weaknesses of this approach for inferring causality? Did this delay technique discover any non-obvious aspects of the Centera write protocol?
  7. DSM: Towards Transparent and Efficient Software Distributed Shared Memory
    D.J. Scales and K. Gharachorloo
    16th Symposium on Operating Systems Principles, Saint Malo, France, October 1997, pp. 157-169.
    Question: This paper is titled "Towards Transparent and Efficient Software Distributed Shared Memory." Where in the design of Shasta do they take care to provide transparency? Where does Shasta include optimizations for efficient performance? In the trade-off between transparency and efficiency, which do you think Shasta leans towards?
  8. MapReduce : MapReduce: Simplified Data Processing on Large Clusters
    Jeffrey Dean and Sanjay Ghemawat
    OSDI'04
    Question: The MapReduce environment places intermediate files (after the map phase) on the local disk instead of storing them in the Google file system (GFS). We'll read the GFS paper later; for this comparison, the important point is that GFS replicates data and stripes it across multiple nodes in the cluster. What are the implications of using local disk instead of GFS (for both performance and reliability)?
  9. SpriteMigration -- Transparent Process Migration: Design Alternatives and the Sprite Implementation
    Fred Douglis and John K. Ousterhout
    Software - Practice and Experience, Volume 21, Number 8, 1991, Pages 757-785.
    Question: The Sprite migration mechanism makes trade-offs between four factors: transparency, residual dependencies, performance, and complexity. What did the Sprite designers choose for each factor? How did their assumptions about their environment and usage scenarios influence each of their decisions?
  10. Porcupine: Manageability, Availability and Performance in Porcupine: A Highly Scalable Internet Mail Service
    Yasushi Saito, Brian Bershad, and Hank Levy
    17th ACM Symposium on Operating Systems Principles, Dec 1999, Kiawah Island Resort
    Question: Porcupine (like other distributed system services) characterizes state as being either hard state or soft state. What is the difference between the two? What are the advantages of treating some state as soft? Briefly, how does Porcupine recreate each piece of soft state when needed?
  11. GoogleFS : The Google File System
    Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
    SOSP'03
    Answer either of the two questions below; the second is really more about MapReduce than GoogleFS.
    Question 1: Where does GoogleFS rely upon soft state and stale information? Discuss the implications and whether or not these appear to be good design decisions.
    OR
    Question 2: How does the MapReduce programming model interact with the Google file system (GFS)? Where does MapReduce use GFS and where does it not? What are the performance and reliability implications of using GFS or not?
  12. Microreboot : Microreboot—A Technique for Cheap Recovery
    George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and Armando Fox, Stanford University, OSDI'04
    Question: The introduction of the paper states "This paper presents a practical recovery technique we call microreboot...". Do you think microreboots are practical? Why or why not?
  13. CFS : Wide-Area Cooperative Storage with CFS
    Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris (MIT), Ion Stoica (UC Berkeley), SOSP'01
    Question: Make up and answer your own question related to CFS.
  14. Dynamo : Dynamo: Amazon's Highly Available Key-Value Store
    Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels
    Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
    Question: Amazon's key-value store, Dynamo, offers services a trade-off among performance, durability, and availability. What are some of the techniques Dynamo uses to improve one of those three metrics? How does it allow services to control the trade-offs?
  15. Pangaea : Taming Aggressive Replication in the Pangaea Wide-Area File System
    Yasushi Saito, Christos Karamanolis, Magnus Karlsson, and Mallik Mahalingam, HP Labs, OSDI'02
    Question: Why does Pangaea have two classes of replicas: gold and bronze? What is the purpose of each (why not just have gold or just have bronze)? How does Pangaea ensure it has enough replicas?

 