CS739 Spring 2009 Questions

CS739 Spring 2009: Questions

Survey -- Distributed Operating Systems :
Andrew S. Tanenbaum and Robbert Van Renesse
ACM Computing Surveys, Volume 17, Issue 4 (December 1985)
Question: This paper surveys distributed systems as of 1985. What were the goals of these distributed systems? What were the assumptions (in terms of workload and environment) of these systems? Which design issue (i.e., communication, naming and protection, resource management, fault tolerance, and services) seems most challenging (or interesting)? Why?
Sprite vs. Amoeba : A Comparison of Two Distributed Systems: Amoeba and Sprite
Fred Douglis, M. Frans Kaashoek, John K. Ousterhout, Andrew S. Tanenbaum.
Computing Systems, Vol. 4, No. 3, pp. 353-384, December 1991.
Question: Do you think system projects should pay attention to technology trends? Why or why not? Do you think Amoeba or Sprite did a better job predicting future technology? Explain.
NFS
Question: Discuss one of the changes that was made to NFSv3 and one of the changes made to NFSv4. What problem did each change address? Does the change introduce any drawbacks or challenges?
Coda : Disconnected Operation in the Coda File System
James J. Kistler, M. Satyanarayanan
13th Symposium on Operating Systems Principles, Asilomar, California, pp. 213-225. October 1991.
Question: Coda is derived from AFS. What aspects of AFS simplify the design of Coda? Imagine you were instead building Coda on top of NFS; what aspects of Coda would be easier or harder (e.g., within Hoarding, Emulation, and Reintegration)? Make sure you specify which version of NFS you are assuming!
LBFS : A Low-Bandwidth Network File System
Athicha Muthitacharoen, Benjie Chen (MIT), David Mazieres (NYU), SOSP'01
Question: Create your own...
Speculator : Speculative execution in a distributed file system
Edmund B. Nightingale, Peter M. Chen, Jason Flinn
Proceedings of the twentieth ACM symposium on Operating systems principles (SOSP'05), pages 191 - 205.
Question: Imagine you are a system administrator who needs to decide whether to deploy SpecNFS (the speculative version of NFSv3) or NFSv4. What are the pros and cons of each version? Which would you choose and why? (You can assume a reputable implementation of each version exists.)
Analysis1 :

Black-Box : Performance Debugging for Distributed Systems of Black Boxes
Marcos K. Aguilera, Jeffrey C. Mogul, Janet L. Wiener, Patrick Reynolds, Athicha Muthitacharoen
(HP Labs, Duke, and MIT), SOSP'03
Paths : Path-Based Failure and Evolution Management
Mike Y. Chen, University of California, Berkeley; Anthony Accardi, Tellme; Emre Kiciman, Stanford University; Dave Patterson, University of California, Berkeley; Armando Fox, Stanford University; Eric Brewer, University of California, Berkeley, NSDI'04
Question: Both of these papers describe techniques for understanding the behavior of large-scale distributed systems. Briefly, how does each of the two techniques determine that certain messages are related? What are the relative strengths and weaknesses of the two approaches, and what types of problems can each one find?

Centera : Deconstructing Commodity Storage Clusters
Haryadi Gunawi, Nitin Agrawal, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
ISCA'05
Question: This paper shows that one can actively delay packets to determine whether or not a subsequently sent packet is dependent. What are the strengths and weaknesses of this approach for inferring causality? Did this delay technique discover any non-obvious aspects of the Centera write protocol?
MapReduce : MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
OSDI'04
Improving MapReduce Performance in Heterogeneous Environments
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica, University of California, Berkeley
OSDI'08
Question: One of the assumptions made by the Hadoop Scheduler is that tasks in the same category (map or reduce) require roughly the same amount of work (see Section 2.2 of Heterogeneous paper). How does a MapReduce job typically try to ensure this assumption holds true? The LATE scheduler does not directly address this assumption. How does the LATE scheduler handle tasks with more work? How could you modify the scheduler (or any aspect of the MapReduce framework) to better handle jobs with high variance in work across tasks?
Student Answers
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007
Question: Imagine you are reviewing this paper for SOSP. In your review, summarize the paper and then give your arguments for why or why not this paper should be accepted. What are its contributions? What are the weaknesses?
Migration

SpriteMigration -- Transparent Process Migration: Design Alternatives and the Sprite Implementation : meenali
Fred Douglis and John K. Ousterhout
Software - Practice and Experience, Volume 21, Number 8, 1991, Pages 757-785.
V Migration : Preemptable Remote Execution Facility for the V-System
M. Theimer, K. Lantz, and D. Cheriton
10th Symposium on Operating Systems Principles, Orcas Island, WA, December 1985, pp. 2-12.
Question: Migration mechanisms make trade-offs between four factors: transparency, residual dependencies, performance, and complexity. What did the Sprite and V designers choose for each factor? How did their assumptions about their environment and usage scenarios influence each of their decisions?

More Migration

Zap : The Design and Implementation of Zap: A System for Migrating Computing Environments
Steven Osman, Dinesh Subhraveti, Gong Su, and Jason Nieh, Columbia University,
OSDI'02
VMmigration : Live Migration of Virtual Machines
Christopher Clark, Keir Fraser, and Steven Hand, University of Cambridge Computer Laboratory; Jacob Gorm Hansen and Eric Jul, University of Copenhagen; Christian Limpach, Ian Pratt, and Andrew Warfield, University of Cambridge
Symposium on Networked Systems Design and Implementation (NSDI'05), May 2005
Question: Please create three questions that you think would be interesting for everyone to discuss during class. The questions can cover either or both papers. Send your questions by 12:00 (instead of 1:00), please.

Porcupine: Manageability, Availability and Performance in Porcupine: A Highly Scalable Internet Mail Service
Yasushi Saito, Brian Bershad, and Hank Levy
17th ACM Symposium on Operating Systems Principles, Dec 1999, Kiawah Island Resort
Question: Porcupine (and other distributed system services) characterizes state as being either hard state or soft state. What is the difference between the two? What are the advantages of treating some state as soft? Briefly, how does Porcupine recreate each piece of soft state when needed?
xFS : Serverless Network File Systems
Tom Anderson, Mike Dahlin, Jeanna Neefe, David Patterson, Drew Roselli, Randy Wang.
SOSP 15, December 1995.
Question: How does xFS utilize a log for data and meta-data? What is the purpose of the log? How are the data structures maintained? What are the advantages of writing to a log?
GoogleFS : The Google File System
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
SOSP'03
Question: Where does GoogleFS rely upon soft state and stale information? Discuss the implications and whether or not these appear to be good design decisions.
LOCKSS : Preserving Peer Replicas By Rate-Limited Sampled Voting
Petros Maniatis, Mema Roussopoulos, TJ Giuli, David S. H. Rosenthal, Mary Baker, Yanto Muliadi
SOSP'03
Question: What is the goal of a malign node in this environment? What is the best strategy a malign node can use? Must malign nodes initiate votes of their own (why or why not)? Must malign nodes participate in the votes of others (why or why not)?
Dynamo : Dynamo: Amazon's Highly Available Key-Value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels
Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
Question: Amazon's key-value storage server, Dynamo, lets services trade off performance, durability, and availability. What are some of the techniques Dynamo uses to improve one of those three metrics? How does it allow services to control the trade-offs?
Pangaea : Taming Aggressive Replication in the Pangaea Wide-Area File System
Yasushi Saito, Christos Karamanolis, Magnus Karlsson, and Mallik Mahalingam, HP Labs, OSDI'02
Question: Why does Pangaea have two classes of replicas: gold and bronze? What is the purpose of each (why not just have gold or just have bronze)? How does Pangaea ensure it has enough replicas?

