CS739 Spring 2009: Questions
- Survey -- Distributed
Operating Systems :
Andrew S. Tanenbaum and Robbert Van Renesse
ACM Computing Surveys, Volume 17, Issue 4 (December 1985)
Question: This paper surveys distributed systems as of
1985. What were the goals of these distributed systems? What were the assumptions (in terms of workload and environment) of
these systems? Which
design issue (i.e., communication, naming and protection, resource
management, fault tolerance, and services) seems most challenging (or
- Sprite vs. Amoeba :
A Comparison of Two Distributed Systems: Amoeba and Sprite
Fred Douglis, M. Frans Kaashoek, John K. Ousterhout, Andrew S. Tanenbaum.
Computing Systems, Vol. 4, No. 3, pp. 353-384, December 1991.
Question: Do you think system projects should pay attention to
technology trends? Why or why not? Do you think Amoeba or Sprite did
a better job predicting future technology? Explain.
Question: Discuss one of the changes that was made to NFSv3
and one of the changes made to NFSv4. What problem did each change
address? Does the change introduce any drawbacks or challenges?
- Coda : Disconnected Operation in the Coda File System
James J. Kistler, M. Satyanarayanan
13th Symposium on Operating Systems Principles, Asilomar,
California, pp. 213-225. October 1991.
Question: Coda is derived from AFS. What aspects of AFS
simplify the design of Coda? Imagine you were instead building Coda
on top of NFS; what aspects of Coda would be easier or harder (e.g.,
within Hoarding, Emulation, and Reintegration)? Make
sure you specify which version of NFS you are assuming!
- LBFS : A Low-Bandwidth Network
Athicha Muthitacharoen, Benjie Chen (MIT), David Mazieres
Question: Create your own...
- Speculator : Speculative execution in a distributed file system
Edmund B. Nightingale, Peter M. Chen, Jason Flinn
Proceedings of the twentieth ACM symposium on Operating systems
principles (SOSP'05), pages 191 - 205.
Question: Imagine you are a system administrator who needs to decide
whether to deploy SpecNFS (the speculative version of NFSv3) or NFSv4.
What are the pros and cons of each version? Which would you choose and
why? (You can assume a reputable implementation of each version exists.)
- Analysis1 :
- Black-Box : Performance Debugging for
Distributed Systems of Black Boxes
Marcos K. Aguilera, Jeffrey C. Mogul, Janet L. Wiener,
Patrick Reynolds, Athicha Muthitacharoen
(HP Labs, Duke, and MIT), SOSP'03
- Paths : Path-Based Failure and Evolution Management
Mike Y. Chen, University of California, Berkeley; Anthony
Accardi, Tellme; Emre Kiciman, Stanford University; Dave
Patterson, University of California, Berkeley; Armando Fox,
Stanford University; Eric Brewer, University of California,
- Question: Both of these papers describe techniques for
understanding the behavior of large-scale distributed systems.
Briefly, how does each of the two techniques determine that certain
messages are related? What are the relative strengths and weaknesses
of the two approaches and the types of problems one can find?
- Centera : Deconstructing Commodity Storage Clusters
Haryadi Gunawi, Nitin Agrawal, Andrea Arpaci-Dusseau, Remzi
Question: This paper shows that one can actively delay packets
to determine whether or not a subsequently sent packet is dependent. What
are the strengths and weaknesses of this approach for inferring
causality? Did this delay technique discover any non-obvious aspects
of the Centera write protocol?
- MapReduce : MapReduce: Simplified
Data Processing on Large Clusters
Jeffrey Dean and Sanjay
Improving MapReduce Performance in Heterogeneous Environments
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and
Ion Stoica, University of California, Berkeley
Question: One of the assumptions made by the Hadoop Scheduler
is that tasks in the same category (map or reduce) require roughly the
same amount of work (see Section 2.2 of Heterogeneous paper). How
does a MapReduce job typically try to ensure this assumption holds
true? The LATE scheduler does not directly address this assumption.
How does the LATE scheduler handle tasks with more work? How could
you modify the scheduler (or any aspect of the MapReduce framework) to
better handle jobs with high variance in work across tasks?
- Dryad: Distributed
Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis
European Conference on Computer Systems (EuroSys), Lisbon, Portugal,
March 21-23, 2007
Question: Imagine you are reviewing this paper for SOSP. In
your review, summarize the paper and then give your arguments for why
or why not this paper should be accepted. What are its contributions?
What are the weaknesses?
- SpriteMigration -- Transparent
Process Migration: Design Alternatives and the Sprite
Implementation : meenali
Fred Douglis and John K. Ousterhout
Software - Practice and Experience, Volume 21, Number 8, 1991,
- V Migration : Preemptable Remote
Execution Facility for the V-System
M. Theimer, K. Lantz, and D. Cheriton
10th Symposium on Operating
Systems Principles, Orcas Island, WA, December 1985, pp. 2-12.
- Question: Migration mechanisms make trade-offs
between four factors: transparency, residual dependencies,
performance, and complexity. What did the Sprite and V designers choose for
each factor? How did their assumptions about their environment and usage
scenarios influence each of their decisions?
- More Migration
- Zap : The Design and Implementation of Zap: A System for Migrating Computing Environments
Steven Osman, Dinesh Subhraveti, Gong Su, and Jason Nieh, Columbia
- VMmigration : Live
Migration of Virtual Machines
Christopher Clark, Keir Fraser, and Steven Hand, University of
Cambridge Computer Laboratory; Jacob Gorm Hansen and Eric Jul,
University of Copenhagen; Christian Limpach, Ian Pratt, and Andrew
Warfield, University of Cambridge
Symposium on Networked Systems Design and Implementation
(NSDI'05), May 2005
- Question: Please create three questions that you think would
be interesting for everyone to discuss during class. The questions
can cover either or both papers. Send your questions by 12:00
(instead of 1:00), please.
- Porcupine: Manageability, Availability and Performance in
Porcupine: A Highly Scalable Internet Mail Service
Yasushi Saito, Brian Bershad, and Hank Levy
17th ACM Symposium on
Operating Systems Principles, Dec 1999, Kiawah Island Resort
Question: Porcupine (and other distributed system services)
characterizes state as being either hard state or soft state. What is
the difference between the two? What are the advantages of treating
some state as soft? Briefly, how does Porcupine recreate each piece of soft
state when needed?
- xFS : Serverless
Network File Systems
Tom Anderson, Mike Dahlin, Jeanna Neefe, David Patterson, Drew Roselli, Randy Wang.
SOSP 15, December 1995.
Question: How does xFS utilize a log for data and meta-data?
What is the purpose of the log? How are the data structures maintained? What are the advantages of writing to a log?
- GoogleFS : The Google File
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
Question: Where does GoogleFS rely upon soft state and stale
information? Discuss the implications and whether or not these appear
to be good design decisions.
- LOCKSS : Preserving Peer Replicas By Rate-Limited Sampled Voting
Petros Maniatis, Mema Roussopoulos, TJ Giuli, David
S. H. Rosenthal, Mary Baker, Yanto Muliadi
Question: What is the goal of a malign node in this environment?
What is the best strategy a malign node can use? Must malign nodes
initiate votes of their own (why or why not)? Must malign nodes participate in the
votes of others (why or why not)?
- Dynamo : Dynamo: Amazon's Highly Available Key-Value
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani,
Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami
Sivasubramanian, Peter Vosshall and Werner Vogels
Proceedings of the 21st ACM Symposium on Operating Systems
Principles, Stevenson, WA, October 2007.
Question: Amazon's key-value storage server, Dynamo, lets the services that use it
trade off performance, durability, and availability. What are some
of the techniques Dynamo uses to improve one of those three metrics?
How does it allow services to control the trade-offs?
- Pangaea : Taming Aggressive Replication in the Pangaea Wide-Area File System
Yasushi Saito, Christos Karamanolis, Magnus Karlsson, and Mallik
Mahalingam, HP Labs, OSDI'02
Question: Why does Pangaea have two classes of replicas: gold
and bronze? What is the purpose of each (why not just have gold or
just have bronze)? How does Pangaea ensure it has enough replicas?