CS739 Spring 2005: Projects
A major research area is developing complex systems that
are easier to manage. We need innovations in a number of different
areas for this to happen.
Deriving Causal Paths
For either a human or the system itself to "understand"
behavior, it's useful to know the relationship between different
requests or messages in the system. We've read a number of different
papers that describe techniques for deriving the causal paths in a
distributed system. In this project, you will develop extensions to
those techniques. There are a number of options here.
- Apply the technique of delaying packets (used previously to
analyze the Centera) to the distributed system
of your choice.
- Adapt the technique of delaying packets so that it will work
on-line. If there are multiple concurrent requests, will delayed
messages still help you infer relationships? How can you make the
overhead acceptable in an on-line system? For example, can you
find an acceptably short delay period or delay only a small sample of
messages?
- What additional information is available to help infer causality?
Does examining the content of messages help one determine that
a message is dependent on a previous one? For example, if one sees
that a subset of the data is copied from one message to the next, one
can infer that they are likely to be dependent. If this information
is useful, how can this searching be done efficiently?
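One possible way to make the content search in the last option efficient is to index fixed-length byte substrings ("shingles") of earlier messages, then look up the shingles of each new message. The sketch below is only an illustration under that assumption; the class name `OverlapIndex`, the shingle length `K`, and the message ids are hypothetical and not taken from any of the papers we read.

```python
from collections import defaultdict

K = 8  # shingle length in bytes (an assumed tuning parameter)


def shingles(payload: bytes, k: int = K) -> set:
    """Return the set of all k-byte substrings of payload."""
    return {payload[i:i + k] for i in range(len(payload) - k + 1)}


class OverlapIndex:
    """Maps each k-byte shingle to the ids of earlier messages containing it."""

    def __init__(self, k: int = K):
        self.k = k
        self.index = defaultdict(set)

    def add(self, msg_id: str, payload: bytes) -> None:
        """Record the shingles of a message that has already been observed."""
        for s in shingles(payload, self.k):
            self.index[s].add(msg_id)

    def candidates(self, payload: bytes) -> dict:
        """Count shingles shared between payload and each earlier message."""
        counts = defaultdict(int)
        for s in shingles(payload, self.k):
            for msg_id in self.index.get(s, ()):
                counts[msg_id] += 1
        return dict(counts)


# Usage: a later message that copies "abcdef123456" from m1 is flagged
# as a likely dependent of m1, while the unrelated m2 is not.
idx = OverlapIndex(k=4)
idx.add("m1", b"GET /file/abcdef123456")
idx.add("m2", b"unrelated payload")
overlap = idx.candidates(b"READ block abcdef123456 now")
```

A high shared-shingle count only suggests dependence; common protocol headers would also match, so a real tool would likely need to discount very frequent shingles.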
Monitoring and Changing Behavior
Proxies are useful in a distributed system for both monitoring and
changing the behavior of the system. Proxies are often used for
filtering or load-balancing; what else can they be used for?
- To determine whether or not a primary system is behaving correctly, a
proxy could replicate requests to a backup/secondary system as well.
The proxy can then compare the outputs of the two to determine if
there is a problem. However, when performing this comparison, one
must be able to filter out unimportant differences, such as timing.
In this project, you will build a comparator for the distributed
system of your choice.
- A proxy can be used to convert one protocol to another. For
example, NFS is the lingua franca of distributed file systems --
virtually every known operating system can mount NFS volumes. However,
in certain departments and work environments, other file systems are
used (e.g., here we use AFS). In this project, you seek to provide
access to these other file systems but do so without requiring new
client-side software. Instead, you will build an NFS-to-AFS "bridge"
that transforms NFS requests into their meaningful AFS
counterparts. Thus, you will have a machine that accepts
NFS requests and passes them on to AFS servers. Many issues arise:
performance, consistency, and security in particular come to mind.
Alternatively, one could convert NFS to HTTP, or vice versa.
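For the comparator option, one simple starting point is to normalize both outputs before comparing them, so that fields expected to differ (such as timing) are ignored. The sketch below assumes, purely for illustration, that responses are JSON documents; the field names in `VOLATILE_FIELDS` are hypothetical, and a real system would need its own list and format.

```python
import json

# Hypothetical names of fields considered "unimportant"; not from any real system.
VOLATILE_FIELDS = {"timestamp", "latency_ms", "server_id"}


def normalize(value):
    """Recursively drop volatile fields so only meaningful content remains."""
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()
                if k not in VOLATILE_FIELDS}
    if isinstance(value, list):
        return [normalize(v) for v in value]
    return value


def responses_match(primary: str, secondary: str) -> bool:
    """Compare two JSON responses after removing volatile fields."""
    return normalize(json.loads(primary)) == normalize(json.loads(secondary))


# Usage: the two replies differ only in timing and server identity.
primary_reply = '{"status": "ok", "data": [1, 2], "timestamp": 100}'
backup_reply = ('{"timestamp": 250, "data": [1, 2], '
                '"status": "ok", "server_id": "backup"}')
same = responses_match(primary_reply, backup_reply)
```

Deciding which differences are truly unimportant is the hard part of the project; a fixed deny-list like this one is only the simplest possible policy.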
Understanding Failures
One of the keys to building a distributed system is having it operate
correctly when a node fails. However, less attention is generally
paid to handling subsystem failures: for example, when a single
process dies, or when a disk block or memory chip either fails or
returns incorrect data. For example, what happens to Linux (or
perhaps an NFS server running on Linux) when you corrupt a data
structure within it? In this project, you will characterize how the
system of your choice handles a range of these more interesting
failures.
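As a toy illustration of this kind of characterization, the sketch below injects single-bit corruptions into a stored record and counts how many the "system" detects. Everything here is an assumption made for the example: the checksummed `store`/`load` pair stands in for the system under test, and a real study would corrupt in-kernel or on-disk structures instead.

```python
import random
import zlib


def flip_random_bit(data: bytes, rng: random.Random) -> bytes:
    """Return a copy of data with exactly one randomly chosen bit inverted."""
    corrupted = bytearray(data)
    bit = rng.randrange(len(corrupted) * 8)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)


def store(block: bytes) -> bytes:
    """Toy subsystem: prepend a CRC32 so corruption can be detected on read."""
    return zlib.crc32(block).to_bytes(4, "big") + block


def load(record: bytes) -> bytes:
    """Verify the checksum and raise if the record was corrupted."""
    expected = int.from_bytes(record[:4], "big")
    block = record[4:]
    if zlib.crc32(block) != expected:
        raise IOError("checksum mismatch")
    return block


def run_trials(block: bytes, trials: int, seed: int = 0) -> int:
    """Inject one bit flip per trial; return how many were detected."""
    rng = random.Random(seed)
    record = store(block)
    detected = 0
    for _ in range(trials):
        try:
            load(flip_random_bit(record, rng))
        except IOError:
            detected += 1
    return detected


detected = run_trials(b"inode table contents", trials=50)
```

The interesting results come from the cases a real system fails to detect, and from what it does afterwards: crash, propagate bad data, or recover.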
Improving Fault-Tolerance
Microreboots and failure-oblivious computing are two new techniques that
have been proposed for improving the reliability of servers. How can
these ideas be extended?
- Can you apply either of these techniques to a modern OS?
For example, can you build a rebootable file system within Linux? Or
could you alter Linux so as to use the failure-oblivious computing
infrastructure?
- Is it possible to implement a hybrid of these two techniques? For example,
while a component is being rebooted (or microrebooted), can the system
return manufactured values?
- Where else is failure-obliviousness useful? For example, the
authors apply this idea to memory; is there an analog for disks?
(e.g., when the disk fails or the user tries to read past the end of a
file, can you just "manufacture" a result and continue computing?)
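To make the disk analog concrete, here is a minimal sketch of a read wrapper that manufactures bytes whenever the underlying file cannot supply them. It is only an illustration of the idea, not the authors' infrastructure; the function name `oblivious_read` and the choice of zero bytes as the manufactured value are assumptions.

```python
import io


def oblivious_read(f, offset: int, length: int, filler: bytes = b"\x00") -> bytes:
    """Read `length` bytes at `offset`, manufacturing `filler` bytes for
    anything the file cannot supply (short read or I/O error).
    `filler` is assumed to be a single byte."""
    try:
        f.seek(offset)
        data = f.read(length)
    except OSError:
        data = b""  # treat a failed read as if no bytes were available
    missing = length - len(data)
    return data + filler * missing


# Usage: the file holds only 5 bytes, but callers always get what they asked for.
f = io.BytesIO(b"hello")
partial = oblivious_read(f, 3, 4)      # 2 real bytes, 2 manufactured
past_end = oblivious_read(f, 10, 3)    # entirely manufactured
```

Whether continuing with manufactured data is acceptable depends on the application; part of the project would be finding out where it helps and where it silently corrupts results.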
Peer-to-Peer Applications
The peer-to-peer community seems to still be searching for its killer
application. Can you implement one? You may want to build your
application on top of one of the existing DHT implementations (e.g.,
Bamboo or OpenDHT). Some possibilities for applications include:
- Personal Communication References: Tired of reading papers
that refer to personal communication? This service would allow
researchers to enter an extended quote from someone that they wish to
cite; others would then be able to verify the exact original quote.
- Mail System: Spam is a big problem. We can pretend to fix it
by trying fancy new filters or other bandaids, but the real
problem lies in the basic construction of the system. So let's
junk the old system and make a new one! In this project, you will
do just that. Build a P2P system that avoids spam as a first
principle. One thing you could do is to include a strong concept
of identity in it -- in other words, to join this mail system,
someone has to let you join (say, a friend). Then, if you start
sending mails people don't like, they can trace it to you (and
also, to your friend), and kick one or both of you off of the
system. There are other ways to approach the problem too, for
example, by requiring computation on the part of the sender to be
able to send something to a receiver. In any case, there is a huge
space of problems that could be attacked here. You may want to
start with the epost code base.
- Video Indexing: There is so much video out there on the web,
and it is growing. In this project, you will build a P2P overlay that indexes
all the video out there based on its audio stream. A couple of approaches are
possible: use speech-to-text technology, or grab the closed-captioned text and
use that.
- Overcite: A repository for academic papers, similar to
citeseer. This has been suggested by other researchers, but I don't
believe it is available yet.
- Fundamental Algorithms: There is a wealth of distributed
algorithms in the literature, but their behavior under realistic
assumptions is often not well understood. In this project, you will
implement any traditional distributed algorithm (e.g., Paxos) and
evaluate how it behaves in the "real world". How scalable is it? How
does it perform under failure? How does it react to network delays?
Feedback or content questions:
send email to "dusseau" at the cs.wisc.edu server
Technical or accessibility issues:
lab@cs.wisc.edu
Copyright © 2002, 2003 The Board of Regents of the University of Wisconsin System.