Project Suggestions

Welcome to the Fall 2007 CS 736 List of Projects! What you see now is what you will be spending countless hours upon in the coming weeks -- sorry about that! But this is as it must be. We only really learn by doing (alas), so now it is time to do.

Your task: Please read carefully, and feel free to come by and ask me questions (email is OK too). Remember, the best project for you is the one you feel highly motivated to work on, and not one that is simply easy to do -- challenge yourself and you might just produce a publishable piece of quality research.

For each project, the basic idea is presented, although of course in the end it is up to you to decide exactly how to proceed. Also listed are some potentially related bits of research, which you should find out more about as soon as you can.

In all cases, please talk to me EARLY and OFTEN about what you are doing! My role is to give you feedback; your role is to incorporate that feedback into what you are doing and hence do it better. This is why I am often referred to as an advisor: I give advice to others (mostly students) for a living. Learning how to receive advice and critique is central to becoming a good graduate student, and thus you should view this as an explicit part of your training.

Finally, remember that you can come up with your own project, too. But we should talk (a lot) about it to make sure it is feasible, etc.

1 : Log Unrolling

Most modern file systems use a journal to enable consistent update of
on-disk structures. These journals are used to commit intents to disk; after
the file system has been updated, the intents (in the journal) are cleared,
and thus the journal space can be reused. In this project, you are going to
build a system that "unrolls" the journal; that is, keeps the journal around
for a long time instead of overwriting it repeatedly. Why? Well, there are a
number of applications that could make use of the journal contents if they
were around and had an API to access them. For example, imagine a backup
system that wanted to know which files had been updated recently; if it could
just scan the journal, it would know exactly what needed to be backed
up. Similar arguments can be made for a content index or a security
application.
Related: S4 from CMU (OSDI '00)
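To make the backup example concrete, here is a minimal Python sketch of what a journal-access API might look like. Everything in it is hypothetical: the Transaction record (a commit sequence number plus the paths it touched), the UnrolledJournal class, and its changed_since call are illustrative assumptions, not part of any real file system. A backup tool would remember the last transaction ID it saw and ask for everything committed after it.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    # Hypothetical record: commit sequence number plus the paths it updated.
    tid: int
    paths: list[str] = field(default_factory=list)


class UnrolledJournal:
    """Hypothetical journal that keeps every committed transaction
    instead of reusing journal space after checkpointing."""

    def __init__(self) -> None:
        self._log: list[Transaction] = []
        self._next_tid = 1

    def commit(self, paths: list[str]) -> int:
        """Append a transaction and return its sequence number."""
        tx = Transaction(self._next_tid, list(paths))
        self._log.append(tx)
        self._next_tid += 1
        return tx.tid

    def changed_since(self, tid: int) -> set[str]:
        """Paths touched by any transaction committed after `tid`."""
        changed: set[str] = set()
        for tx in self._log:
            if tx.tid > tid:
                changed.update(tx.paths)
        return changed


# A backup tool remembers the last transaction it saw and asks for the rest.
journal = UnrolledJournal()
last_backup = journal.commit(["/home/a.txt", "/home/b.txt"])
journal.commit(["/home/b.txt", "/etc/c.conf"])
print(sorted(journal.changed_since(last_backup)))  # ['/etc/c.conf', '/home/b.txt']
```

Real journals such as ext3's log whole metadata blocks rather than path names, so recovering "which files changed" from raw journal contents is likely part of the project itself.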

2 : Range Writes

Writing to disk in a rotationally-optimal manner is challenging. Much of the
problem arises from the current interface, which demands an exact address for
each write. In this project, you will change this interface by building what
we call "range writes". A range write takes a data block and a list of
possible destination addresses; the disk then internally chooses the best
possible address and writes the data to it, returning the address to the
caller when finished. You will explore range writes via detailed simulation
and show how they can be incorporated into existing file systems. You will
improve performance. You will be happy.
Related work: Wang's work on trade-offs in building disk arrays (OSDI '00),
Popovici's Disk Mimic (USENIX '02)
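A toy version of the disk-side decision can be simulated in a few lines. The sketch below assumes a made-up geometry (SECTORS_PER_TRACK, a linear SEEK_COST_PER_TRACK, all costs in sector-times) and hypothetical function names; the detailed simulation the project calls for would need a far richer disk model. Here range_write picks the candidate with the smallest combined seek and rotational delay, writes there, and returns the chosen address to the caller.

```python
# Toy geometry and timing model (assumptions, not real drive parameters).
SECTORS_PER_TRACK = 100   # sectors on every track
SEEK_COST_PER_TRACK = 2   # seek time per track crossed, in sector-times


def access_cost(lbn: int, cur_track: int, cur_sector: int) -> int:
    """Sector-times until block `lbn` passes under the head."""
    track, sector = divmod(lbn, SECTORS_PER_TRACK)
    seek = SEEK_COST_PER_TRACK * abs(track - cur_track)
    # The platter keeps spinning during the seek.
    angle_after_seek = (cur_sector + seek) % SECTORS_PER_TRACK
    rotation = (sector - angle_after_seek) % SECTORS_PER_TRACK
    return seek + rotation


def range_write(data: bytes, candidates: list[int],
                cur_track: int, cur_sector: int, disk: dict[int, bytes]) -> int:
    """Write `data` to the cheapest candidate address and return that address."""
    best = min(candidates, key=lambda lbn: access_cost(lbn, cur_track, cur_sector))
    disk[best] = data
    return best


disk: dict[int, bytes] = {}
chosen = range_write(b"block", [5, 15, 250], cur_track=0, cur_sector=10, disk=disk)
print(chosen)  # 15
```

With the head over sector 10 of track 0, candidate 15 (five sectors ahead) beats candidate 5, which has just passed under the head and would cost almost a full rotation, and also beats candidate 250, which needs a seek.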

3 : Beyond the Journal: Write Anywhere Regions

Synchronous workloads do poorly on our modern journaling file systems because
they incur many additional disk rotations. Part of this problem is caused by
the nature of the file system journal, which demands that each transaction be
written immediately after the previous one. In this project, you will change the write-ahead
log into a write-ahead region, which is more flexible; specifically, a
transaction can be written anywhere within the log instead of only in
sequential order. Using this flexibility, you will develop a system that
writes transactions to the location nearest the disk head, thus avoiding the
high cost of rotations. The additional good news: we have some of this system
built; coming in to work on it now will likely lead to a paper submission
sooner rather than later!
Related work: Popovici's Disk Mimic (USENIX '02), and ask me

4 : Performance Analysis of WAFL or ZFS

There are a number of important commercial file systems that we know
little about. One is called WAFL and it is from Network Appliance; another
is called ZFS and it is from Sun. In this project, you will use "gray box"
techniques to uncover the inner workings of one of these complex storage
systems. What are the important policies of WAFL (or ZFS)? What techniques can
you come up with to discover them? As systems grow more complex, we need to
take a more scientific approach to analyzing them, much as scientists study
nature; this project is one small but important step in that direction.
Related: gray-box paper (SOSP '01), Burnett cache paper (USENIX '02),
Gunawi (ISCA '05), Prabhakaran SBA paper (USENIX '05)

5 : IRON WAFL or IRON ZFS

We know that Linux file systems have problems when it comes to faults. How
about the commercial file systems from Sun and Network Appliance? In this
project, you will learn how to find flaws and bugs in fault handling in these
complex and interesting systems. Do these commercial systems exhibit the same
types of problems that Linux file systems do? Many people are doubtful. You
are here to prove them wrong and show the world that all file systems are
terribly, painfully broken.
Related: IRON paper

6 : IRON Databases

You saw in the IRON paper how to inject faults into file systems. In this
project, you will explore your inner database-loving self by injecting a
similar set of faults underneath a modern database management system. How
robust are databases to latent sector errors, corruption, and other common
disk problems? Do this project, find the answer, and then tell those database
types that they too don't know how to build reliable systems.
Related: IRON paper
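A fault injector for this project sits between the database and the disk. The minimal Python sketch below wraps a block store and makes reads of chosen blocks either fail (a latent sector error) or return silently corrupted data. FaultyDisk and its fields are hypothetical. Note that the IRON paper's injector was type-aware, targeting specific kinds of on-disk blocks; a database version would similarly need to know which blocks hold, say, index pages versus log records, whereas this sketch only targets raw block numbers.

```python
class FaultyDisk:
    """Hypothetical block store that injects faults on chosen block numbers."""

    def __init__(self, blocks: dict[int, bytes]) -> None:
        self.blocks = blocks
        self.latent_errors: set[int] = set()  # reads of these blocks fail
        self.corrupt: set[int] = set()        # reads of these return bad data

    def read(self, bno: int) -> bytes:
        if bno in self.latent_errors:
            raise OSError(f"latent sector error on block {bno}")
        data = self.blocks[bno]
        if bno in self.corrupt and data:
            # Silent corruption: flip every bit of the first byte.
            data = bytes([data[0] ^ 0xFF]) + data[1:]
        return data

    def write(self, bno: int, data: bytes) -> None:
        self.blocks[bno] = data


disk = FaultyDisk({0: b"\x00page0", 1: b"page1"})
disk.corrupt.add(0)
disk.latent_errors.add(1)
print(disk.read(0))  # b'\xffpage0'
try:
    disk.read(1)
except OSError as err:
    print("DBMS would see:", err)
```

The interesting part of the project is what happens above this layer: does the DBMS notice the corrupted page, retry the failed read, or silently return wrong answers?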

7 : Online Fsck for Modern File Systems

File systems have bugs. Sometimes, these bugs result in incorrect data being
written out to disk, potentially corrupting on-disk structures (despite the
presence of journaling). In this project, you will build a lightweight online
consistency checker that runs underneath a mounted file system. The checker
should guarantee the consistency/correctness of a group of writes when they
are written to disk (or alternately, when they are read). Things to think
about: what kind of checks are easy to do, which are hard? Which are
expensive, which are not? How should the file system communicate with the
checker? (what are the interfaces?) And so forth.
Related: Ask me
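As one example of an "easy" check, the Python sketch below validates a group of pending inode writes against the block bitmap as it will look once the group commits: every pointer must be in range, marked allocated, and not claimed by two inodes within the group. The InodeWrite record, NUM_BLOCKS, and check_group are hypothetical. What it cannot catch cheaply is a block shared with an inode outside the group; that would require global state, which is exactly the kind of expensive check the questions above ask about.

```python
from dataclasses import dataclass

NUM_BLOCKS = 1024  # assumed file system size in blocks


@dataclass
class InodeWrite:
    # Hypothetical view of an inode in a pending write group.
    ino: int
    block_ptrs: list[int]


def check_group(inodes: list[InodeWrite], allocated: set[int]) -> list[str]:
    """Return problems in one group of pending writes. `allocated` is the
    block bitmap as it will look once the group commits."""
    problems: list[str] = []
    owner: dict[int, int] = {}  # block -> inode claiming it within this group
    for inode in inodes:
        for bno in inode.block_ptrs:
            if not 0 <= bno < NUM_BLOCKS:
                problems.append(f"inode {inode.ino}: pointer {bno} out of range")
            elif bno not in allocated:
                problems.append(f"inode {inode.ino}: block {bno} not marked allocated")
            elif owner.get(bno, inode.ino) != inode.ino:
                problems.append(f"block {bno} claimed by inodes {owner[bno]} and {inode.ino}")
            else:
                owner[bno] = inode.ino
    return problems


group = [InodeWrite(1, [10, 5000]), InodeWrite(2, [10, 13])]
for p in check_group(group, allocated={10}):
    print(p)
```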

8 : Understanding Overload in Virtual Memory Systems

Modern virtual memory systems are large, hairy beasts, and though many of the
concepts are "well understood", little is known about how modern systems
actually handle overload. In this project, you will examine the state of the
art in virtual memory systems, using "gray box" techniques to uncover how
these systems behave when free memory is low. The opportunity here is to
reinvigorate the OS community to look at virtual memory again, just like
lottery scheduling got us all to think about scheduling again.
Related: gray box paper (SOSP '01), Burnett's paper (USENIX '02)

9 : File System Benchmark Suite

The state of file system benchmarks is terrible. In this project, you will fix
that by finding some interesting applications and turning them into
configurable workload generators that anyone can use. Some places to start:
desktop search (indexing, lookup); photo management software; video editing;
music library management. Any progress here will be tremendous, and a real
step forward in the science of studying file systems.
Related: SynRGen from Sigmetrics some years back (author: Satya)

10 : Reliability and Availability of Web Storage Services

We are putting more and more data on the web, in email services such as Gmail
and storage utilities such as Amazon's S3. In this project, you will figure
out if this is a good idea by measuring the reliability and availability of
these services. Put a lot of data in there; can you get it back? How often can
you access the data? What kind of performance can you expect? You might want
to look into using an overlay such as PlanetLab to evaluate some of these
questions.
Related: PlanetLab, Baker's archival work, recent HP position paper

11 : Reboot: Where Does The Time Go?

The key to availability is fast recovery. In this project, you'll study how
long it takes to reboot a system. Where does the time go? Break this down and
then figure out some ways to optimize reboot time and thus increase
availability the easy way.
Related: Brewer/Fox paper on Giant-scale systems, whole system simulators
like Simics

12 : Static Analysis of File Systems

File systems have lots of bugs. In this project, you will find some of
them, mostly by applying techniques that the programming languages community
has been developing. Our group's early work on this looks at analyzing how
errors are propagated in C code in Linux-based file systems. Your work will
continue this but take it a step further by running it on commercial systems
like ZFS or extending the analyses to work in different settings.
Related: Our group has a paper (under submission), and ask me

13 : Automatic RAID Construction

RAID systems can be built in many interesting ways, including with different
kinds of checksums and parity-based schemes. In this work, you will start by
coming up with a way to describe these schemes and then build a RAID generator
that spits out the code for a software RAID from the description. Because the
code is generated from a precise description rather than written by hand, the
resulting implementation should be much more robust.
Related: REO paper from FAST and one (under submission) from our own group
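To give a feel for "description in, code out", the Python sketch below takes a tiny scheme description (a dict with hypothetical keys data_disks and parity), emits Python source for a stripe encoder, and compiles it with exec. It only handles single XOR parity; a real description language would also need to express layout, parity rotation across disks, checksums, and recovery paths.

```python
def generate_raid_code(desc: dict) -> str:
    """Emit Python source for a stripe encoder from a tiny, hypothetical
    scheme description. Only single XOR parity is handled here."""
    n = desc["data_disks"]
    if desc.get("parity") != "xor" or n < 2:
        raise ValueError("sketch only supports XOR parity over >= 2 data disks")
    args = ", ".join(f"d{i}" for i in range(n))          # d0, d1, ...
    byte_vars = ", ".join(f"b{i}" for i in range(n))     # b0, b1, ...
    xor_expr = " ^ ".join(f"b{i}" for i in range(n))     # b0 ^ b1 ^ ...
    return (
        f"def encode_stripe({args}):\n"
        f"    parity = bytes({xor_expr} for {byte_vars} in zip({args}))\n"
        f"    return [{args}, parity]\n"
    )


def build_encoder(desc: dict):
    """Compile the generated source and return the encoder function."""
    namespace: dict = {}
    exec(generate_raid_code(desc), namespace)
    return namespace["encode_stripe"]


scheme = {"data_disks": 3, "parity": "xor"}
print(generate_raid_code(scheme))
encode = build_encoder(scheme)
stripe = encode(b"\x01\x02", b"\x04\x08", b"\x10\x20")
print(stripe[3].hex())  # 152a
```

Because XOR parity is its own inverse, any single lost block in a stripe can be rebuilt by XOR-ing the surviving blocks, which gives an easy property to test generated code against.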

14 : Exploring Scalability

File system scalability is something we all think is important. But what does
it mean? In this project, you will explore the limits of scalability in modern
file systems. How can we define benchmarks that unearth interesting behaviors
in these systems? How can we build file systems that are scalable? How can we
build a read-only version of a file system that does not have a large memory
footprint? And so forth.
Related: SGI XFS

15 : Footprints in the Cache

Thread schedulers or event-based schedulers have very little information on
which to base their decisions. In this project, you will give them more
information about which thread/event to schedule by building footprints, a
gray-box cache footprint detector. The basic idea is simple: before running a
piece of code, first preload the cache with some data. Then run the
event/thread. Then access the data you had preloaded. If it is still in the
cache (the access is fast), the event/thread didn't use that part of the
cache; if it has been evicted (the access is slow), the event/thread did.
Using this basic technique, one can build
up a profile of cache utilization of different events/threads and thus make
better scheduling decisions.
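The preload/run/probe loop can be tried on a simulated cache before attempting it on real hardware, where the probe step would time each access instead of inspecting the cache directly. The Python sketch below assumes a toy set-associative cache with LRU replacement; SetAssocCache, LINE, footprint, and task are all hypothetical. The approach is similar in spirit to the prime+probe technique from the cache side-channel literature. It reports, for each cache set, how many preloaded lines the event/thread evicted.

```python
from collections import OrderedDict
from typing import Callable

LINE = 64  # assumed cache line size in bytes


class SetAssocCache:
    """Toy set-associative cache with LRU replacement within each set."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.num_sets, self.ways = num_sets, ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr: int) -> bool:
        """Touch `addr`; True on a hit (a fast access, if we were timing it)."""
        line = addr // LINE
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)        # now most recently used
            return True
        if len(s) == self.ways:
            s.popitem(last=False)      # evict least recently used
        s[line] = True
        return False


def footprint(cache: SetAssocCache, run: Callable[[SetAssocCache], None]) -> list[int]:
    """For each set, count how many preloaded lines `run` evicted."""
    n = cache.num_sets
    base = n << 24  # assumed: line numbers the task never touches; multiple of n
    probes = [[(base + s + w * n) * LINE for w in range(cache.ways)] for s in range(n)]
    for lines in probes:               # 1. preload: fill every set with our data
        for addr in lines:
            cache.access(addr)
    run(cache)                         # 2. run the event/thread
    counts = []
    for lines in probes:               # 3. re-access what we preloaded
        # Reverse order: under LRU our own misses then evict the task's lines,
        # not our surviving probes, so each miss is one line the task used.
        counts.append(sum(not cache.access(addr) for addr in reversed(lines)))
    return counts


def task(cache: SetAssocCache) -> None:
    # Hypothetical event handler: touches one line in set 0, two in set 1.
    for addr in (0 * LINE, 5 * LINE, 9 * LINE):
        cache.access(addr)


print(footprint(SetAssocCache(num_sets=4, ways=2), task))  # [1, 2, 0, 0]
```

The probe order is a real subtlety: probing oldest-first under LRU lets our own misses evict surviving probe lines, which inflates the count. On real hardware, replacement policies and prefetchers are less predictable than this model, which is part of what makes the project interesting.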