Project Suggestions

At last, what you've been waiting for. The immaculate, incomparable, incredible, and unfortunately, inevitable, list of suggested projects! These are all suggestions that are drawn from the deep recesses of my brain, which we all know is a dark and often frightening place - so be careful!

Your task: Please read carefully, and feel free to come by and ask me questions (email is OK too). Remember, the best project for you is the one you feel highly motivated to work on, and not one that is simply easy to do -- challenge yourself and you might just produce a publishable piece of quality research.

For each project, the basic idea is presented, although of course in the end it is up to you to decide exactly how to proceed. Also listed are some potentially related bits of research, which you should find out more about as soon as you can.

In all cases, please talk to me EARLY and OFTEN about what you are doing! My role is to give you feedback; your role is to incorporate that feedback into what you are doing and hence do it better. This is why I am often referred to as an advisor - I give advice to others (mostly students) for a living. Yes, I am also referred to by other names, but let's not talk about that here.

Finally, remember that you can come up with your own project, too. But we should talk (a lot) about it to make sure it is feasible, etc.

1 : Improving File System Failure Handling

Disks fail. Unfortunately, they don't just fail like they used to, in a simple "fail-stop" manner (working or not). In this project, you will enhance your favorite file system (ext2, ext3, ReiserFS, etc.) to be more robust to the kinds of disk failures that happen more and more today -- data corruption and latent sector errors to name a few. At the simplest level, you could add checksumming to the file system; by incorporating checksums, the file system at least can tell when data has gone bad. More ambitious ideas might be to include parity on the disk, such that a failed block could be recovered. Even more advanced would be the idea
Related work: RAIDs, ask me for more
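
To make the checksum-plus-parity idea concrete, here is a minimal sketch in Python rather than in any real file system. Everything in it is an assumption made for illustration: the Stripe class, the tiny block size, the use of CRC32, and the single XOR parity block are hypothetical choices, not how ext2, ext3, or ReiserFS lay out data.

```python
import zlib
from functools import reduce

BLOCK_SIZE = 16  # tiny blocks keep the example readable; real blocks are larger


def checksum(block: bytes) -> int:
    """CRC32 of one block (an assumed choice; a real design might use a stronger hash)."""
    return zlib.crc32(block)


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the RAID-style parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Stripe:
    """Hypothetical group of data blocks with per-block checksums and one parity block."""

    def __init__(self, data_blocks):
        self.blocks = [bytes(b) for b in data_blocks]
        self.sums = [checksum(b) for b in self.blocks]
        self.parity = xor_blocks(self.blocks)

    def read(self, i: int) -> bytes:
        """Return block i, rebuilding it from parity if its checksum does not match."""
        block = self.blocks[i]
        if checksum(block) == self.sums[i]:
            return block
        others = [b for j, b in enumerate(self.blocks) if j != i]
        rebuilt = xor_blocks(others + [self.parity])
        if checksum(rebuilt) != self.sums[i]:
            raise IOError(f"block {i} is unrecoverable")
        self.blocks[i] = rebuilt  # repair the bad copy in place
        return rebuilt


stripe = Stripe([bytes([n]) * BLOCK_SIZE for n in (1, 2, 3)])
stripe.blocks[1] = b"\xff" + stripe.blocks[1][1:]  # simulate silent corruption of one byte
recovered = stripe.read(1)
```

The checksum alone only detects the bad block; the parity block is what makes recovery possible, and only when a single block in the stripe has gone bad. Where the checksums and parity should live on disk is exactly the kind of question the project would have to answer.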

2 : Disk Faults And Their Impact On File Systems

Related to the project above, instead of simply trying to fix a given file system, let's first understand how the file system handles various types of disk failures. Hence, in this project, you will do just that. We already have done this locally for a few file systems, but have not done so for XFS, a neat file system that runs under Linux. Hence, you will get a leg up because a lot of the testing infrastructure exists; it will be your role to modify it to work under XFS and then to learn what you can about this interesting and advanced file system. Doing this for other file systems is also possible, such as any BSD, MacOS, or other non-Linux file system (except NTFS, which we have already looked at).
Related work: RAIDs, ask me for more

3 : Memory Faults and Their Impact On Operating Systems

If you thought disks fail in funny ways, wait 'til you hear about memory. Turns out that cheap memory fails in interesting ways too, often corrupting data silently. Corruption also occurs for other reasons, e.g., a faulty piece of code running in the OS. In this project, you will create an environment to test OS robustness to data structure corruption -- what happens when a link goes bad in a list? Through this study, you will understand how sensitive the OS is to memory corruption errors. More ambitious projects may even get to the solutions phase -- how do you change Linux to make it more robust to memory failures?
Related work: Nooks, RIO work from Michigan, Failure-oblivious computing, Tandem NonStop
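
As a toy version of the "what happens when a link goes bad in a list?" question, the sketch below corrupts one pointer in a doubly linked list and then runs a simple invariant check. It is a user-level Python simulation, not kernel code; the names Node, build_list, inject_bad_link, and check_list are hypothetical, and a real study would inject faults into live OS data structures instead.

```python
import random


class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def build_list(values):
    """Build a doubly linked list and return its nodes in order."""
    nodes = [Node(v) for v in values]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    return nodes


def inject_bad_link(nodes, rng):
    """Fault injection: point one node's `next` at a wrong node and return the victim."""
    victim = rng.choice(nodes[:-1])
    wrong = rng.choice([n for n in nodes if n is not victim.next])
    victim.next = wrong
    return victim


def check_list(head, limit):
    """Walk forward checking that node.next.prev is node.

    Returns the first node whose link looks bad, or None if the walk is clean.
    The step limit guards against corrupted lists that loop forever.
    """
    node, steps = head, 0
    while node.next is not None:
        if node.next.prev is not node or steps >= limit:
            return node
        node, steps = node.next, steps + 1
    return None


rng = random.Random(0)
nodes = build_list(range(8))
clean = check_list(nodes[0], len(nodes))
victim = inject_bad_link(nodes, rng)
found = check_list(nodes[0], len(nodes))
```

Here the back-pointer check catches the fault because only one of the two pointers was corrupted. Code that walks the list without such a check would instead skip nodes or loop, which is the kind of sensitivity the project sets out to measure.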

4 : Debunking File System Benchmarks

File system benchmarking is in a state of disarray. In this project, you will show how silly the benchmarks we run are, including Postmark, Andrew, and TPC-B. Measure what type of stress they put on the file system, and show that it is not as interesting as we would like. Then, if you are ambitious, you could craft an uber-benchmark that could imitate these I/O loads, as well as improve upon them. One thing to make sure you consider: caching and its effects -- how much of a memory load do these benchmarks create?
Related work: Peter Chen's scalable file system benchmark, Fstress from Duke

5 : ExoRAID

RAID systems hide much of their internals from file systems, limiting file system optimizations and control. In this project, you are going to change all of that by exposing the internals of a RAID system to the file system above. How should control over data placement, redundancy, parity calculation, and other internals such as NVRAM be made available to file systems above? How should the file system be changed so as to exploit this information? What you are really thinking about here is a new instruction set for storage -- going beyond the simple read and write operations that have been in place for so long.
Related work: exokernel, infokernel, bridging the gap in storage protocol stacks

6 : Scalability of Modern File Systems

Scalability is of huge importance in modern file systems. How does the file system behave when storing millions or even billions of objects? In this project, you will find out, by measuring and understanding the performance of modern file systems when they store large numbers of objects. Compare the performance of the major Linux file systems: ext3, ReiserFS, etc. Or instead, focus on improving the scalability of one of those file systems, say Linux ext3. Hence, you have some choice: do a thorough measurement based project which compares the scalability of a number of file systems, or do a little measurement to understand one file system and then improve it.
Related work: Peter Chen's scalable file system benchmark, SGI XFS paper

7 : File System Performance on Virtual Machines

Virtual machines are all the rage. And yet, little is understood about I/O performance when running on top of a virtual machine monitor. In this project, you will analyze the I/O behavior of Xen, a popular open-source virtual machine monitor. First, understand how I/O works in a virtualized environment; then, evaluate performance when running in a virtualized environment as compared to a standard one; finally, draw conclusions. If you're ambitious, you might even fix some of the problems you find.
Related work: Xen, Chen paper in Usenix '03, OSDI '02 papers, Peter Chen Benchmark

8 : A Hardware-Based Disk Fault Injector

One major thrust of our current research program is the study of disk faults and their impact on file systems. Yet, to do so in a realistic and scientific manner, we (and other researchers) are in need of a platform that can emulate all of the faults a disk can really exhibit. To do this, you will build a disk-fault injector out of a standard PC; simply attach the PC via a SCSI cable and make it "pretend" to be a disk (we will provide the hardware of course). Then, evaluate its performance and ability to emulate a range of disk faults. This interesting environment will enable a huge new range of fault-injection work for disks to be conducted.
Related work:

9 : Consistent Block-Level Snapshots

Snapshotting is the process of creating a consistent image of the file system for archival. Virtually every modern storage system does this in some form. However, it is difficult to do so when the interface to storage is a block-level interface; in that case, it is hard to know when the file system is actually consistent. In this project, you will tackle this problem, likely using "semantically-smart" disk technology. Some ideas to get started: run an I/O intensive workload and watch the resultant stream of I/O to disk -- can you determine how often this stream is consistent? (can you develop a consistency detection algorithm?)
Related work: Semantically-smart disks, Clotho from Toronto

10 : Operating System Support For Asymmetrical Physical Memory

People use standard cheap PCs in all sorts of settings these days. Unfortunately, that means you have to live in a world where not everything is as nice as it should be. One example comes with memory: sometimes the PC you are using does not have the slots you need to add more physical memory. Some vendors have gotten around this by adding memory on a card attached to the PCI (or other I/O) bus. But this introduces an OS management problem -- how should the OS manage memory that is slower than main memory and yet still much faster than disk? Your goal will be to implement some alternatives and see how they compare. What is the easiest way to incorporate this functionality into a modern OS such as Linux?
Related work: Disco, related work pointed to by Disco on page migration, etc.

11 : Gray Middleboxes

Proxies (or middleboxes) are used throughout computing infrastructures. They interpose between clients and servers, and often are used to provide new functionality without modifying either side. In this project, you will develop a "gray" middlebox that uses "gray box" technology to deliver new functionality into a distributed file system environment. For example, a "gray box" NFS caching proxy interposes on NFS traffic and watches it to infer what files clients are caching. The proxy can then make sure to cache a different set of blocks in its own cache, hence improving the overall utility of caching.
Related work: Ask me
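
One way to picture the caching-proxy example is the sketch below. It is not an NFS implementation: the GrayBoxProxy class, the fetch callback, and the guess that the client runs an LRU cache of a known size are all assumptions made for illustration. The proxy only sees reads that missed in the client, so its model of the client cache is an inference, not ground truth.

```python
from collections import OrderedDict


class GrayBoxProxy:
    """Hypothetical proxy that tries to cache what the client probably is NOT caching."""

    def __init__(self, assumed_client_size, proxy_size):
        self.client_model = OrderedDict()  # block ids we believe the client holds (LRU order)
        self.cache = OrderedDict()         # block id -> data held by the proxy itself
        self.assumed_client_size = assumed_client_size
        self.proxy_size = proxy_size

    def on_read(self, block, fetch):
        """Handle a read that reached the proxy, i.e., one that missed in the client."""
        if block in self.cache:
            data = self.cache.pop(block)   # the client will cache it now, so drop our copy
            hit = True
        else:
            data = fetch(block)            # go to the server
            hit = False
        # Update the inferred client cache: this block is now its most recent entry.
        self.client_model[block] = None
        self.client_model.move_to_end(block)
        if len(self.client_model) > self.assumed_client_size:
            evicted, _ = self.client_model.popitem(last=False)
            # Simplification: re-fetch the block we think the client just evicted.
            self._keep(evicted, fetch(evicted))
        return data, hit

    def _keep(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        if len(self.cache) > self.proxy_size:
            self.cache.popitem(last=False)


proxy = GrayBoxProxy(assumed_client_size=2, proxy_size=2)
server = lambda block: f"data-{block}"     # stand-in for the real file server
hits = [proxy.on_read(b, server)[1] for b in ["a", "b", "c", "a"]]
```

In this run the fourth read hits in the proxy because the model predicted that the client had evicted block "a". How well such inference holds up against real client caching policies is the interesting research question.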

12 : Accurate and Cost Effective Memory Upgrade

You bring up your browser -- it's sluggish. You run iPhoto, it is slower than a dead dog. Your machine is not running like it should. Why? It doesn't have enough memory! So you should add some. But the question remains: how much? In this project, you will modify the OS to tell you exactly how much memory you should buy, hence improving your ability to upgrade properly and cost-effectively. You could also apply this idea to disks -- if you need to buy a new disk system, how much money should you spend on performance? (but this is likely harder)
Related work: Ghost buffers, ask me for more
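
The "how much?" question can be sketched with a standard LRU stack-distance simulation, which is in the spirit of the ghost buffers listed above. The code below is an assumption-laden toy: it takes a trace of page references as a plain Python list and assumes strict LRU replacement, whereas a real OS would have to gather this information cheaply and its replacement policy is only an approximation of LRU. The function names are hypothetical.

```python
def lru_stack_distances(trace):
    """Yield the LRU stack distance of each reference (None for a first touch)."""
    stack = []  # the most recently used page sits at the end
    for page in trace:
        if page in stack:
            depth = len(stack) - stack.index(page)  # 1 means most recently used
            stack.remove(page)
        else:
            depth = None
        stack.append(page)
        yield depth


def hits_by_memory_size(trace, max_pages):
    """Return a list where hits[n] is the number of hits an n-page LRU memory would see."""
    histogram = [0] * (max_pages + 1)
    for depth in lru_stack_distances(trace):
        if depth is not None and depth <= max_pages:
            histogram[depth] += 1
    hits, running = [0] * (max_pages + 1), 0
    for n in range(1, max_pages + 1):
        running += histogram[n]  # a reference at depth d hits in any memory of size >= d
        hits[n] = running
    return hits


trace = [1, 2, 3, 1, 2, 3, 4, 1]
curve = hits_by_memory_size(trace, max_pages=4)
```

For this trace, two pages of memory yield no hits at all while three pages yield three, so the curve shows exactly where extra memory starts to pay off. A single pass over the trace gives the answer for every memory size at once, which is what makes the approach attractive for an "exactly how much should I buy" tool.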

13 : Gray-Box Reverse Engineering of Modern Architecture

In this project, you will apply gray-box techniques to reverse-engineer how certain aspects of modern processors work. A simple example would be caching: how can you determine the details of the caching infrastructure? More advanced (but more novel) would be an analysis of hyperthreading: craft some microbenchmarks to determine what code sequences interleave well on modern hyperthreaded processors. A different take on this would be to ask the question: how can one build OS support for thread scheduling on hyperthreaded processors? What should the OS try to learn about the code sequences it interleaves on the processor?
Related work: Gray-box paper, Saavedra-Barrera microbenchmarks, Hyperthreading work from Washington

14 : The I/O Behavior of The iLife Application Suite

Apple has introduced a new operating system, OS X, that is a little bit of everything we saw in our structure discussion. You want legacy apps? OS X supports those. You want Unix? BSD 4.4 is in there. You want a micro-kernel? Mach technology is in the mix as well. Your task is to evaluate the I/O behavior of iLife applications, such as iTunes, iPhoto, iMovie, and even GarageBand, on top of Mac OS X. How do these applications stress the system differently than more traditional applications? The hypothesis hiding behind this work is that this new class of applications stresses file systems in much different ways than typical I/O workloads.
Related work: Baker paper, Roselli paper, Vogels NTFS paper, AppleScript (for automation)

15 : Understanding the Impact of Disk Technology

Disks have been improving in performance for the last so many years at phenomenal rates, yet we are left with the question: how important have these improvements been? How have they affected real application performance? In this project, you could answer that in one of two ways: the first would be via simulation, perhaps running real applications across a range of simulated drives, to understand the impact on application performance. The second would be to develop an emulation framework to "slow down" real hard drives, and thus understand how slowing down I/O slows down real application performance.
Related work: Similar study for networks

16 : Xwhy

So many times, some application I am using (emacs, a browser, etc.) simply hangs for a period of time. In this project, you will build a system that lets the user find out the answer to a simple question: why? The idea behind "xwhy" is that the user should be able to click on a given application and find out something about what the application is doing at that point in time. Is it a DNS lookup? Is the file system the real culprit? Is the system paging? With xwhy, you can find out all the answers. All that for only 19.95! And if you order right now, xwhy comes packaged with a free sweater. This is a large endeavor, so make sure to focus on one interesting aspect of the problem (talk to me for more details).
Related work: Paradyn/Dyninst, debuggers, recent work on debugging complex systems