CS 739 - Project List

	UNIVERSITY OF WISCONSIN-MADISON Computer Sciences Department
CS 739 Spring 2007		Barton Miller
	CS 739: Project List

Return to CS739 home page.

General Comments

The projects are intended to give you an opportunity to study a particular area related to distributed systems. These projects may require test implementations, measurement studies, simulations, modeling, literature searches, or some combination of these.

The project suggestions below are briefly stated. They are intended to guide you into particular areas and you should expand these suggestions into a full project description. This gives you more freedom in selecting an area, and more burden in defining your own project. There may be more issues listed for a project than you can cover. If you have a topic of your own that is not listed below, please come and talk with me so we can work out a reasonable project description.

You will typically work in a group of two.

Important Dates

February 13:: Project topics are handed out to you.
February 28:: Project proposals are due, my office 4pm.
March 23:: Work-in-progress session: Each group will make a brief presentation (7 minutes, plus 3 for questions) of their work. These presentations will use Powerpoint (or equivalent) and the time limit will be enforced.
May 2 (at 4pm):: Project paper drafts are due to the referees: The drafts will be very close to the final version of your paper. They may be missing some final results (such as performance numbers or final status of software). These drafts should have been read by another member of the class and the comments used to improve the paper.
May 7:: Project Presentations: The projects will be presented to the class and the public in the form of a poster..
May 7 (9:30am):: Referee reports due back to the paper authors.
May 11 (5pm):: The final project papers are due.

Projects

Fuzz Testing of Threads:
Join a distinguished line of class fuzz projects! In the past (1990, 1995, 2000, and 2006), we have used random input to test the reliability of application programs on UNIX, Windows, and MacOS. These techniques have been quite effective in finding bugs, and have been widely cited; this work has become almost a cult in itself.

This same notion of randomness can be applied to other domains, such as thread scheduling. The goal of this project is to test multi-threaded programs by randomizing and biasing the thread scheduler. Such randomization and bias has the potential to expose synchronization problems in multi-thread programs.

You will have to choose a threading environment, such as pthreads, understand what type of controls an application program has over the scheduling decisions, and how you will manipulate these controls under test. You will also have to choose a set of programs to test and (when you find bugs), identify the causes of the bugs.
Tamper Resistance:
Software designers often want to protect their code from being run without permissions and from detailed study of how it works. This property is called tamper resistance. There are both freeware and commercial products to help developers protect their software.

Unfortunately, the bad guys are also using the same techniques (and often the same software packages) to protect their malicious code from tampering. The goal of this project is to study one or more tamper protection packages to see how they work. In the process, using a tool such as Dyninst, try to develop techniques to bypass these tamper protections.
Scalable Tree-Based Algorithms:
The tree-based overlay network (TBON) is a simple but powerful way of organizing a program in a distributed system. Any algorithm that can be computed by passing data up a tree -- taking inputs from the children, performing a computation on the inputs, and passing them up to the parent -- can be executed on a TBON. A surprisingly large number of problems can be solved in such way, from obvious ones such as sum, min, and max, to more complex ones like graph merge, equivalence class finding, and clock synchronization. The big advantage of TBON computations is that they can run at extreme scale with relative ease.

The Paradyn project has developed the MRNet software as a high-performance TBON computation framework. For more details, see
- "MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools", SC2003, Phoenix, Arizona, November 2003.
- "Benchmarking the MRNet Distributed Tool Infrastructure: Lessons Learned", 2004 High-Performance Grid Computing Workshop, held in conjunction with the 2004 International Parallel and Distributed Processing Symposium (IPDPS 2004, Santa Fe, New Mexico, April 2004).
- "Tree-based Overlay Networks for Scalable Applications", 11th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2006), Rhodes, Greece, April 2006.
- "Stack Trace Analysis for Large Scale Applications", International Parallel & Distributed Processing Symposium, Long Beach, California, March 2007, to appear.
This software can be used to construct TBON computations with relative ease. The goal of this project is to identify computations that might be described in a tree-like structure, implement them on MRNet, and benchmark the performance on one of our local Linux clusters.
Careless Delegation of Trust:
When you run a Java aplet in your browser, it is performing computation on behalf of the server from which it came. The advantage of putting computation into an aplet is that it reduces the number of times that a web client might need to contact (and wait for) the server. Often, this computation is checking fields in a form to make sure that they are reasonable. If the server does not re-check the values in the form, then it is (improperly) delegating trust to a host that it does not control. Such lack of checking may allow a malicious client to cause the server to process values that will make the server perform unintended tasks. And example of such a situation is where a user is able to embed SQL queries directly into a text field.

The goal of this project is to crawl the web looking for Java aplets, and study the aplet for such checking. When such aplets can be found, then verify whether the checking is reproduced in the server. We would like to identify the server software and bring up a local copy for experimentation (and not experiment on a live production server).
Linux's Pedigree:
In the 1980's, every vendor had their own, proprietary, somewhat incompatible version of the Unix operating system. There were SunOS, Solaris, Ultrix, Unicos, HP-UX, AIX (at least five variants!), Irix, Sinix, and Dynix, just to name a few. This balkanization of the Unix world frustrated software developers and helped to allow the "standard" Windows system to dominate.

In recent years, it appeared that the dominant and quasi-standard version of Unix had become Linux. Such standardization would allow it to compete with Windows on a more equal footing; a software developer knew that if they ported their software to run on Linux, it should run on all distributions of Linux.

Unfortunately, the various Linux distributors (such as Red Hat, Tao, Ubuntu, and SuSE) are now adding their own code, so that they distinguish their distribution. In some cases, it is merely additional services on top of Linux, but in other cases, they are changing the kernel, its interface, and the way that the software is built and linked. These changes cause incompatibilities that might balkanize the Linux world.

The goal of this project is to study the basic Linux kernel, runtime libraries, and basic utilities, to identify the ways in which some of the common distributions vary. These results are interesting both qualitatively (what kinds of things change and in what areas?) and quantitatively (how much of the code changes?).

Brief Project Description

You will prepare a brief (3-4 pages) description of your project. This will be an expanded description of your project. It must be long enough to convince me that you have a reasonable and achievable project (i.e, not trivial and not too large). You may have to revise your description before it will be acceptable.

The project description should describe the problem that you are addressing, how you will go about working on this project (the steps you will take), what results you expect and what you expect to produce, what resources you will need, and a brief schedule for your project.

The Project Report

I will have a handout that will present more information about the form and content of your project report. This handout will be available in a couple of weeks. The reports will no more than 20 pages.

Referee Report

Each of you will referee the paper from another group. I am providing you with a sample referee report form, that is similar to one that I used in a past conference. I am also providing you with two sample referee reports, to give you an idea of what might appear in a real report.

In general, a referee report starts with a summary of the main idea(s) in the paper, and then has an overall statement of the quality. You should then review the main technical ideas. In addition, either a marked-up copy of the paper, with typos and related errors, or a summary of typos and related errors should be given to the authors.