UNIVERSITY OF WISCONSIN-MADISON
Computer Sciences Department
CS 736
Fall 2016
Barton Miller

Project List

(Project Assigned: Monday, October 10)
(Project Proposal Due: Friday, October 21, in class)
(Midway Interviews: Thursday and Friday, November 10 and 11)
(Draft Report to Referees: Friday, December 9, noon)
(Referee Reports to Authors: Wednesday, December 14 noon)
(Final Reports Due: Monday, December 19, noon)
(Project Presentation Session: Wednesday, December 21, 10am-noon)

General Comments

The projects are intended to give you an opportunity to study a particular area related to operating systems. Your project will require a test implementation, measurement study or analysis, literature search, and evaluation.

The project suggestions below are briefly stated. They are intended to guide you into particular areas, and you are expected to expand these suggestions into full project descriptions. This gives you more freedom in selecting an area and more of the burden of defining your own project. There may be more issues listed for a project than you can cover. If you have a topic of your own that is not listed below, come and talk with me so we can work out a reasonable project description.

You will write a paper that reports on your project. This paper will be structured as if you were going to submit it to a conference. I will provide more details on the project report later in the semester.

You should work in teams of two people (or, in certain cases, three) on the project and report.

Projects

Here are a few project suggestions. I have mentioned other ideas in class and I encourage you to develop an idea of your own.

(1) Benchmarking Revisited

For your first assignment, you did a series of measurements. In many of these measurements, your results had a great deal of variation that you could speculate on but not precisely identify. The goal of this assignment is to get to the source of the variation by measuring activities inside the Linux kernel.

You have a variety of approaches available, ranging from inserting your own print statements in the kernel (printk), to using the new perf_events software and hardware counter interface, the function tracer (ftrace), or the Red Hat SystemTap tool.
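To make one of these concrete, here is a minimal sketch (my illustration, not part of the handout's tool set) of the perf_events route: counting hardware instructions around a region of a benchmark with the perf_event_open system call. It assumes a Linux system with perf support; counting kernel-mode events will require root access or a permissive perf_event_paranoid setting, and the choice of event is arbitrary.

    // Sketch: count hardware instructions (user + kernel) around a code region.
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main() {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_INSTRUCTIONS;
        attr.disabled = 1;
        attr.exclude_kernel = 0;     // count inside the kernel too
        attr.exclude_hv = 1;

        // perf_event_open has no glibc wrapper, so call it via syscall(2):
        // measure this process, on any CPU, in no event group.
        int fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd == -1) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        /* ... the benchmark region whose variation you are chasing ... */

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        uint64_t count = 0;
        read(fd, &count, sizeof(count));   // default read_format: a single u64
        printf("instructions: %llu\n", (unsigned long long)count);
        close(fd);
        return 0;
    }

printk, ftrace, and SystemTap take the complementary approach of reporting from inside the kernel, which is what you will need to attribute the variation to particular kernel activities.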

Precise identification of what happens in a given run of a benchmark program is the key to this project. Note that you will likely have to run this project on your own computer (though I will see if we can arrange access to a system on which you can have root privileges).

(2) More Random Testing of Programs:

Join a distinguished line of CS736 projects! In past offerings (1990, 1995, 2000, and 2006), we used random input to test the reliability of application programs on UNIX and NT. These techniques have been quite effective in finding bugs, and have been widely cited. Fuzz testing has become a fundamental technique in both the testing and security communities, and it all started with CS736 projects. Here are a couple of different fuzz projects that you might undertake (you are free to think of others):

A. Comparing the Free Unix Variants

The goal of this project is to take the previous study of applications on various platforms and repeat it with command-line and X Window applications across the various free/open UNIX systems, such as Linux, BSDi, FreeBSD, and even Xv6, the simple V6-based UNIX used for operating systems class projects.
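For reference, the core of the original fuzz technique is small; below is a minimal sketch of the loop (my illustration; the target program and input size are placeholder choices). It forks a victim utility, streams random bytes to its standard input, and flags any run that dies on a signal.

    #include <csignal>
    #include <cstdio>
    #include <cstdlib>
    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
        const char *target = "/usr/bin/sort";  // placeholder victim program
        signal(SIGPIPE, SIG_IGN);              // target may exit before we finish writing
        for (int run = 0; run < 1000; run++) {
            int pipefd[2];
            if (pipe(pipefd) == -1) { perror("pipe"); return 1; }
            pid_t pid = fork();
            if (pid == 0) {                    // child: stdin <- pipe, then exec target
                dup2(pipefd[0], STDIN_FILENO);
                close(pipefd[0]); close(pipefd[1]);
                execl(target, target, (char *)nullptr);
                _exit(127);
            }
            close(pipefd[0]);
            for (int i = 0; i < 4096; i++) {   // the random input stream
                char c = (char)(rand() & 0xff);
                if (write(pipefd[1], &c, 1) != 1) break;
            }
            close(pipefd[1]);
            int status;
            waitpid(pid, &status, 0);
            if (WIFSIGNALED(status))           // a crash: the interesting case
                fprintf(stderr, "run %d: killed by signal %d\n", run, WTERMSIG(status));
        }
        return 0;
    }

The real tool set also has to drive X Window applications (random events rather than random bytes), save the failing inputs so crashes can be reproduced, and detect hangs as well as crashes.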

As a product of this project, we would like to provide: (1) an update of the tool set that can be used by other developers, (2) bug reports for the software vendors, and (3) fixes for these bugs.

B. Extend the Testing of Unchecked Kernel Call Return Values

As you heard me mention, unchecked return values are a significant contributor to problems with software reliability and security. In the Fuzz Revisited paper, we included tests for this problem in programs that called malloc and its cousins. For this project, the goal would be to focus exclusively on the problem of unchecked return values, and extend the coverage to a wide variety of kernel calls and library routines.
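One way to exercise this failure path, sketched below, is an LD_PRELOAD shim that makes malloc fail at random, so that programs that never test its result crash instead of reporting an error. This is my illustration, assuming a Linux/glibc system; the 1% failure rate is an arbitrary choice, and a production-quality shim needs more care (dlsym itself can allocate, and calloc, realloc, and the kernel calls you target all need their own wrappers).

    // Build: g++ -shared -fPIC -o failmalloc.so failmalloc.cpp -ldl
    // Run:   LD_PRELOAD=./failmalloc.so ./target
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE 1          // for RTLD_NEXT
    #endif
    #include <dlfcn.h>
    #include <cstdlib>

    extern "C" void *malloc(size_t size) {
        // Look up the real allocator once, then fail roughly 1 in 100 calls.
        static void *(*real_malloc)(size_t) =
            (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
        if (rand() % 100 == 0)
            return nullptr;        // the injected failure
        return real_malloc(size);
    }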

(3) Dynamic Instrumentation API:

The Paradyn project has developed a library for the analysis and patching of binary (executable) programs and libraries, operating on both executable files and running programs. This library, called the Dyninst API, allows programmers to write tools that can patch code into unmodified programs in a machine-independent way. So, a tool that uses Dyninst can work on x86 and Power (32 and 64 bit). Many simple tools have been built using Dyninst for tracing and debugging, and more complex tools for doing such things as checkpointing and process migration.

With Dyninst, you can extract structural information such as control flow and call graphs, and identify functions, basic blocks, and instructions. You can instrument code and patch it to change its behavior, either statically (before execution) or dynamically (during execution).

The goal is to choose some analysis, instrumentation, or control task, and build a Dyninst-based tool. The task could be in the area of tracing, program analysis, modeling, cyber-security and forensics, performance profiling, or debugging.
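As a starting point, here is a hedged sketch of a tiny mutator in the style of the examples in the Dyninst programmer's guide: it launches a process, finds a function (the name "work" is a placeholder), and patches a counter increment in at the function's entry. The exact class and method names should be checked against the Dyninst release you use.

    #include "BPatch.h"
    #include "BPatch_function.h"
    #include "BPatch_point.h"
    #include "BPatch_process.h"
    #include <vector>

    BPatch bpatch;    // the one global instrumentation manager per mutator

    int main(int argc, char *argv[]) {
        if (argc < 2) return 1;
        // Create the mutatee (it starts stopped); argv[1] is the program to run.
        BPatch_process *proc =
            bpatch.processCreate(argv[1], (const char **)(argv + 1));
        BPatch_image *image = proc->getImage();

        // Find the function to instrument.
        std::vector<BPatch_function *> funcs;
        image->findFunction("work", funcs);
        if (funcs.empty()) return 1;

        // Build a snippet that bumps a counter allocated in the mutatee's
        // address space, and patch it in at the function's entry point(s).
        BPatch_variableExpr *counter = proc->malloc(*image->findType("int"));
        BPatch_arithExpr bump(BPatch_assign, *counter,
            BPatch_arithExpr(BPatch_plus, *counter, BPatch_constExpr(1)));
        std::vector<BPatch_point *> *entries = funcs[0]->findPoint(BPatch_entry);
        proc->insertSnippet(bump, *entries);

        proc->continueExecution();
        while (!proc->isTerminated())
            bpatch.waitForStatusChange();
        return 0;
    }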

An interesting example of such a project is to build a tool that can perform "hot patching" of a running server. You can check out the example of such a tool in the paper:

(4) Extreme Scale: The Multicast/Reduction Network (MRNet):

Infrastructures for scalable computing are intended to ease the task of distributed programming, reducing errors and making it easier to process large amounts of data in bulk.

We have a locally developed scalability infrastructure, called MRNet (Multicast/Reduction Network), which is good for large-scale control and monitoring, and for stream data processing. This infrastructure has been used to build tools that control more than a million processes or cluster 10 billion data points.

The goal of this assignment is to build a scalable program or tool based on MRNet. One possible project is to start with a collection of up to a billion (!) Tweets that we can provide to you, and design and implement a program that analyzes these Tweets to extract some common characteristics of the data. You can also develop your own idea (in consultation with Bart) for a scalable program or tool. Whichever idea you choose, you will get the program working and benchmark its performance to understand its limits on scalability.
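For flavor, here is a hedged sketch of an MRNet front end, patterned on the integer-addition example in the MRNet documentation: it broadcasts one request to all back ends and receives a single reply that the tree's internal nodes have already summed with a built-in filter. The topology file and back-end program are supplied on the command line, and the exact names should be checked against the MRNet release you use.

    #include "mrnet/MRNet.h"
    #include <cstdio>
    using namespace MRN;

    int main(int argc, char *argv[]) {
        if (argc < 3) return 1;
        // argv[1]: topology file describing the tree; argv[2]: back-end executable.
        Network *net = Network::CreateNetworkFE(argv[1], argv[2], nullptr);

        // One stream over all back ends, summing replies on the way up.
        Communicator *comm = net->get_BroadcastCommunicator();
        Stream *stream = net->new_Stream(comm, TFILTER_SUM, SFILTER_WAITFORALL);

        int tag = FirstApplicationTag;   // application tags sit above MRNet's reserved range
        stream->send(tag, "%d", 1);      // e.g., ask each back end for a count
        stream->flush();

        PacketPtr pkt;
        int sum = 0;
        stream->recv(&tag, pkt);         // one packet: internal nodes summed the replies
        pkt->unpack("%d", &sum);
        printf("total across all back ends: %d\n", sum);

        delete net;                      // tears down the whole tree
        return 0;
    }

The leverage of the approach is that the same front-end code runs whether the tree has ten back ends or ten thousand; only the topology file changes.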

(5) Windows vs. Unix File System Performance

NTFS and the Unix FFS are both designed to provide good performance. However, each file system has been optimized to be good at different things. The goal of this project is to do a side-by-side comparison of the basic on-disk structures and performance strategies (e.g., memory caching of data and meta-data, pre-fetching, replication), and to generate workloads that test these features.
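As an illustration of the workload-generation side, here is a minimal sketch of one micro-benchmark phase, written for the Unix side of the comparison (a Windows port would use CreateFile and FlushFileBuffers): timing the synchronous creation of many small files, a case where FFS-style and NTFS-style meta-data handling tend to differ. The file count and size are arbitrary starting points, and a real study would add read, delete, and cached (non-fsync) phases.

    #include <fcntl.h>
    #include <unistd.h>
    #include <chrono>
    #include <cstdio>

    int main() {
        const int kFiles = 1000;
        char buf[4096] = {0};                     // one 4 KB block per file
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < kFiles; i++) {
            char name[64];
            snprintf(name, sizeof(name), "bench.%d", i);
            int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (fd < 0) { perror("open"); return 1; }
            write(fd, buf, sizeof(buf));
            fsync(fd);                            // force it to disk, not just the cache
            close(fd);
        }
        auto end = std::chrono::steady_clock::now();
        double secs = std::chrono::duration<double>(end - start).count();
        printf("%d small-file creates in %.3f s (%.1f files/s)\n",
               kFiles, secs, kFiles / secs);
        return 0;
    }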

Project Proposal

The project descriptions listed above are intentionally brief. You will need to expand and refine them to develop your own project. Different groups may choose the same topic, but have significantly different emphases.

Your project proposal will describe your goals, methods, implementation outline, evaluation criteria, and resources needed. First, you should describe the basic problem that you will be addressing. Next, you should provide a more detailed description of how you will approach the problem. This description should contain much more detail than the brief paragraphs given above. You will provide an outline of the features that your project will include, an implementation plan, and an evaluation plan.

Project proposals will typically be 3 to 4 double-spaced pages.

Referee Report

Each of you will referee the paper from another group. I am providing you with a sample referee report form that is similar to one I used for a recent conference. I am also providing you with two sample referee reports, to give you an idea of what might appear in a real report.

In general, a referee report starts with a summary of the main idea(s) in the paper, and then gives an overall statement of the paper's quality. You should then review the main technical ideas. In addition, you should give the authors either a marked-up copy of the paper showing typos and related errors, or a summary of those errors.

Last modified: Wed Nov 30 17:07:29 CST 2016 by bart