UNIVERSITY OF WISCONSIN-MADISON
Computer Sciences Department
CS 736
Fall 2019
Barton Miller
The project suggestions below are briefly stated. They are intended to guide you into particular areas, and you are expected to expand these suggestions into full project descriptions. This gives you more freedom in selecting an area and more burden in defining your own project. There may be more issues listed for a project than you can cover. If you have a topic of your own that is not listed below, you should come and talk with me so we can work out a reasonable project description.
You will write a paper that reports on your project. This paper will be structured as if you were going to submit it to a conference. I will provide more details on the project report later in the semester.
You should work in teams of two people (and, in certain cases, three people) on the project and report.
Macro-benchmarking: In this step, you will identify the critical performance measures for your system. For example, a database system might use transactions/second, data transfer rates, and transaction latency. You will also look at overall performance measures like CPU utilization and I/O rates.
Micro-benchmarking: Once you have your macro numbers, you then need to explain why your system behaves that way. That will require detailed performance measures of parts of your code and the system.
You will need to find tools to help you do this benchmarking. There are some obvious Linux utilities that can help (such as top), and you can also write your own custom tools. In addition, there are tools like Intel's highly regarded VTune product and research tools like HPCToolkit from Rice University.
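As an illustration of what a small custom micro-benchmarking tool might look like, the following is a minimal sketch in Python. The names `benchmark`, `summarize`, and `workload` are hypothetical and not part of any tool named above; `workload` is only a stand-in for the component you want to measure. The sketch repeats the measurement and reports the spread, since a single timing is rarely trustworthy.

```python
import statistics
import time


def benchmark(func, *args, repeats=30, warmup=3):
    """Time func(*args) repeatedly; return per-call wall-clock seconds."""
    for _ in range(warmup):
        func(*args)                  # warm caches before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Report the median and spread of a list of timings."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


def workload(n):
    # Hypothetical stand-in for the code path under study.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    stats = summarize(benchmark(workload, 100_000))
    for name, value in stats.items():
        print(f"{name:>6}: {value * 1e3:.3f} ms")
```

A harness like this only measures elapsed time; explaining the numbers will usually still require a profiler or hardware counters.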
For your proposal, you will need to select a system, come up with an initial set of benchmarks to collect, and investigate some tools.
As a product of this project, we would like to provide: (1) an update of the tool set that can be used by other developers, (2) bug reports for the software vendors, and (3) bug fixes for these bugs.
The idea is to apply random testing to thread scheduling. The goal of this project is to test multi-threaded programs by randomizing and biasing the thread scheduler. Such randomization and bias have the potential to expose synchronization problems in multi-threaded programs. As multi-core processors are increasing the prevalence of multi-threaded programs, such testing only becomes more interesting.
You will have to choose a threading environment (probably pthreads), understand what type of controls an application program has over the scheduling decisions, and decide how you will manipulate these controls under test.
You will also have to choose a set of programs to test and, when you find bugs, identify the causes of the bugs. Note that this debugging step can be quite challenging.
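To make the idea concrete, here is a minimal sketch of one possible approach, written in Python rather than pthreads for brevity. It does not control the scheduler itself; it assumes an application-level technique in which random, biased delays are injected at chosen points to widen race windows. All names (`Counter`, `perturb`, `worker`, `run`) and the `bias` parameter are hypothetical.

```python
import random
import threading
import time


class Counter:
    """Shared counter that can be updated with or without its lock."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()


def perturb(rng, bias):
    """With probability `bias`, give up the CPU for a short random time."""
    if rng.random() < bias:
        time.sleep(rng.uniform(0.0, 0.0005))


def worker(counter, iterations, seed, bias, use_lock):
    rng = random.Random(seed)  # per-thread RNG: perturbation choices are reproducible
    for _ in range(iterations):
        if use_lock:
            with counter.lock:
                current = counter.value
                perturb(rng, bias)
                counter.value = current + 1
        else:
            current = counter.value        # read
            perturb(rng, bias)             # delay inside the race window
            counter.value = current + 1    # write; may overwrite another update


def run(num_threads=4, iterations=50, seed=0, bias=0.5, use_lock=False):
    """Run the workers and return the final counter value."""
    counter = Counter()
    threads = [
        threading.Thread(target=worker,
                         args=(counter, iterations, seed + i, bias, use_lock))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print("expected:", 4 * 50)
    print("with lock:", run(use_lock=True))
    print("without lock:", run(use_lock=False))
```

In the unsynchronized case the final count is typically lower than expected because updates are lost, although the exact value varies from run to run. Seeding the random choices makes the perturbation pattern repeatable, but not the interleaving itself.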
You will first figure out how to extend the language grammar to support explicit condition variables, including: (1) declaration of this new type of variable, (2) syntax checks to restrict their use to synchronized classes and methods, and (3) explicitly naming the variables to use them.
You will then select an open source Java compiler and figure out how to modify the compiler to allow these declarations and generate the proper byte code. Once you have done that, you will write some sample programs to demonstrate their use.
A working assumption, that you will have to verify, is that the Java Virtual Machine byte code can support these mechanisms without any modification.
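For readers unfamiliar with explicit condition variables, the sketch below shows the programming style they enable: a bounded buffer with two named conditions that share one monitor lock. It is written in Python only for brevity and says nothing about the Java grammar or byte code; the class `BoundedBuffer` and the function `demo` are hypothetical names chosen for illustration.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Monitor-style bounded buffer with two explicit condition variables."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        # Two named conditions bound to the same monitor lock.
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._lock:
            while len(self._items) >= self._capacity:
                self._not_full.wait()      # wait only for free space
            self._items.append(item)
            self._not_empty.notify()       # wake one waiting consumer

    def get(self):
        with self._lock:
            while not self._items:
                self._not_empty.wait()     # wait only for available data
            item = self._items.popleft()
            self._not_full.notify()        # wake one waiting producer
            return item


def demo(n=100, capacity=4):
    """Pass n integers through the buffer; return them in arrival order."""
    buf = BoundedBuffer(capacity)
    received = []
    consumer = threading.Thread(
        target=lambda: received.extend(buf.get() for _ in range(n)))
    consumer.start()
    for i in range(n):
        buf.put(i)
    consumer.join()
    return received


if __name__ == "__main__":
    print(demo(10, 2))
```

With a single implicit condition per monitor, producers and consumers would share one wait queue, so a wake-up could reach a thread of the wrong kind; naming the conditions lets each side signal exactly the waiters it intends to wake.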
Our scalability infrastructure, called MRNet (Multicast Reduction Network), is good for large-scale control and monitoring, stream data processing, and parallel applications. This infrastructure has been used to build tools that control more than a million processes or cluster billions of data points.
Benjamin Welton and Barton P. Miller, The Anatomy of Mr. Scan: A Dissection of Performance of an Extreme Scale GPU-Based Clustering Algorithm, Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, November 2014.
The goal of this assignment is to build a scalable program or tool based on MRNet. One possible project is to start with the collection of up to a billion (!) Tweets that we can provide to you, and then you will design and implement a program that analyzes these Tweets to extract some common characteristics of the data. You can also develop your own idea (in consultation with Bart) for a scalable program or tool. Whichever idea you choose, you will get the program working and benchmark its performance to understand its limits on scalability.
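The general idea of tree-based reduction can be sketched as follows. This is not MRNet code and does not use the MRNet API; it is a sequential, single-process Python illustration in which hypothetical functions `leaf_count`, `merge`, and `tree_reduce` stand in for the work done at the leaves and internal nodes of a reduction tree. Hashtag counting is an assumed example analysis, not a required one.

```python
from collections import Counter


def leaf_count(tweets):
    """Leaf step: count hashtags in one shard of tweet texts."""
    counts = Counter()
    for text in tweets:
        for word in text.split():
            if word.startswith("#") and len(word) > 1:
                counts[word.lower()] += 1
    return counts


def merge(partials):
    """Internal-node step: combine the partial counts from child nodes."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def tree_reduce(partials, fanout=2):
    """Merge level by level so no node combines more than `fanout` inputs."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(partials)
    if not level:
        return Counter()
    while len(level) > 1:
        level = [merge(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


if __name__ == "__main__":
    shards = [["hello #OS world", "#os #mrnet"], ["#MRNet scales"], ["#os"]]
    print(tree_reduce([leaf_count(s) for s in shards]).most_common())
```

In a real deployment each level would run on separate processes, so the scalability question becomes how the fan-out and tree depth affect latency and per-node load.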
Your project proposal will describe your goals, methods, implementation outline, evaluation criteria, and resources needed. You will need to describe the basic problem that you will be addressing.
Provide a more detailed description of how you will approach the problem. This description should contain much more detail than the brief paragraphs given above. Specifically, you will need to describe:
Project proposals will typically be three to four pages, formatted the same as you did for the first paper.
It is crucial that you discuss your approach with me before you write your proposal. This will allow you to make a quick start.
It is also crucial that you keep to your plan (or even ahead of it); if you try to do all of your project in the last week or two, you will crash and burn.
In general, a referee report starts with a summary of the main idea(s) in the paper, and then has an overall statement of the quality. You should then review the main technical ideas. In addition, either a marked-up copy of the paper, with typos and related errors, or a summary of typos and related errors should be given to the authors.