UNIVERSITY OF WISCONSIN-MADISON
Computer Sciences Department
CS 736
Fall 2004
Barton Miller
Paper Assignment 1: Benchmarking Interprocess Communications
(Assigned: Wednesday, September 8)
(Due: Wednesday, September 22, 4pm)
Description
The goal of this assignment is to get some experience doing benchmarking
by measuring the performance of various operating system and
interprocess communication mechanisms.
You will design various experiments, build simple tools, carry out
the experiments methodically, summarize the results, and draw conclusions.
Be careful!
Benchmarking is a subtle and tricky business; things that look simple at
first glance will often turn out to be quite intricate.
The Communication Mechanisms
- Unix Pipe:
The most basic of IPC mechanisms on UNIX is the pipe; it has been around
since the earliest versions of UNIX.
The pipe system call is executed by a process to create both ends of a
uni-directional communication channel.
This channel is a stream of bytes that ensures ordering and correct
delivery.
The pipe, when combined with a fork operation, allows two processes to
pass messages.
The socketpair operation will create a bi-directional channel
that is equivalent to two pipes.
- Internet (INET) Stream Sockets:
The stream socket, based on TCP/IP, is the backbone of the Internet.
These sockets can be used for remote (inter-host) and local communication,
providing much the same abstraction as a pipe: a reliable, ordered byte
stream.
Almost every operating system supports communication over these sockets.
- Internet (INET) Datagram Sockets:
The datagram socket is based on the UDP protocol. As with the stream socket,
it can be used for remote (inter-host) and local communication.
It provides a message abstraction instead of a stream. In addition, it
does not provide reliability or message ordering guarantees.
However, it does provide checksums, so that if a message is delivered, its
contents are intact.
As a simpler protocol, it should have less overhead.
NB: Datagram sockets can (and will) experience packet loss. Your experimental
set-up must be designed to tolerate such loss.
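The pipe-plus-fork arrangement described above can be sketched as follows. This is a minimal illustration, not a benchmark harness: the helper names read_exact, write_all, and pipe_echo are hypothetical, and Python is used only for brevity. Two pipes are created because each pipe is uni-directional; the loops are there because a pipe is a byte stream, so a single read or write may transfer only part of a message.

```python
import os

def read_exact(fd, n):
    """Read exactly n bytes from fd; one os.read may return fewer."""
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("pipe closed before the full message arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def write_all(fd, data):
    """Write all of data to fd; os.write may accept only part of it."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]

def pipe_echo(message):
    """Send message to a forked child and return the child's echo."""
    to_child_r, to_child_w = os.pipe()    # parent -> child
    to_parent_r, to_parent_w = os.pipe()  # child -> parent
    pid = os.fork()
    if pid == 0:
        # Child: close unused ends, echo one message, then exit.
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            write_all(to_parent_w, read_exact(to_child_r, len(message)))
        finally:
            os._exit(0)
    # Parent: close unused ends, send the message, wait for the echo.
    os.close(to_child_r)
    os.close(to_parent_w)
    write_all(to_child_w, message)
    reply = read_exact(to_parent_r, len(message))
    os.close(to_child_w)
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return reply

if __name__ == "__main__":
    print(pipe_echo(b"ping"))
```

A single socketpair could replace the two pipes here, since it provides both directions on one descriptor pair.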
The Measurements
Choose a version of the Unix operating system.
This system can be any version that you have available to you (Solaris, Linux,
AIX, Windows, MacOS X, or whatever).
On this platform, you will measure the following features:
- Clock precision:
The operating system and hardware provide various ways to measure time.
Identify two ways of measuring elapsed time and determine the resolution
(precision) of the clock.
One way to do this is to read the clock value at the start and end of
a simple loop.
Start with a single loop iteration, then
increase the iteration count of the loop until the difference between
the before and after samples is greater than zero.
Try to get the smallest non-zero positive difference.
If a single iteration of a loop takes too much time, try
putting simple statements between the two timer calls.
Repeat this test for each of the two ways that you measure time.
Use the more precise way in the rest of your experiments.
- Trivial kernel call:
Choose a simple kernel call, such as getpid, and measure the elapsed time
to perform it.
Choose one or two other kernel calls that you suspect perform
trivial operations, measure the time to perform these, and compare the
results.
- Inter-Process Communication Time:
For each of the communication mechanisms listed above, you will
measure the following characteristics:
- Message latency: Latency is the time for some activity
to complete, from beginning to end. For message passing, it is
the time from the start of a send to the completion of a
receive. Since the clocks on two different hosts may not be
sufficiently aligned, the easiest way to measure message latency
is to measure the time it takes to complete a round-trip
communication (and divide by two).
NB: Beware of nagling (Nagle's algorithm) on the internet stream experiments.
This mechanism can cause unexpected delays. You can disable nagling with the
TCP_NODELAY socket option.
Measure latency for a variety of message sizes from 4 bytes up to
64K.
Watch out for message size limits on UDP.
- Throughput: Throughput is the amount of data that is sent
per unit time. In this case, a round trip measure is not necessary;
you can send a return message when the entire transfer amount has
been sent.
Send a large enough total quantity of data such that the single
"ack" response contributes a small amount of time compared to the
whole transfer.
Measure throughput for a variety of message sizes from 4 bytes up to
512K.
Watch out for message size limits on UDP.
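The measurements above might be prototyped along the following lines. This is a rough sketch under several assumptions: the function names clock_resolution, time_call, and round_trip_latency are hypothetical; the resolution test is a variant of the loop described above (it spins until the clock visibly advances rather than growing an iteration count); and a local socketpair stands in for the pipe and INET socket cases. Python interpreter overhead inflates every number, so a real study would more likely be written in C.

```python
import os
import socket
import time

def clock_resolution(clock, samples=1000):
    """Smallest positive difference observed between two clock readings."""
    best = None
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 <= t0:          # spin until the clock visibly advances
            t1 = clock()
        if best is None or t1 - t0 < best:
            best = t1 - t0
    return best

def time_call(fn, iterations=100_000):
    """Average nanoseconds per call of fn, including loop overhead."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    return (time.perf_counter_ns() - start) / iterations

def recv_exact(sock, n):
    """Receive exactly n bytes; a stream recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed the connection")
        buf += chunk
    return bytes(buf)

def round_trip_latency(size, rounds=1000):
    """Estimated one-way latency in ns: round-trip time divided by two."""
    parent, child = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: echo every message back, then exit.
        try:
            parent.close()
            for _ in range(rounds):
                child.sendall(recv_exact(child, size))
        finally:
            os._exit(0)
    child.close()
    payload = b"x" * size
    start = time.perf_counter_ns()
    for _ in range(rounds):
        parent.sendall(payload)
        recv_exact(parent, size)
    elapsed = time.perf_counter_ns() - start
    parent.close()
    os.waitpid(pid, 0)
    return elapsed / rounds / 2

if __name__ == "__main__":
    print("time_ns resolution:", clock_resolution(time.time_ns), "ns")
    print("perf_counter_ns resolution:", clock_resolution(time.perf_counter_ns), "ns")
    print("getpid:", time_call(os.getpid), "ns per call")
    for size in (4, 64, 1024, 65536):
        print(size, "bytes:", round_trip_latency(size), "ns one-way")
```

For the INET stream case, the same echo loop would run over a connected TCP socket with nagling disabled, for example via sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1). The datagram case needs extra care that this sketch omits: a receive timeout and a retry or discard policy, since a lost message would otherwise block the loop forever.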
The Experimental Method
Computer Scientists are notably sloppy experimentalists.
While we do a lot of experimental work, we typically do not follow good
experimental practice.
The experimental method is a well-established regimen, used in all areas of
science.
The use of the experimental method keeps us honest and gives form to the
work that we do.
The basic parts of an experiment are:
- Identify your variables:
Variables are things that you can observe and quantify.
You need to identify which variables might be related and whether a variable
is a cause (e.g., the message size of a send operation) or the effect (e.g.,
the time to complete the send).
Even though this sounds obvious, you should consciously identify the variables
in each experiment that you perform.
- Hypothesis:
The hypothesis is a guess (we hope, an educated guess) about the outcome of
the experiment.
The hypothesis needs to be worded in a way that can be tested in an experiment,
so it should be stated in terms of the experimental variables.
- Experimental apparatus:
You need to obtain the necessary equipment for your experiment.
In this case, it will be the needed computer and software.
- Perform the experiment and record the results:
This part is the one that we typically think of as the real work.
Note that several important steps come before it.
- Summarize the results:
Summarization means putting the data in a form that you can understand.
You might put the data in tables, graphs, or use statistical techniques
to understand the raw data.
If you are using averages, make sure to read Jim Smith's paper in the October
1988 issue of CACM (there are many types of means, and you need to use
the right one)!
- Draw conclusions:
Note that performing the experiment and summarizing the results are
separate steps and both come before you draw conclusions.
To present honest and understandable results, we must present the basic
data first (so that the reader can draw their own conclusions) before we
insert our bias.
The experimental method has more subtleties than this (such as trying
to account for experimenter and subject biases), but the above description
is sufficient for basic computer measurement experiments.
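The warning about choosing the right mean can be illustrated with a small, made-up example. The timings below are hypothetical, not measured data. Averaging per-run throughput rates arithmetically overstates the throughput actually achieved, while the harmonic mean of the rates matches total data divided by total time.

```python
def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)

def harmonic_mean(values):
    """Count of the values divided by the sum of their reciprocals."""
    return len(values) / sum(1.0 / v for v in values)

# Hypothetical data: the same 1 MB transfer timed in three runs (seconds).
megabytes = 1.0
times = [0.010, 0.020, 0.040]
rates = [megabytes / t for t in times]   # MB/s for each run

# Ground truth: total data moved divided by total time spent.
true_rate = megabytes * len(times) / sum(times)

if __name__ == "__main__":
    print("arithmetic mean of rates:", arithmetic_mean(rates))
    print("harmonic mean of rates:  ", harmonic_mean(rates))
    print("total data / total time: ", true_rate)
```

The arithmetic mean remains appropriate for the elapsed times themselves; the mismatch appears only when rates are averaged.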
Learning about Sockets
If you need help with using the various socket calls, here are some resources
suggested by members of my research group:
Constraints
The paper should be at most 6 pages (all inclusive), 10 point font,
18 point spacing, single-sided, with 1 inch margins.
The paper must contain the following parts:
- Title:
The title should be descriptive and fit in one line across the page.
Interesting titles are acceptable, but avoid overly cute ones.
- Abstract:
This is the paper in brief; it is not a description of what is in the paper.
It should state the basic ideas, techniques, results, and conclusions of the paper.
The abstract is not the introduction, but a summary of everything.
It is an advertisement that will draw the reader to your paper, without
being misleading.
It should be complete enough to understand what will be covered in the paper.
Avoid phrases such as "The paper describes...."
This is a technical paper and not a mystery novel; do not be afraid of giving
away the ending.
- Body:
This is the main part of the paper.
It should include an introduction that prepares the reader for the
remainder of the paper.
Assume that the reader is knowledgeable about operating systems.
The introduction should motivate the rest of the discussion and outline
the approach.
The main part of the paper should be split into reasonable sections that
follow the basics of the experimental method.
The body should end with a conclusion: a discussion of what the reader
should have learned from the paper.
You can repeat things stated earlier in the paper, but only to the extent
that they contribute to the final discussion.
- References:
You must cite each paper that you have referenced.
This section appears at the end of the paper.
- Figures:
A paper without figures, graphs, or diagrams is boring.
This paper will certainly need several performance tables and graphs.
Your paper must have figures.
Do not re-describe the assignment; address the issues described above.
The paper must be written using correct English grammar.
There should be no spelling mistakes.
Note that I have a list of writing
suggestions available to help you avoid common mistakes.
Please take a look at these as you prepare your paper.
Last modified: Wed Sep 8 10:59:04 CDT 2004 by bart