CS 736 Assignment #1 (Fall 1998)
Measuring Kernel Performance using Extended lmbench
Step 1:
Copy the benchmark from /p/course/cs736-cao/public/lmbench-2alpha11.tgz;
Unpack it and install it;
Read the benchmark source and documentation;
Step 2:
Build the benchmark and run it on at least two platforms. By two platforms
I mean:
- either two different operating systems
- or the same operating system running on two different processor architectures
- or the same operating system, the same processor architecture, but the processors have different clock speeds (for example, a slower Pentium and a faster
Pentium);
Report the results of all the lmbench runs;
Step 3:
Measure the resolution of the ``gettimeofday'' system call on the two
platforms. You can measure the resolution in a variety of ways.
For example, one approach is to run a simple loop and measure its duration
with two gettimeofday calls, one before and one after. Reduce the loop count
until the two calls return the same value, then increase the loop count
slightly to find the minimum difference between two gettimeofday results.
Another way is to use the CPU's clock counter, if you know how to access it
on the machines.
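
A simpler variant of the loop approach is sketched below. Instead of tuning a loop count, it calls gettimeofday back to back until the returned value changes and keeps the smallest nonzero step seen. The function name and the samples parameter are made up for illustration, and the result is only an upper bound on the true resolution, since each gettimeofday call itself takes time.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Convert a struct timeval to microseconds. */
static long long tv_to_usec(const struct timeval *tv)
{
    return (long long)tv->tv_sec * 1000000LL + tv->tv_usec;
}

/*
 * Estimate the gettimeofday() resolution in microseconds.
 * Spins until the clock value changes, `samples` times (hypothetical
 * parameter), and returns the smallest positive step observed.
 * Returns -1 if no positive step was seen.
 */
long long gettimeofday_resolution_usec(int samples)
{
    assert(samples > 0);
    long long best = -1;
    for (int i = 0; i < samples; i++) {
        struct timeval a, b;
        gettimeofday(&a, NULL);
        do {
            gettimeofday(&b, NULL);
        } while (tv_to_usec(&b) == tv_to_usec(&a));
        long long delta = tv_to_usec(&b) - tv_to_usec(&a);
        /* Ignore negative steps (clock adjusted backwards). */
        if (delta > 0 && (best < 0 || delta < best))
            best = delta;
    }
    return best;
}
```

Calling gettimeofday_resolution_usec(1000) from a small main and printing the result on each platform gives a first estimate that can be compared with the loop-count method.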
Step 4:
On one platform, write your own routines to measure the following:
- Latency of an integer multiply operation, a floating point multiply
operation, and a floating point divide operation.
(For those of you who have taken 752: keep the test simple and don't worry
too much about pipeline or other hardware details.)
- Measure system call latency using five different system calls other
than the one that is used in the lmbench suite. If the results differ
dramatically among the system calls, explain why.
- Measure context switch overhead using semaphores, instead of pipes as used in lmbench;
Compare the context switch overhead measured by semaphores and that measured
by pipes. Which one is more accurate?
- Measure the latency of signal (exception) handling on UNIX platforms.
You can use exceptions such as divide by zero or memory-access fault.
- Measure the latency of page faults for zero-filled pages. Zero-filled
refers to a type of page fault processing: the page has been allocated to
the process, but this is the first time the process has accessed the page,
so the operating system simply allocates a free physical page frame and
zeroes it out.
One way to measure the zero-filled page fault processing latency is the
following: ``malloc'' a large region of memory, then touch one byte out of
each page in the region, and measure how long the whole operation takes.
- Measure the latency of the ``select'' system call and the latency of
the ``poll'' system call, varying the number of socket descriptors. You
should first establish a set of connections with other machines, then
``select'' or ``poll'' among them.
Measure the latencies with 16, 64, 256, and 1024 socket descriptors.
- Log on to two machines, and measure the latency of establishing a TCP/IP
connection between them, and the latency of sending and receiving
256-byte messages on an established connection;
- Measure the latency of file read/write using different approaches:
- UNIX read/write system call;
- stream io library (i.e. stdio);
- file mmap;
Vary the size of each read/write from 8 bytes, 64 bytes, 512 bytes, ..., up to 16K bytes.
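
For the zero-filled page fault item in the list above, the malloc-and-touch procedure might be sketched as follows. The function name and the npages parameter are hypothetical. The sketch assumes the allocation is large enough that malloc obtains fresh pages from the kernel rather than reusing memory the process has already touched, and it does not subtract the cost of the loop itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Touch one byte in each page of a freshly allocated region and return
 * the average time per page in microseconds, or -1.0 if malloc fails.
 * `npages` is a hypothetical parameter: choose it large enough that the
 * total run time is well above the gettimeofday() resolution.
 */
double zero_fill_fault_usec(size_t npages)
{
    assert(npages > 0);
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * pagesize;

    /* volatile keeps the compiler from optimizing the stores away. */
    volatile char *region = malloc(len);
    if (region == NULL)
        return -1.0;

    struct timeval start, end;
    gettimeofday(&start, NULL);
    for (size_t off = 0; off < len; off += pagesize)
        region[off] = 1;          /* first touch of this page */
    gettimeofday(&end, NULL);

    free((void *)region);

    double usec = (end.tv_sec - start.tv_sec) * 1e6
                + (end.tv_usec - start.tv_usec);
    return usec / (double)npages;
}
```

For example, zero_fill_fault_usec(16384) touches 64 MB on a system with 4 KB pages. The allocator may already have touched the first page for its own bookkeeping, so the average is slightly optimistic for small npages.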
Step 5:
Repeat the micro-benchmark measurements described in Figure 8 (small-file
performance) in section 5.1 of ``The Design and Implementation of a
Log-Structured File System'' by Rosenblum and Ousterhout.
Note: you should perform the experiments on the local disk of the machine
(/var/tmp/ would be a good place). Don't perform the experiments under your
home directory, because that is mounted over AFS servers. Also don't perform
the experiments under /tmp.
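
As a starting point, the small-file test can be sketched as three timed phases: create a set of small files, read them back in creation order, then delete them. As far as I recall, the paper uses 10000 one-kilobyte files, but check the numbers against the paper before relying on them. The function and struct names are made up for illustration, and the sketch makes no attempt to flush the file cache between phases, so the read phase may be served from memory.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Elapsed seconds for each phase of the small-file test. */
struct small_file_times {
    double create_sec;
    double read_sec;
    double delete_sec;
};

static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

/*
 * Create `nfiles` files of `size` bytes in `dir`, read them back in
 * creation order, then delete them. Returns 0 on success, -1 on error.
 * `size` is limited by the local 4096-byte buffer.
 */
int small_file_bench(const char *dir, int nfiles, size_t size,
                     struct small_file_times *out)
{
    char buf[4096], path[1024];
    assert(size <= sizeof buf);
    memset(buf, 'x', sizeof buf);

    double t0 = now_sec();
    for (int i = 0; i < nfiles; i++) {            /* create phase */
        snprintf(path, sizeof path, "%s/f%06d", dir, i);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return -1;
        if (write(fd, buf, size) != (ssize_t)size) { close(fd); return -1; }
        close(fd);
    }
    double t1 = now_sec();
    for (int i = 0; i < nfiles; i++) {            /* read phase */
        snprintf(path, sizeof path, "%s/f%06d", dir, i);
        int fd = open(path, O_RDONLY);
        if (fd < 0) return -1;
        if (read(fd, buf, size) != (ssize_t)size) { close(fd); return -1; }
        close(fd);
    }
    double t2 = now_sec();
    for (int i = 0; i < nfiles; i++) {            /* delete phase */
        snprintf(path, sizeof path, "%s/f%06d", dir, i);
        if (unlink(path) != 0) return -1;
    }
    double t3 = now_sec();

    out->create_sec = t1 - t0;
    out->read_sec = t2 - t1;
    out->delete_sec = t3 - t2;
    return 0;
}
```

Dividing nfiles by each phase time gives files per second. For the assignment, pass a directory on the local disk, such as a fresh subdirectory of /var/tmp.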
Bonus points:
Measure any other aspect of the system that interests you.
Also, there is a recent paper describing some improvements on lmbench and its
interaction with different machine architectures. You might find it
interesting, especially if you have a lot of background in computer
architecture.
Here is the URL: http://www.eecs.harvard.edu/~vino/perf/hbench.
In measuring the latencies, run the tests at least three times, and report the
maximum, average, and minimum latency;
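
A small helper for the reporting step might look like the sketch below. It takes the latencies from repeated runs and returns the minimum, average, and maximum. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum, average, and maximum over a set of runs. */
struct latency_summary {
    double min;
    double avg;
    double max;
};

/* Summarize `n` latency samples (n must be at least 1). */
struct latency_summary summarize_runs(const double *samples, size_t n)
{
    assert(n > 0);
    struct latency_summary s = { samples[0], 0.0, samples[0] };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
        sum += samples[i];
    }
    s.avg = sum / (double)n;
    return s;
}
```

Each measurement routine can store one latency per run in an array and pass it to summarize_runs before printing.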
What to submit:
Description of what you measured and summary of the results;
The measurement routines, with Makefiles showing how to build and run them;
Actual output of the three runs of the measurement routines; the output
must be in human-readable format;
Last modified: Mon Sep 14 11:55:07 CDT 1998 by cao.