CS 736 Assignment #1 (Fall 1998)

Measuring Kernel Performance using Extended lmbench

Step 1:
Copy the benchmark from /p/course/cs736-cao/public/lmbench-2alpha11.tgz, then unpack and install it.

Read the benchmark source and documentation.

Step 2:
Build the benchmark and run it on at least two platforms. By two platforms I mean:

either two different operating systems;
or the same operating system running on two different processor architectures;
or the same operating system and processor architecture, but with processors of different clock speeds (for example, a slower Pentium and a faster Pentium).

Report the results of all the lmbench runs.

Step 3:
Measure the resolution of the ``gettimeofday'' system call on both platforms. You can measure the resolution in several ways. One approach is to run a simple loop and time it with two gettimeofday calls, reduce the loop count until the two calls return the same value, and then increase the loop count slightly to find the minimum nonzero difference between the two results. Another is to use the CPU's cycle counter, if you know how to access it on your machines.

Step 4:
On one platform, write your own routines to measure the following:

Latency of an integer multiply operation, a floating-point multiply operation, and a floating-point divide operation.
(For those of you who have taken 752: keep the test simple and don't worry too much about pipeline or other hardware details.)
Measure system call latency using five different system calls other than the one that is used in the lmbench suite. If the results differ dramatically among the system calls, explain why.
Measure context switch overhead using semaphores instead of pipes (which lmbench uses).
Compare the context switch overhead measured with semaphores to that measured with pipes. Which one is more accurate?
Measure the latency of signal (exception) handling on UNIX platforms. You can trigger the exception with, for example, a divide by zero or a memory-access fault.
Measure the latency of page faults for zero-filled pages. Zero-filled refers to a type of page fault processing: the page has been allocated to the process, but this is the first time the process has ever accessed it, so the operating system simply allocates a free physical page frame and zeroes it out.
One way to measure the zero-filled page fault latency is the following: ``malloc'' a large region of memory, touch one byte in each page of the region, and measure how long the whole operation takes.

Measure the latency of the ``select'' system call and of the ``poll'' system call while varying the number of socket descriptors. First establish a set of connections with other machines, then ``select'' or ``poll'' among them.
Vary the number of socket descriptors over 16, 64, 256, and 1024.
Log in to two machines and measure the latency of establishing a TCP/IP connection between them, as well as the latency of sending and receiving 256-byte messages on an established connection.
Measure the latency of file read/write using different approaches:
- UNIX read/write system calls;
- the standard I/O library (stdio);
- mmap of the file.
Vary the size of each read/write from 8 bytes, 64 bytes, 512 bytes, ..., up to 16K bytes.

Step 5:
Repeat the micro-benchmark measurements described in Figure 8 (small-file performance) in section 5.1 of ``The Design and Implementation of a Log-Structured File System'' by Rosenblum and Ousterhout.

Note: perform the experiments on the machine's local disk (/var/tmp/ is a good place). Don't perform them under your home directory, because it is mounted from AFS servers. Also don't perform them under /tmp.

Bonus points:
Measure any other aspect of the system that interests you.

Also, a recent paper describes some improvements to lmbench and its interaction with different machine architectures. You might find it interesting, especially if you have a strong background in computer architecture. Here is the URL: http://www.eecs.harvard.edu/~vino/perf/hbench.

When measuring latencies, run each test at least three times and report the maximum, average, and minimum latency.

What to submit:
A description of what you measured and a summary of the results;
The measurement routines, with Makefiles showing how to build and run them;
The actual output of the three runs of the measurement routines, in human-readable format.

Last modified: Mon Sep 14 11:55:07 CDT 1998 by cao.