UNIVERSITY OF WISCONSIN-MADISON
Computer Sciences Department
CS 736
Fall 2021
Barton Miller

Benchmarking Interprocess Communications

(Assigned: Friday, September 10)
(Due: Monday, September 27, start of class)

Description

The goal of this assignment is to gain experience with benchmarking by measuring the performance of various operating system and interprocess communication mechanisms. You will design experiments, build simple tools, carry out the experiments methodically, summarize the results, and draw conclusions.

Be careful! Benchmarking is a subtle and tricky business; things that look simple at first glance will often turn out to be quite intricate.

The Communication Mechanisms

  1. Socketpair: When sockets were added to BSD UNIX back in the early 1980s, they were intended to replace pipes. The socketpair system call is executed by a process to create both ends of a bi-directional communication channel. This channel is a stream of bytes that ensures ordering and correct delivery. You will use the AF_UNIX domain. The socketpair, when combined with a fork operation, allows two processes to pass messages. A socketpair is equivalent to two pipes, though it uses fewer file descriptors.
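As a starting point, here is a minimal sketch in C of the socketpair-plus-fork pattern described above. The helper name `socketpair_echo` and the 256-byte child buffer are assumptions made for illustration, not part of the assignment. It relies on a single read returning a whole small message, which a real benchmark should not assume.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the assignment): send msg to a forked child
 * over an AF_UNIX socketpair and read the child's echo into reply.
 * Returns the number of bytes read back, or -1 on error.
 * Assumes len <= 256 and that one read() returns the whole message. */
ssize_t socketpair_echo(const char *msg, size_t len, char *reply)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {                      /* child keeps sv[1] */
        char buf[256];
        close(sv[0]);
        ssize_t n = read(sv[1], buf, sizeof buf);
        if (n > 0 && write(sv[1], buf, (size_t)n) != n)
            _exit(1);
        _exit(0);
    }

    close(sv[1]);                        /* parent keeps sv[0] */
    ssize_t n = -1;
    if (write(sv[0], msg, len) == (ssize_t)len)
        n = read(sv[0], reply, len);
    close(sv[0]);
    waitpid(pid, NULL, 0);               /* reap the child */
    return n;
}
```

Each process closes the end it does not use, so that an exit on either side shows up as end-of-file on the other.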

  2. Internet (INET) Stream Sockets: The stream socket, based on TCP/IP, is the backbone of the Internet. These sockets can be used for remote (inter-host) and local communication, providing much the same abstraction as a pipe: a reliable, ordered byte stream. Almost every operating system supports communication over these sockets.

  3. Internet (INET) Datagram Sockets: The datagram socket is based on the UDP protocol. As with the stream socket, it can be used for remote (inter-host) and local communication. It provides a message abstraction instead of a stream. In addition, it does not provide reliability or message ordering guarantees. However, it does provide checksums so that if a message is delivered, its contents are intact. As a simpler protocol, it should have less overhead.

    NB: Datagram sockets can (and will) experience packet loss, so your experimental set-up must be designed to tolerate such loss. The tricky part is to figure out how to handle this so that you get meaningful results.
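One possible way to tolerate datagram loss (an assumption, not necessarily the approach you should choose) is to put a receive timeout on the socket and treat a timed-out receive as a lost message. The sketch below uses the standard `SO_RCVTIMEO` socket option; the helper name `udp_socket_with_timeout` is hypothetical. How you account for the lost samples in your results is still up to you.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to an ephemeral port on the
 * loopback interface whose recv() gives up after timeout_ms milliseconds.
 * The address actually bound is stored in *bound.
 * Returns the socket descriptor, or -1 on error. */
int udp_socket_with_timeout(int timeout_ms, struct sockaddr_in *bound)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                   /* let the kernel pick a port */

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    socklen_t len = sizeof *bound;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With this option set, a `recv` on the socket returns -1 once the timeout expires without a datagram arriving, so a lost message cannot block the experiment forever.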

The Measurements

You will perform the measurements on one of the Linux systems available in one of the Computer Sciences Department's instructional labs. On this platform, you will measure the following features:
  1. Clock precision: The operating system and hardware provide various ways to measure time. Identify two ways of measuring elapsed time and determine the resolution (precision) of the clock.

    One way to do this is to read the clock value at the start and end of a simple loop. Start with a single loop iteration, then increase the iteration count of the loop until the difference between the before and after samples is greater than zero. Try to get the smallest non-zero positive difference. If a single iteration of a loop takes too much time, try putting simple statements between the two timer calls.

    Repeat this test for each of the two ways that you measure time. Use the more precise way in the rest of your experiments.
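A minimal sketch of a closely related technique, assuming `clock_gettime` is one of the two timers you pick: instead of growing a loop by hand, it re-reads the clock until the value changes and keeps the smallest non-zero difference seen. The function name `min_clock_delta_ns` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper: smallest non-zero difference, in nanoseconds, between
 * two readings of the given clock, observed over `trials` attempts.
 * Returns -1 if no positive difference was observed. */
long min_clock_delta_ns(clockid_t clk, int trials)
{
    long best = -1;
    for (int i = 0; i < trials; i++) {
        struct timespec a, b;
        clock_gettime(clk, &a);
        do {                             /* spin until the clock value changes */
            clock_gettime(clk, &b);
        } while (a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec);

        long d = (long)(b.tv_sec - a.tv_sec) * 1000000000L
               + (b.tv_nsec - a.tv_nsec);
        if (d > 0 && (best < 0 || d < best))
            best = d;
    }
    return best;
}
```

For example, `min_clock_delta_ns(CLOCK_MONOTONIC, 1000)` gives an upper bound on the usable resolution of that clock. Note that the result includes the cost of the timer call itself.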

  2. Trivial kernel call: Choose a simple kernel call, such as getpid, and measure the elapsed time to perform the call. Then choose one or two other kernel calls that you suspect perform trivial operations, measure the time to perform these, and compare the results.

    NB: Some Linux distributions have had their simple kernel calls modified to run faster just to look good on benchmarks, so it is important to measure more than one simple call.
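A minimal sketch of timing a trivial kernel call, assuming `clock_gettime` with `CLOCK_MONOTONIC` as the timer. It invokes getpid through `syscall(SYS_getpid)` so that each iteration enters the kernel even if the C library's `getpid` wrapper caches the result (older glibc versions did). The function name `avg_getpid_ns` is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average elapsed time, in nanoseconds, of one getpid
 * kernel call, measured over `iters` back-to-back calls. */
double avg_getpid_ns(long iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        syscall(SYS_getpid);             /* raw system call, no libc caching */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

The average includes loop overhead and the cost of the `syscall` wrapper, so you may want to measure an empty loop as a baseline.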

  3. Inter-Process Communication Time: For each of the communication mechanisms listed above, you will measure the following characteristics:

    1. Message latency: Latency is the time for some activity to complete, from beginning to end. For message passing, it is the time from the start of a send to the completion of a receive. Since the clocks on two different hosts may not be sufficiently aligned, the easiest way to measure message latency is to measure the time it takes to complete a round-trip communication (and divide by two).

      NB: Beware of nagling on the Internet stream experiments. This mechanism can cause unexpected delays. You can disable nagling with the TCP_NODELAY socket option.

      Measure latency for a variety of message sizes: 4, 16, 64, 256, 1K, 4K, 16K, 64K, 256K, and 512K bytes.

      Serious NB: Watch out for message size limits on UDP. How will you handle experiments where the message is too large to fit in a single UDP packet?

      NB 2: Transmission times in TCP can be affected by the MTU (maximum transmission unit) size (typically 1500 bytes) and by the read and write buffer sizes (typically 128KB). If you have root access to a Linux system, you can experiment with increasing these sizes. Of course, you will not be allowed to do that on the instructional Linux systems.
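The round-trip approach to latency can be sketched as follows for any connected stream descriptor (a socketpair or a TCP socket). All function names here are hypothetical, and the sketch assumes a peer process runs `echo_loop` with the same size and round count. `disable_nagle` applies only to TCP sockets.

```c
#define _POSIX_C_SOURCE 200809L
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Stream sockets may transfer fewer bytes than requested, so loop. */
static int read_full(int fd, char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

static int write_full(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Turn off Nagle's algorithm on a TCP socket. Returns 0 on success. */
int disable_nagle(int tcp_fd)
{
    int one = 1;
    return setsockopt(tcp_fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}

/* Peer side: receive a whole message, then send it back, `rounds` times. */
int echo_loop(int fd, size_t size, int rounds)
{
    char *buf = malloc(size);
    int rc = buf ? 0 : -1;
    for (int i = 0; rc == 0 && i < rounds; i++)
        if (read_full(fd, buf, size) < 0 || write_full(fd, buf, size) < 0)
            rc = -1;
    free(buf);
    return rc;
}

/* Measuring side: average one-way latency in nanoseconds, estimated as half
 * the average round-trip time. Returns -1.0 on error. */
double one_way_latency_ns(int fd, size_t size, int rounds)
{
    char *buf = calloc(1, size);
    struct timespec t0, t1;
    if (!buf)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds; i++) {
        if (write_full(fd, buf, size) < 0 || read_full(fd, buf, size) < 0) {
            free(buf);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / rounds / 2.0;
}
```

Averaging over many rounds hides the variation between individual round trips; whether to report an average, a minimum, or a distribution is a decision your experimental design should justify. This sketch does not cover datagram sockets, where message boundaries and loss need separate handling.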

    2. Throughput: Throughput is the amount of data that is sent per unit time. In this case, a round trip measure is not necessary; you can send a return message when the entire transfer amount has been sent. Send a large enough total quantity of data such that the single "ack" response contributes a small amount of time compared to the whole transfer.

      Measure throughput for a variety of message sizes, the same sizes as for message latency above (and the same warnings apply).
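The throughput measurement with a single "ack" can be sketched as follows for a connected stream socket. The names `sink_and_ack` and `throughput_bytes_per_s` are hypothetical, and the sketch assumes the receiving process knows the total transfer size in advance.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/socket.h>
#include <time.h>

/* Receiver: drain `total` bytes, then send a one-byte ack. */
int sink_and_ack(int fd, size_t total)
{
    char buf[65536];
    while (total > 0) {
        size_t want = total < sizeof buf ? total : sizeof buf;
        ssize_t n = recv(fd, buf, want, 0);
        if (n <= 0)
            return -1;
        total -= (size_t)n;
    }
    return send(fd, "A", 1, 0) == 1 ? 0 : -1;
}

/* Sender: transmit `total` bytes in messages of `msg_size` bytes, wait for
 * the ack, and return the throughput in bytes per second (-1.0 on error). */
double throughput_bytes_per_s(int fd, size_t msg_size, size_t total)
{
    char *msg = calloc(1, msg_size);
    char ack;
    struct timespec t0, t1;
    if (!msg || msg_size == 0) {
        free(msg);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t sent = 0; sent < total; ) {
        size_t chunk = total - sent < msg_size ? total - sent : msg_size;
        size_t off = 0;
        while (off < chunk) {            /* send() may accept fewer bytes */
            ssize_t n = send(fd, msg + off, chunk - off, 0);
            if (n <= 0) {
                free(msg);
                return -1.0;
            }
            off += (size_t)n;
        }
        sent += chunk;
    }
    if (recv(fd, &ack, 1, 0) != 1) {     /* single ack ends the timed region */
        free(msg);
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(msg);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return (double)total / secs;
}
```

The timed region ends only after the ack arrives, so the total transfer should be large enough that the ack's contribution is negligible.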

The Experimental Method

Computer Scientists are notably sloppy experimentalists. While we do a lot of experimental work, we typically do not follow good experimental practice. The experimental method is a well-established regimen, used in all areas of science. The use of the experimental method keeps us honest and gives form to the work that we do.

The basic parts of an experiment are:

The experimental method has more subtleties than this (such as trying to account for experimenter and subject biases), but the above description is sufficient for basic computer measurement experiments.

Implementation Language

We want to use a language that generates the minimum overhead, so we will avoid such interpreted languages as Python, Perl, or Ruby. The easy choices are to use C or C++. For the more adventurous, you can try Rust, a modern systems programming language.

Learning about Sockets

If you need help with using the various socket calls, here are some resources suggested by members of my research group:

Constraints

The paper should be at most 6 pages (all inclusive), 10 point font, 18 point spacing, single-sided, one column, and 1 inch margins. If you do not understand any of these constraints, make sure to come talk with me.

The paper must contain the following parts:

Title:
The title should be descriptive and fit in one line across the page. Interesting titles are acceptable, but avoid overly cute ones.
Abstract:
This is the paper in brief; it is not a description of what is in the paper. It should state the basic ideas, techniques, results, and conclusions of the paper. The abstract is not the introduction, but a summary of everything. It is an advertisement that will draw the reader to your paper, without being misleading. It should be complete enough to understand what will be covered in the paper. Avoid phrases such as "The paper describes...." This is a technical paper and not a mystery novel; do not be afraid of giving away the ending.
Body:
This is the main part of the paper. It should include an introduction that prepares the reader for the remainder of the paper. Assume that the reader is knowledgeable about operating systems. The introduction should motivate the rest of the discussion and outline the approach. The main part of the paper should be split into reasonable sections that follow the basics of the experimental method. This is a discussion of what the reader should have learned from the paper. You can repeat things stated earlier in the paper, but only to the extent that they contribute to the final discussion.
References:
You must cite each paper that you have referenced. This section appears at the end of the paper.
Figures:
A paper without figures, graphs, or diagrams is boring. This paper will certainly need several performance tables and graphs. Your paper must have figures.

Do not re-describe the assignment; address the issues described above. The paper must be written using correct English grammar. There should be no spelling mistakes.

Note that your paper will be evaluated on both the technical content and the presentation (as would any paper submitted to a journal or conference).

Note that I have a list of writing suggestions available to help you avoid common mistakes. It is essential that you take a look at these suggestions as you prepare your paper.

Teams

In the real world, both in industry and research, teams are critical to doing effective work. In this class, for this assignment and the upcoming research project, you can work in teams of two. While you are not required to do so, you will be much more effective working in a team.

Note that you still need to understand the details of what your teammate has done.

Original Work

You are expected to do your own work and not copy from other people. However, you may use outside resources to understand how the various system calls on Linux work.

Whenever you use any prose or code from an outside source, you must cite that source. Not citing a source is considered plagiarism, which is a serious professional breach and is considered academic misconduct.

If you have any questions about whether outside materials are OK to use or questions about how to cite your usage of outside materials, please talk with Bart.


Last modified: Wed 08 Sep 2021 02:00:43 AM CDT by bart