CS 537: Introduction to Operating Systems

Project 3: Lies and damned lies

Executive Summary

This project is designed to expose you to some of the issues involved in efficient systems programming and teach you how to measure the cost of various system calls. Not all uses of system calls are created equal, and a good systems programmer knows which sorts of operations are efficient and which are not. In particular, we will examine file I/O and interprocess communication costs. Last (but not least), you will learn about the various ways that computer systems provide to measure time, and you will also get a birds'-eye introduction to the deep, tricky, and nuanced craft of benchmarking.

Acknowledgement: This project is based on an assignment developed by Prof. Bart Miller.

Assigned: July 26

Due: August 7 at 10:00 PM (turn in by 10 PM on August 5 for 10% extra credit)

You must work in groups of two or three students.

Background

Premature optimization is the root of all evil

--Don Knuth

Throughout the term, we've examined various mechanisms for synchronization, communication, and resource management. A thread that has run through all of our discussions is optimization for the common case. That is, the most typical operations should be as fast as possible, even at an expense to less frequent operations. After completing CS 537, you have a pretty good idea about what operations are "easy" for system mechanisms to support and what operations are "hard" -- but can you apply this knowledge to your own programs? Do you recognize the characteristics of a program that is difficult for the system to execute efficiently?

If you answered "no," you're probably right. In fact, most experienced programmers likely don't know the "best" way to use the operating system interface for their preferred systems. There may not be a single right answer to performance problems, either: just as some policies are better for some workloads than others, the sorts of mechanisms that work well for one type of application may work against the goals of others.

So what do we do? Do we compile a list of "bad idioms" and then run a smart search-and-replace on our code? Maybe. We might just be wasting effort, though. Why spend time optimizing a part of the program that's rarely run? Indeed, we want to optimize for the common case, just like system designers do. So we'll need some way to find out where a performance problem is presenting, and why that code is underperforming.

The answer to "where" is almost always found with the aid of a profiler. A profiler typically interrupts your program at regular intervals, recording stack traces each time. This way, you can get a statistical approximation of the functions in which your program is spending the most time. Common sample-based profiling tools include gprof (on all Unix systems), Shark (on Macs), and OProfile (on Linux).

However, many profilers treat the kernel as a black box. If your program spends a lot of time in the kernel, a profiler may not be able to tell you what your program is doing in the kernel. (OProfile is a notable exception.) Therefore, we need some way to know whether certain kernel calls are less performant than others, and why this is the case. This assignment will focus on giving you the tools to answer the first of these two questions and encouraging you to apply your knowledge of operating system concepts to answer the second.

Obtaining timing measurements

There are two main criteria for timing measurements: precision and accuracy. These two are not the same thing! Precision refers to the number of significant digits in a time -- does a timer operate with second granularity? with millisecond granularity? with cycle granularity? The more precise a result, the more likely it is to be reproducible. Accuracy, on the other hand, refers to fidelity to some actual standard of time.

There are a variety of available timing routines. For this project, you will use two: the POSIX clock_gettime() function and a high-resolution timing library that uses the Pentium's cycle-accurate time-stamp counter. (See the man page for the former; to use it, you will have to link with the rt library. For the latter, see ~willb/public/tsc/timing.h.) Using two different timers will also allow you "sanity-check" your results.

Obviously, many unrelated factors like cache effects and interactions with other processes on the machine will affect your timings. It is also very difficult to measure short code sequences on modern superscalar processors, and the very process of collecting timer data may perturb your results! Therefore, for best results, you should measure the time it takes to repeat the operation you are interested in timing many times, and divide that time by the number of repetitions. How many is "many?" You can find out by experimenting with a variety of numbers of iterations; you'll have the right number when your results are stable.

The experimental method

Hey, maybe they don't call it computer science for nothing after all! Following the experimental method -- used by researchers in all scientific disciplines -- will ensure that your results are reproducible, intellectually honest, and (to the extent that it's possible) untainted by your own biases.

Here's what you need to do:

  1. Document your experimental apparatus -- in this case, your program and the techniques you used for measurement, the type of system you ran it on, etc. This is important so people can reproduce your results later.
  2. Perform your experiments. (See below for specifics.)
  3. Present your results visually. A picture is worth a thousand words, and a good graph may be worth even more. Tables are also useful in many contexts. You will be preparing graphs and tables.
  4. Summarize the results. You'll write a paragraph or two discussing notable details of the results, as well as any patterns you observe.
  5. Draw and present conclusions. Finally, an opportunity for your own bias! What do the data mean? Write a paragraph or two explaining why you believe the results for each experiment turned out as they did.

(You'll be assembling this all into a short report, to be turned in as your cover letter. Don't worry: there is a hard upper limit of 6 pages, and you can probably do it in 4 or 5!)

What you're going to measure

Your experiments will consist of measuring the following operations:

  1. Sequential reads of a large file, with read sizes of 1, 8, 64, 512, 1024, 4096, and 65536 bytes. You will test this with the read system call, the fread library call, and the mmap system call (you will have to explicitly copy the bytes in question in the mmap case). Be sure to consider the effects that buffering might have on performance. Based on your results, how are each of these calls implemented?
  2. Random-access reads of a large file, with read sizes of 1, 8, 64, 512, 1024, 4096, and 65536 bytes. (Hint: you'll need to see the man page for the lseek system call.) As before, you will test this with the read system call, the fread library call, and the mmap system call (you will have to explicitly copy the bytes in question in the mmap case).
  3. Context switching between two processes that are communicating via a pipe. Process A should send a 4-byte message to Process B. When Process B receives the message, it should send a 4-byte message back to Process A. Be sure to repeat this operation a sufficient number of times in order to get reproducible, precise results.
  4. Context switching between two threads that are communicating via POSIX semaphores.
  5. Time to acquire and release a POSIX threads mutex, vs. time to wait() and signal() on a binary semaphore.

What you're going to do next

Write your report, addressing each issue in the "experimental method" section above. Your report should include graphs and tables as appropriate, and should be written clearly and concisely. (If you need advice, feel free to ask me, or find a copy of Justin Zobel's Writing For Computer Science.)

Hand in your report, along with the .c files you used to run your experiments.

Pat yourself on the back for doing a good job in CS 537.

Tips

Make sure the file you are reading from is on a local disk, and not in RAM or over the network. (Creating a fairly large (e.g. > 200mb) file in /tmp should be sufficient)

You should describe read times in terms of bandwidth -- i.e. 4mb/sec.

Be sure to use the right kind of average; know when to use arithmetic and harmonic mean!

Stay tuned to the class list for more tips!