|
UNIVERSITY OF WISCONSIN-MADISON
Computer Sciences Department |
|
CS 736
Fall 2017
|
|
Barton Miller |
Paper Assignment 1: Basic Benchmarking
(Assigned: Monday, September 11)
(Due: Monday, October 2, in class)
Description
The goal of this assignment is to get experience doing benchmarking
by measuring the performance of various operating system and
thread operations.
You will design various experiments, build simple tools, and carry out
a methodical experiment, summarize the results, and draw conclusions.
Be careful!
Benchmarking is a subtle and tricky business; things that look simple on
first glance will often turn out to be quite intricate.
Implementation Language
You will use GNU C (gcc) as your implementation language.
C provides simple machine language code, so will likely produce the most accurate
results.
If you used a language such as Java, Python or Perl, you could introduce interpreter
overhead that would distort the results.
Posix Threads (pthreads)
One of the most widely used thread packages is pthread, available on almost
every operating system.
It provides thread creation, control, destruction, and synchronization
operations. If you have never used pthreads, or need a review, check out
this tutorial:
The Measurements
You will use the Linux systems provided in our department.
If you would like to try this on your Mac, please come talk with me.
-
Clock precision:
The operating system and hardware provide various ways to measure time
(both elapsed and process time).
Identify two ways of measuring elapsed time and determine the resolution
(precision) of the clock.
One way to do this is to read the clock value at the start and end of
a simple loop.
Start with a single loop iteration, then
increase the iteration count of the loop until the difference between
the before and after samples is greater than zero.
Try to get the smallest non-zero positive difference.
If a single iteration of a loop takes too much time, try
putting simple statements between the two timer calls.
Repeat this test for each of the two way that you measure time.
Use the more precise way in the rest of your experiments.
-
Basic function call time:
One of the most basic low-level operations is the function call.
Whether you call them procedures, functions, or methods, they help us
organize our code into logical units that can be reused and shared.
However, there are times where the cost of calling a function can
dominate the running time of a program.
You goal is to measure the basic cost of a function call and return.
Modern compilers try to optimize this behavior in several ways, including
inlining the function code, eliminating or minimizing the stack frame set-up
and tear down, and eliminating the placing of parameters onto the stack.
When you measure the behavior of your simple function, you will need to
understand which of these behaviors are occurring.
Make sure to run compile your code with optimization levels -O0 through -O3.
-
Trivial thread and kernel call:
Choose a simple kernel call such getpid, and a simple thread call
such as pthread_self, to measure and compare the elapsed time
to perform the calls.
Choose one or two other thread and kernel calls that you suspect perform
trivial operations, and
measure the time to perform these.
Be careful
that some versions of the simple kernel calls may cache their
result in a variable on the first call, making subsequent calls artificially
fast.
You will probably need to look at the assembly code for the system
call function (using gdb or some other debugging tool to make sure
that you know what you are measuring.
-
Thread and Process creation time:
Measure the time to create and destroy a thread.
Do the same for a process.
For the threads, you will do a pthread_create followed by
a pthread_join.
The thread that you create will do nothing substantial except
pthread_exit.
Repeat this sequence a sufficient number of times.
For the processes, you will do a fork followed by
a wait (or maybe waitpid).
The process that you create will do nothing substantial except
exit.
Repeat this sequence a sufficient number of times.
-
Context switch overhead:
Measure and compare the time to context switch between two threads and between two
processes.
You should use two techniques for measuring this time:
-
For processes, use signals:
Process A will send a signal to Process B.
Upon receiving the signal, Process B will send a signal back to process A.
Repeat this operation a sufficient number of times.
-
For processes, use a pipe:
Process A will send a 1-word message to Process B.
Upon receiving the message, Process B will send a 1-word message
back to process A.
Repeat this operation a sufficient number of times.
-
For threads, use a mutex lock:
Thread A will unlock mutex 1, then block on mutex 2; thread B will
block on mutex 1, then unlock mutex 2.
Repeat this operation a sufficient number of times.
Keep your code simple.
If you use extra mutex locks, you are likely to distort your results and get
invalid results.
-
Page fault time:
This study will not use the thread library.
Measure the latency of page faults for zero-filled pages.
Zero-filled refers to a type of page fault processing: the page has
been allocated to the process, but this is the first time the process
accessed the page
The operating system allocate a free page frame, and fills it with zero values.
One way to measure the zero-filled page fault processing latency is the following:
use
malloc a large region of memory,
then touch one word out of each page in the region,
and measure how long the whole operation takes.
Make sure that your region is large enough to produce measurable
results, and also that your memory region fits in RAM.
Timing Issues
To get good timing results, you will need to repeat an operation (such as
the context switch) many times and divide the total time by the number of
repetitions.
How many times will you need to loop?
You can address this issue statistically or experimentally.
For an experimental approach, you can repeat the measurement several times for
a varying number of iterations and see when the results stabilize.
The Experimental Method
Computer Scientists are notably sloppy experimentalists.
While we do a lot of experimental work, we typically do not follow good
experimental practice.
The experimental method is a well-established regimen, used in all areas of
science.
The use of the experimental method keeps us honest and gives form to the
work that we do.
The basic parts of an experiment are:
- Identify your variables:
Variables are things that you can observe and quantify.
You need to identify which variables might be related and whether a variable
is a cause (i.e., the block size of a read operation) or the effect (e.g.,
the time to complete the read).
Even though this sounds obvious, you should consciously identify the variables
in each experiment that you perform.
- Hypothesis:
The hypothesis is a guess (we hope, an educated guess) about the outcome of
the experiment.
The hypothesis needs to be worded in a way that can be tested in an experiment,
so it should be stated in terms of the experimental variables.
- Experimental apparatus:
You need to obtain the necessary equipment for your experiment.
In this case, it will be the needed computer and software.
- Performance of experiment and record the results:
This part is the one that we typically think of as the real work.
Note that several important steps come before it.
- Summarize the results:
Summarization means putting the data in a form that you can understand.
You might put the data in tables, graphs, or use statistical techniques
to understand the raw data.
If you are using averages, make sure to read
Jim Smith's paper in the October
1988 issue of CACM (There are many types of means, and you need to use
the right on.)
- Draw conclusions:
Note that performing the experiment and summarizing the results are
separate steps and both come before you draw conclusions.
To present honest and understandable results, we must present the basic
data first (so that the reader can draw their own conclusions) before we
insert our bias.
The experimental method has more subtleties than this (such as trying
to account for experimenter and subject biases), but the above description
is sufficient for basic computer measurement experiments.
Constraints
First, make sure to read my
short document on writing style.
Pay special attention to the discussion of labeling for tables and graphs.
The paper should be
at most 6 pages
(all inclusive), 10 point font,
18 point spacing,
single-sided and 1 inch margins.
The paper must contain the following parts:
- Title:
-
The title should be descriptive and fit in one line across the page.
Interesting titles are acceptable, but avoid overly cute ones.
- Abstract:
-
This is the paper in brief;
it is not
a description of what is in the paper.
It should state the basic ideas, techniques, results, and conclusions of the paper.
The abstract is not the introduction, but a summary of everything.
It is an advertisement that will draw the reader to your paper, without
being misleading.
It should be complete enough to understand what will be covered in the paper.
Avoid phrases such as "The paper describes...."
This is a technical paper and not a mystery novel; do not be afraid of giving
away the ending.
- Body:
-
This is the main part of the paper.
It should include an introduction that prepares the reader for the
remainder of the paper.
Assume that the reader is knowledgeable about operating systems.
The introduction should motivate the rest of the discussion and outline
the approach.
The main part of the paper should be split into reasonable sections that
follow the basics of the experimental method.
This is a discussion of what the reader should have learned from the
paper.
You can repeat things stated earlier in the paper, but only to the extent
that they contribute to the final discussion.
- References:
-
You must cite each paper or source that you have referenced.
This section appears at the end of the paper.
If you use prose, any prose or any code, in your project that you did not
write yourself, no matter how brief, you must cite that use.
If you do not use a citation in these cases, then this is plagiarism,
which is a serious instance of academic misconduct and will have disastrous consequences for you.
- Figures:
-
A paper without figures, graphs, or diagrams is boring.
This paper will certainly need several performance tables and graphs.
Your paper must have figures.
Do not redescribe the assignment; address the issues described above.
The paper must be written using correct English grammar.
There should be no spelling mistakes.
Last modified:
Mon Sep 11 11:02:39 CDT 2017
by
bart