Overview
In this warm-up project, you will create a series of user-level workloads that
exercise the underlying operating system in various ways: some of the workloads
should perform very well, while other (apparently similar) workloads
should perform very poorly. You will construct these workloads by
changing how one specific process interacts with the virtual memory
system and with the file system.
Objectives
The main objectives of this mini project are for you to:
- understand the performance impact that the OS, in particular the
virtual memory and file systems, can have on
seemingly similar workloads (this is the main technical objective)
- refresh your basic UNIX-based systems programming skills
- gain more experience performing careful performance measurements
and constructing hypotheses (or simple models) for understanding basic
performance
- work through a project that is slightly open-ended
and not completely specified; learn to make reasonable assumptions
and describe your choices
- be able to communicate your results in a written report and
visually with appropriate graphs
Motivation
Being able to understand why a workload performs the way it does on a
given system is a very valuable skill. For example, if you are an
application developer, then understanding how the OS performs can help
you to avoid some designs that will not perform well. Alternatively,
if you are an OS developer, then this understanding can help you to
quickly identify performance problems for a given workload and
optimize key components.
For you to be able to predict workload performance, you must have a
basic model and understanding of the system. For example, consider a
basic memory reference within an application. If you have a very
trivial model for the access time of a memory reference, then you
might assume that it will take exactly as long as a DRAM reference
(e.g., 50 ns). However, since this is a CS course, you already know
that this is a naive model and that other steps might occur when
memory is referenced. If this were a Computer Architecture course,
you might focus on the fact that this access could hit in some level
of cache. Given that this is an Operating Systems course, you know
that even more steps may occur when memory is referenced, greatly
impacting the time for that memory reference.
To learn more about the impact of the OS on workloads, your task is to
construct a total of six different workloads: 3 that exercise memory
and 3 that exercise the file system. For each system, the 3 workloads
should have the following characteristics.
- You should first design a "good"
workload that will obtain the best possible performance from
the memory or file system.
- Next, you should design a "bad"
workload that obtains dramatically worse performance due to
interactions with the memory or file system. You want to
maximize the ratio between the good performance and the bad
performance, while finding an "interesting" bad workload.
- Finally, you should
design an "ugly" workload that obtains performance somewhere in
between the good and bad workloads and that exercises the system
(either memory or the file system) in some "interesting" way that is
directly related to the OS (and not the hardware architecture).
Details
This project is intentionally somewhat open-ended so that you have a
bit of freedom to think and to develop something interesting. However, here are some questions and answers
to guide you further.
What do we mean by a workload that accesses memory? You should
construct a workload that simply references a set of (largely) unique memory
locations in some pattern. The references can be either reads or
writes. For example, a valid workload could read
sequentially increasing bytes of memory inside a tight loop with
little other computation. You may not access the same memory
location over and over again (but some repetition over a larger
interval could be fine); remember that your goal is to stress the
features of the OS more than those of the architecture.
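As a rough illustration of the sequential example above, the following C sketch reads one byte every stride bytes of a buffer. The names touch_memory and stride are hypothetical, and this is not meant as a suggested "good" or "bad" workload; it only shows the shape of a tight access loop with little other computation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one byte every `stride` bytes across the first
 * `len` bytes of `buf`. The pointer is volatile and the bytes are summed so
 * that the compiler cannot optimize the memory references away. */
uint64_t touch_memory(const volatile unsigned char *buf, size_t len, size_t stride)
{
    assert(buf != NULL && stride > 0);
    uint64_t sum = 0;
    for (size_t off = 0; off < len; off += stride)
        sum += buf[off];
    return sum;
}
```

With stride set to 1, this reads sequentially increasing bytes. You would allocate and initialize the buffer before the timed region so that only the references themselves are measured.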
What do we mean by a workload that accesses the file system?
You should construct a workload that reads from or writes to already opened
local files. Your workload can consist of one or more files.
Make sure that all of the files are allocated on a local disk (i.e.,
not a file in a distributed file system such as AFS where your CS
account resides). Your workload can also contain a few other
operations (e.g., lseek() and fsync() could be interesting). In your
measurements, do not include the time that it takes to open the file
and obtain a file descriptor. The reads or writes can be performed
in any pattern you choose but, again, should not repeatedly target the
same file locations.
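To make the "already opened" requirement concrete, here is a minimal sketch of a read loop that takes a file descriptor rather than a path, so that the cost of open() stays outside whatever you time. The name read_blocks and its parameters are hypothetical, and reading consecutive blocks with pread() is just one possible access pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `nreq` consecutive blocks of `blksz` bytes
 * from an already opened descriptor. Each request targets a different file
 * offset. Returns the total number of bytes read, or -1 on error. */
ssize_t read_blocks(int fd, char *buf, size_t blksz, size_t nreq)
{
    assert(fd >= 0 && buf != NULL && blksz > 0);
    ssize_t total = 0;
    for (size_t i = 0; i < nreq; i++) {
        ssize_t n = pread(fd, buf, blksz, (off_t)(i * blksz));
        if (n < 0) {
            perror("pread");
            return -1;
        }
        if (n == 0)
            break;              /* reached end of file */
        total += n;
    }
    return total;
}
```

The caller would open the file first, start the timer, call read_blocks, and stop the timer before closing the file.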
What different OS paths can you use to obtain different performance
for the good, the bad, and the ugly workloads? This is the
million-dollar question for this assignment. You can start by
thinking about different architectural characteristics that impact
performance (e.g., whether memory requests hit in cache and whether
disk requests incur seek or rotational costs), but this isn't all
that you should focus on. For your ugly workload especially, you
might want to think about requests that trigger allocations, or that
compete against other requests either inside or outside the primary
thread/process. Feel free to show off!
What should you measure in your workloads? The goal is to show
that the three workloads all obtain strikingly different performance
from one another. To do this, you should measure and report both the
throughput of your workload (e.g., in terms of operations/sec or
bytes/sec or some suitable metric) and the average access time of
requests in your workload. In addition to these base metrics, you
should also report the bad:good ratio and the ugly:good ratio for each
system. The workload throughput and average access time should be
computed over many, many requests such that you are not measuring any
start-up costs; you should be measuring steady-state performance. You
can begin by using a basic timer like gettimeofday. If you are
measuring the entire time taken by your workload (instead of the sum
of the time taken by individual memory or file accesses), then be
careful that your workload is not performing other significant
operations (whether computation or sleeping).
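The two base metrics can be derived from a pair of gettimeofday samples roughly as follows. This is a minimal sketch: struct perf, elapsed_usec, and summarize are hypothetical names, and it assumes that you time the whole workload and divide by the number of requests.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical container for the two base metrics. */
struct perf {
    double ops_per_sec;   /* throughput */
    double avg_usec;      /* average access time per request */
};

/* Elapsed microseconds between two gettimeofday() samples. */
double elapsed_usec(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1e6
         + (double)(end.tv_usec - start.tv_usec);
}

/* Turn a total elapsed time and a request count into both metrics. */
struct perf summarize(double total_usec, size_t nops)
{
    assert(total_usec > 0 && nops > 0);
    struct perf p;
    p.ops_per_sec = (double)nops / (total_usec / 1e6);
    p.avg_usec = total_usec / (double)nops;
    return p;
}

/* Typical use (workload call omitted):
 *   struct timeval t0, t1;
 *   gettimeofday(&t0, NULL);
 *   ... issue nops requests ...
 *   gettimeofday(&t1, NULL);
 *   struct perf p = summarize(elapsed_usec(t0, t1), nops);
 */
```

The bad:good ratio is then simply the bad workload's avg_usec divided by the good workload's avg_usec (or, equivalently, the inverse ratio of their throughputs).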
What can you use to graph the measurements you collect? The more
you can visualize about the data you collect, the better this work
will be. It can be tricky to figure out what data actually shows
something interesting to your audience. Once you have some data in
mind, you can use something like gnuplot,
zplot, or ploticus
to create beautiful graphs.
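If you use gnuplot, it can read plain whitespace-separated columns and skips lines that begin with '#'. The following sketch writes per-request latencies in that form; the function name write_series and the choice of columns are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write one "request_index latency_usec" pair per line,
 * preceded by a '#' comment header. Returns the number of data rows written,
 * or -1 if a write fails. */
int write_series(FILE *out, const double *usec, size_t n)
{
    assert(out != NULL && usec != NULL);
    if (fprintf(out, "# request latency_usec\n") < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (fprintf(out, "%zu %.3f\n", i, usec[i]) < 0)
            return -1;
    }
    return (int)n;
}
```

A file written this way (say, latency.dat, a made-up name) could then be plotted in gnuplot with a command such as: plot "latency.dat" using 1:2 with points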
What must you explain for each workload? You should give some
intuition for why you chose a particular workload and what you
expected to see. What path through the OS and architecture did you
expect this workload to take? After you have your measurements, you
should carefully describe any conclusions that you can draw. How
can you infer how the OS or hardware is being used and be sure of
your answer?
Do you need any additional reading for some ideas on what to
measure? For example, consider the following papers that
looked at fingerprinting, or micro-benchmarking, various
components in the memory and file system hierarchies.
- Measuring Cache and TLB Performance and Their Effect on Benchmark
Runtimes, by R.H. Saavedra and A.J. Smith, in IEEE Transactions on
Computers, pages 1223--1235, October 1995.
- Exploiting Gray-Box Knowledge of Buffer-Cache Management, by
Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, and Remzi
H. Arpaci-Dusseau, in The 2002 USENIX Annual Technical Conference,
Monterey, CA, USA, June 2002.
- Microbenchmark-based extraction of local and global disk
characteristics, by N. Talagala, R. Arpaci-Dusseau, and D. Patterson,
University of California, Berkeley, 1999, CSD-99-1063.
Grading
This project is worth a maximum of 100 points.
It is very straightforward to complete the basics of this assignment
without doing anything very interesting. For
following the bare specification given above, you can obtain a maximum of
60 base points. If you would like additional points (up to 100 points
total), you can do any of the
following:
- 5 points/workload (max 20 points): Control the precise behavior of your
"bad" or "ugly" workload by varying some workload parameter. That is, you
should vary a workload characteristic such that your "good" workload
becomes more and more like your "bad" (or "ugly") workload. You
should present a graph showing this parameter being changed along the
x-axis and the resulting performance along the y-axis. Explain why
the performance has the general shape that it does (e.g., is it
linear?). If there is a performance cliff at some point, explain why
this occurs at the point that it does.
- 5 points/workload (max 20 points): Demonstrate that you understand why
your workload is obtaining the performance that it is.
You may need to perform additional measurements to obtain the
performance cost of taking different paths in the OS and then show
how those costs combine to match the results of your workloads.
- 5 points/workload (max 20 points): Create an interesting graph for a
workload. A simple bar graph reporting the
performance of the 3 workloads isn't interesting and it
doesn't explain anything about the workloads. If you plot
the access time of each request over time within your workload, do
you see anything interesting? Do you see any patterns? Can you
explain why those patterns occur?
- 5 points/workload (max 10 points): Find an interesting or novel "bad"
or "ugly" workload that exercises some component of the OS in a
way that few other students in the class choose.
- 5 points/system (max 10 points): Obtain a dramatic ratio for your
bad:good workloads compared to other students in the class.
Your additional bonus points will be added to your base points for a
final grade, which will be capped at 100 points. For example, if you
obtain 52 base points, parameterize your ugly memory workload (5
points), explain in detail your bad and ugly file system workloads
(10 points), and find an extremely interesting ugly file system workload (5
points), you will obtain a total of 52+20 = 72 points.
Rules
- You must work on this project alone. You must write all of the
code for creating workloads yourself. You must run and measure the
workloads on your own.
- You must create the workloads in C and run them on a UNIX-based
system (e.g., Linux).
- Be sure to run your experiments on an otherwise quiet system;
there shouldn't be anything running on your machine that you don't control.
- Be sure to report all relevant details about the machine you are
using (at a minimum, the OS version, the amount of physical
memory, the disk model, and the local file system you are using).
- Your experimental results must be repeatable. You should control
the precise state of the system when the experiment begins such that
the experiment performs similarly each time.
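One possible way to put a file's data into a known state before each run is to flush it to disk and then ask the kernel to drop its cached pages. This is only a sketch under assumptions: evict_file_cache is a hypothetical helper, POSIX_FADV_DONTNEED is advice that the kernel is free to ignore, and you should verify its effect on your own system rather than rely on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write back any dirty data for `fd`, then advise the
 * kernel that the cached pages for the whole file are no longer needed.
 * Returns 0 on success or an errno-style error number on failure. */
int evict_file_cache(int fd)
{
    assert(fd >= 0);
    if (fsync(fd) != 0)
        return errno;
    /* offset 0 with length 0 means "from the start to the end of the file" */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

Whatever mechanism you choose, describe it in your write-up so that a reader can reproduce the starting state of each experiment.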
Paper Write-Up
The paper should be at most 6 pages (all inclusive), in 10-point font,
single-sided, with 1-inch margins; you can choose single or double
column. In your write-up, you should
not re-describe the assignment. Your paper must be written
using correct English grammar and full sentences. You should have no
spelling mistakes! The paper must contain the following parts:
- Title:
The title should be descriptive and fit in one line across the page.
- Abstract: This is the paper in brief and should state the
basic contents and conclusions of the paper. The abstract is
not the introduction to the paper, but is a summary of
everything. It is an advertisement that will draw the reader to your
paper, without being misleading. It should be complete enough to
understand what will be covered in the paper. This is a technical
paper and not a mystery novel -- don't be afraid of giving away the
ending!
- Introduction: The introduction is a section of the main
body of the paper. It should prepare the reader for the
remainder of the paper, motivating the problem, and outlining the
approach.
- Experiments: The rest of the paper should be split into
reasonable sections. You should begin by describing your experimental
platform (e.g., the hardware, operating system, and compiler with
versions, options, and flags as necessary).
For each of the 6 experiments, you should briefly describe your
workload (such that it could be replicated by someone with a
reasonable OS background). To be adequately precise, you may want to
include code snippets or pseudo-code; make sure to report the number
of memory or file accesses performed in each.
For each experiment, you should give some intuition for why you chose a particular workload and
what you expected to see. What path through the OS do you expect
this workload to take? How is the workload expected to exercise the
underlying hardware? For each experiment, you should carefully describe any conclusions that you can
draw. From your experiments, how can you infer how the OS or hardware
is being used?
You are also welcome to briefly describe any negative results (i.e.,
experiments that didn't end up exercising the OS in the way you
expected or that didn't make the point that you hoped they would
make). For negative results you do
not need to give as many details or show figures.
- Figures: A paper without figures, graphs, or diagrams is
boring. Your paper must
have figures. What you choose to graph is up to you; just be sure to
graph something that is illuminating and that helps explain something to the
reader. For each experiment, you should present your results
concisely and in table or graph form when appropriate; you should
identify the variables that you control (e.g., by placing them
along the x-axis) and identify your performance metrics, especially
their units. Explain your graphs in the text.
- Conclusions:
This is a discussion of what the reader should have learned from the
paper. You can repeat things stated earlier in the paper, but only to
the extent that they contribute to the final discussion.