Homework 3 // Due at Lecture Monday, October 12 2010
Perform this assignment on the x86-64 Nehalem-based
systems ale-01.cs.wisc.edu and ale-02.cs.wisc.edu.
You should do this assignment alone. No late assignments.
Purpose
The purpose of this assignment is to explore the features of Intel®
Threading Building Blocks, a multithreading package, including task
management, synchronization, and concurrent data structures.
Programming Environment: OpenMP & TBB
OpenMP is a shared-memory programming model for parallelizing code
that was written in a (mostly) serial fashion: the programmer marks the
regions to parallelize with compiler directives, and the compiler,
together with OpenMP's own runtime library, generates and manages the threads.
If you have not already done so, it is suggested that you review
the OpenMP references provided in the Reading List.
OpenMP uses a Fork/Join model similar to that of P-Threads, but
Fork/Join events are more frequent in OpenMP than in most P-Thread
based programs. Most OpenMP programs consist of
interleaved parallel and sequential sections, with "Fork" events
occurring at the start of each parallel section, and "Join" events
at the end of each parallel section. In non-parallel sections,
only the "master thread" executes.
In order to use the OpenMP environment on ale, students should
use the icc compiler. Any source files that employ OpenMP
directives or library calls must include the omp.h header file.
Additionally, the flag -openmp must be passed to icc for both
compilation and linking.
A set of OpenMP example programs are available for review here.
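As a minimal illustration of the fork/join pattern described above (a sketch, not one of the provided example programs), the C++ function below runs serially, forks a team of threads for one loop, and joins back to the master thread. The function names are hypothetical. The `#ifdef _OPENMP` guard around omp.h lets the same file build serially when OpenMP is turned off.

```cpp
// Fork/join sketch: serial code, one parallel region, serial code again.
#ifdef _OPENMP
#include <omp.h>
#endif
#include <cassert>
#include <vector>

// Number of threads the runtime would use; 1 when built without OpenMP.
int team_size() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Squares every element of v and returns the sum of the squares.
long long square_and_sum(std::vector<long long>& v) {
    assert(!v.empty());              // serial section: master thread only
    long long total = 0;
    // "Fork": a team of threads starts here. Each thread gets a private
    // partial total; the reduction combines them at the implicit barrier
    // ("Join") at the end of the loop.
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < static_cast<long>(v.size()); ++i) {
        v[i] = v[i] * v[i];
        total += v[i];
    }
    return total;                    // serial section again, after the join
}
```

Built with icc and -openmp, the loop iterations are split across threads; without the flag, the pragma is ignored and the loop runs on one thread, producing the same result.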
Intel's Threading Building
Blocks (TBB) package provides a host of useful parallel
programmer services, including some of the same loop parallelization
options provided by OpenMP and task-parallel tools like Cilk. Intel
provides a handy Getting Started Guide that is available at
the link above under the Documentation tab, which will show
you everything you need to know about TBB for the purposes of this
assignment. You will find the Tutorial document very useful
as well.
Programming Task: N-Body Simulation
An n-body simulation calculates the gravitational effects of the
masses of n bodies on each other's positions and velocities. The
final values are generated by incrementally updating the bodies over
many small time steps. We will look at two approaches to this
problem. First, we will calculate the pairwise force exerted on each
particle by all other particles, an O(n²) operation per time step.
Second, we will use a quadtree data structure to implement an O(n log n)
approximation algorithm. A great overview of the O(n log n)
algorithm can be found here. For
simplicity, we will model the bodies in a two-dimensional space.
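The quadtree mentioned above can be sketched as follows, assuming every body has a distinct position inside the root square and a positive mass. QuadNode, insert, and count are hypothetical names, not part of the provided code. Each node keeps the total mass and center of mass of the bodies beneath it, which is what a Barnes-Hut-style O(n log n) algorithm uses to treat a distant group of bodies as a single body.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <memory>

// Hypothetical body record: position (m) and mass (kg).
struct Body { double rx, ry, mass; };

// One square region of space. Leaves hold at most one body; internal nodes
// have four children and summarize everything beneath them.
// Bodies are stored by pointer, so the caller must keep them alive.
struct QuadNode {
    double cx, cy, half;              // centre and half-width of the square
    double mass = 0.0;                // total mass of bodies in this square
    double comx = 0.0, comy = 0.0;    // their centre of mass
    const Body* body = nullptr;       // the single body of an occupied leaf
    std::array<std::unique_ptr<QuadNode>, 4> child;  // NW, NE, SW, SE

    QuadNode(double x, double y, double h) : cx(x), cy(y), half(h) {}

    bool is_leaf() const { return !child[0]; }

    // 0 = NW, 1 = NE, 2 = SW, 3 = SE (points on a dividing line go east/north).
    int quadrant(const Body& b) const {
        return (b.rx >= cx ? 1 : 0) + (b.ry < cy ? 2 : 0);
    }

    void subdivide() {
        assert(half > 1e-12 && "coincident bodies are not handled in this sketch");
        double q = half / 2;
        child[0] = std::make_unique<QuadNode>(cx - q, cy + q, q);
        child[1] = std::make_unique<QuadNode>(cx + q, cy + q, q);
        child[2] = std::make_unique<QuadNode>(cx - q, cy - q, q);
        child[3] = std::make_unique<QuadNode>(cx + q, cy - q, q);
    }

    void insert(const Body& b) {
        assert(b.mass > 0.0);
        assert(std::fabs(b.rx - cx) <= half && std::fabs(b.ry - cy) <= half);
        // Every node on the insertion path absorbs b into its summary.
        double m = mass + b.mass;
        comx = (comx * mass + b.rx * b.mass) / m;
        comy = (comy * mass + b.ry * b.mass) / m;
        mass = m;

        if (is_leaf() && body == nullptr) {   // empty leaf: keep b here
            body = &b;
            return;
        }
        if (is_leaf()) {                      // occupied leaf: split, push old body down
            subdivide();
            const Body* old = body;
            body = nullptr;
            child[quadrant(*old)]->insert(*old);
        }
        child[quadrant(b)]->insert(b);        // internal node: descend
    }

    int count() const {
        if (is_leaf()) return body ? 1 : 0;
        int n = 0;
        for (const auto& c : child) n += c->count();
        return n;
    }
};
```

Inserting n bodies into a reasonably balanced tree takes O(n log n) time; the force pass (not shown) then walks the tree once per body.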
The physics.
We review the equations governing the motion of the particles according to
Newton's laws of motion and gravitation. Don't worry if your physics is a bit
rusty; all of the necessary formulas are included below. We already know each
particle's position (r_x, r_y) and velocity
(v_x, v_y). To model the dynamics of the system,
we must determine the net force exerted on each particle.
- Pairwise force. Newton's law of universal gravitation asserts that
the strength of the gravitational force between two particles is given by
the product of their masses divided by the square of the distance
between them, scaled by the gravitational constant G, which is 6.67 × 10⁻¹¹
N·m²/kg²: F = G m_a m_b / r².
The pull of one particle towards another acts on the line between them.
Since we will be using Cartesian coordinates to represent the position of
a particle, it is convenient to break up the force into its x and y
components (F_x, F_y). If Δx and Δy are the differences between the two
particles' coordinates, so that r = √(Δx² + Δy²), then
F_x = F Δx / r and F_y = F Δy / r.
- Net force. The principle of superposition says that
the net force acting on a particle in the x or y direction is the sum
of the pairwise forces acting on the particle in that direction.
- Acceleration. Newton's second law of motion postulates that
the accelerations in the x and y directions are given by:
a_x = F_x / m, a_y = F_y / m.
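The three formulas above translate directly into code. This is a sketch assuming a hypothetical Body record with position, velocity, and mass fields in SI units; pairwise_force and acceleration are illustrative names, and the provided code may organize this differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle record (SI units).
struct Body {
    double rx, ry;   // position (m)
    double vx, vy;   // velocity (m/s)
    double mass;     // mass (kg)
};

constexpr double G = 6.67e-11;  // gravitational constant, N m^2 / kg^2

// Pairwise force: force exerted on a by b, split into x and y components.
void pairwise_force(const Body& a, const Body& b, double& fx, double& fy) {
    double dx = b.rx - a.rx;
    double dy = b.ry - a.ry;
    double dist = std::sqrt(dx * dx + dy * dy);
    assert(dist > 0.0);                        // coincident bodies are undefined
    double f = G * a.mass * b.mass / (dist * dist);
    fx = f * dx / dist;                        // F_x = F dx / r
    fy = f * dy / dist;                        // F_y = F dy / r
}

// Net force on body i by superposition, then Newton's second law a = F / m.
void acceleration(const std::vector<Body>& bodies, std::size_t i,
                  double& ax, double& ay) {
    double fx_net = 0.0, fy_net = 0.0;
    for (std::size_t j = 0; j < bodies.size(); ++j) {
        if (j == i) continue;                  // no self-interaction
        double fx, fy;
        pairwise_force(bodies[i], bodies[j], fx, fy);
        fx_net += fx;
        fy_net += fy;
    }
    ax = fx_net / bodies[i].mass;
    ay = fy_net / bodies[i].mass;
}
```

For two 10¹⁰ kg bodies 1 m apart, each accelerates toward the other at about 0.667 m/s².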
The numerics.
We use the leapfrog finite difference approximation scheme
to numerically integrate the above equations: this is the
basis for most astrophysical simulations of gravitational systems.
In the leapfrog scheme, we discretize time, and update the time
variable t in increments of the time quantum Δt.
We maintain the position and velocity of each particle, but they are half a
time step out of phase (which explains the name leapfrog). The steps below illustrate
how to evolve the positions and velocities of the particles.
For each particle:
- Calculate the net force acting on it at time t using Newton's
law of gravitation and the principle of superposition.
- Calculate its acceleration (a_x, a_y) at time t
using its force at time t and Newton's second law of motion.
- Calculate its velocity at time t + Δt / 2 by using
its acceleration at time t and its velocity (v_x, v_y)
at time t - Δt / 2.
Assume that the acceleration remains constant over the interval from
t - Δt / 2 to t + Δt / 2, so that the updated velocity is:
v_x = v_x + Δt a_x,
v_y = v_y + Δt a_y.
- Calculate its position at time t + Δt by using
its velocity at time t + Δt / 2 and its position
at time t.
Assume that the velocity remains constant in the interval from
t to t + Δt, so that
the resulting position is given by
r_x = r_x + Δt v_x,
r_y = r_y + Δt v_y.
Note that because of the leapfrog scheme, the constant velocity we are using
is the one estimated at the middle of the interval rather than either of the
endpoints.
As you would expect, the simulation is more accurate when Δt is
very small, but this comes at the price of more computation.
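One leapfrog step for every particle might look like the sketch below (hypothetical Body layout and function name). One detail the step list leaves implicit: if each particle's position were updated before the remaining particles' forces were computed, those forces would no longer be evaluated at time t, so this sketch computes all accelerations first and only then updates velocities and positions. It also assumes the stored velocities already correspond to time t - Δt / 2.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { double rx, ry, vx, vy, mass; };   // hypothetical layout, SI units
constexpr double G = 6.67e-11;                  // N m^2 / kg^2

// Advance every body by one leapfrog step of length dt.
// On entry: r holds r(t), v holds v(t - dt/2).
// On exit:  r holds r(t + dt), v holds v(t + dt/2).
void leapfrog_step(std::vector<Body>& bodies, double dt) {
    const std::size_t n = bodies.size();
    std::vector<double> ax(n, 0.0), ay(n, 0.0);
    // Steps 1-2: every acceleration at time t, before any position moves.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            double dx = bodies[j].rx - bodies[i].rx;
            double dy = bodies[j].ry - bodies[i].ry;
            double dist = std::sqrt(dx * dx + dy * dy);
            assert(dist > 0.0);
            // F / m_i = G m_j / r^2, directed along (dx, dy) / r
            double a = G * bodies[j].mass / (dist * dist);
            ax[i] += a * dx / dist;
            ay[i] += a * dy / dist;
        }
    }
    // Steps 3-4: velocity to t + dt/2, then position to t + dt.
    for (std::size_t i = 0; i < n; ++i) {
        bodies[i].vx += dt * ax[i];
        bodies[i].vy += dt * ay[i];
        bodies[i].rx += dt * bodies[i].vx;
        bodies[i].ry += dt * bodies[i].vy;
    }
}
```

Starting from two equal masses at rest and mirrored about the origin, the positions stay mirrored while the bodies fall toward each other, which is the kind of symmetry check suggested in Problem 1.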
Problem 1: Parallelize O(n²) N-Body
For this problem, you are to parallelize the O(n²)
pairwise version of the N-body simulation using both OpenMP and
TBB. You may use any of the TBB mechanisms, though you may
find parallel_for most useful.
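With OpenMP, one natural place to parallelize the O(n²) version is the outer loop over bodies, as in this sketch (hypothetical names and data layout). Each iteration reads the shared body array and writes only its own acceleration slot, so no locking is needed. A TBB version could place the same loop body inside tbb::parallel_for over a tbb::blocked_range<size_t>; that variant is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Body { double rx, ry, vx, vy, mass; };   // hypothetical layout, SI units
constexpr double G = 6.67e-11;                  // N m^2 / kg^2

// Fills ax/ay with each body's acceleration. Iteration i writes only ax[i]
// and ay[i] and never modifies b, so the outer loop has no data races.
void compute_accelerations(const std::vector<Body>& b,
                           std::vector<double>& ax, std::vector<double>& ay) {
    const long n = static_cast<long>(b.size());
    ax.assign(b.size(), 0.0);
    ay.assign(b.size(), 0.0);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        double sx = 0.0, sy = 0.0;          // per-iteration accumulators
        for (long j = 0; j < n; ++j) {
            if (j == i) continue;
            double dx = b[j].rx - b[i].rx;
            double dy = b[j].ry - b[i].ry;
            double dist = std::sqrt(dx * dx + dy * dy);
            assert(dist > 0.0);
            double a = G * b[j].mass / (dist * dist);
            sx += a * dx / dist;
            sy += a * dy / dist;
        }
        ax[i] = sx;
        ay[i] = sy;
    }
}
```

The schedule(static) clause is one reasonable choice because every iteration does the same amount of work; comparing it against dynamic scheduling is a cheap experiment when measuring.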
You are required to make an argument that your n-body implementations are
correct. A good way to do this is to initialize the bodies to a special-case
starting condition, and then show that after a number of time steps the state of
the system exhibits symmetry or some other expected property. You need not prove
your implementation's correctness in the literal sense. However, please
annotate any simulation outputs clearly.
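One way to build the correctness argument suggested above is to create bodies in mirrored pairs (each pair has equal masses, opposite positions, and opposite velocities) and check that the state stays point-symmetric about the origin and that the center of mass stays at the origin. Because the force between two bodies depends only on their separation, such a state should remain symmetric in exact arithmetic; the tolerance absorbs floating-point rounding. The helper names and the pairing convention are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { double rx, ry, vx, vy, mass; };   // hypothetical layout

// Mass-weighted centre of the system. With zero total momentum and no
// external forces it should not drift.
void center_of_mass(const std::vector<Body>& b, double& cx, double& cy) {
    double m = 0.0, sx = 0.0, sy = 0.0;
    for (const Body& p : b) {
        m += p.mass;
        sx += p.mass * p.rx;
        sy += p.mass * p.ry;
    }
    assert(m > 0.0);
    cx = sx / m;
    cy = sy / m;
}

// Assumes body 2k+1 was created as the mirror image of body 2k.
// Returns true if every pair is still mirrored through the origin.
bool is_point_symmetric(const std::vector<Body>& b, double tol) {
    assert(b.size() % 2 == 0);
    for (std::size_t k = 0; k + 1 < b.size(); k += 2) {
        const Body& p = b[k];
        const Body& q = b[k + 1];
        if (p.mass != q.mass) return false;
        if (std::fabs(p.rx + q.rx) > tol || std::fabs(p.ry + q.ry) > tol ||
            std::fabs(p.vx + q.vx) > tol || std::fabs(p.vy + q.vy) > tol)
            return false;
    }
    return true;
}
```

Printing both checks every few hundred steps gives an annotated trace that can go directly into the correctness argument.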
Problem 2: Parallelize O(n log n) N-Body
Everything in TBB is a task. In this problem, you are to explore
the many ways to utilize tasks by parallelizing the O(n log n) version
of the N-body simulation. You will find that this version of the
simulation heavily utilizes recursion; recursive calls often make great tasks.
In this problem you should experiment with the granularity of
tasks. Too many tasks lead to high overhead, and too few tasks
parallelize poorly. You should incrementally modify your parallelization
strategy until you find one that balances overhead against parallelism
and leads to good performance.
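The granularity experiment can be organized around a single cutoff, as in this sketch: a range of bodies is split in half recursively, a new task is spawned for the left half only while the range is larger than a grain size, and small ranges run serially. OpenMP tasks stand in for TBB tasks here so the sketch also builds serially; in TBB, tbb::task_group (run and wait) or tbb::parallel_invoke can express the same split. body_work, process_range, and process_all are hypothetical; body_work is a placeholder for one body's tree-walk force computation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder for the per-body work (e.g. one tree walk).
double body_work(std::size_t i) {
    double s = 0.0;
    for (std::size_t k = 1; k <= 100; ++k) s += static_cast<double>(i % k);
    return s;
}

// Recursively halves [lo, hi). Ranges larger than `grain` spawn a task for
// the left half; smaller ranges run serially. `grain` is the knob that
// trades task overhead against available parallelism.
void process_range(double* out, std::size_t lo, std::size_t hi, std::size_t grain) {
    if (hi - lo <= grain) {
        for (std::size_t i = lo; i < hi; ++i) out[i] = body_work(i);
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    #pragma omp task firstprivate(out, lo, mid, grain)
    process_range(out, lo, mid, grain);       // left half as a new task
    process_range(out, mid, hi, grain);       // this thread takes the right half
    #pragma omp taskwait                      // wait for the left half
}

void process_all(std::vector<double>& out, std::size_t grain) {
    assert(grain > 0);                        // grain 0 would recurse forever
    double* data = out.data();
    const std::size_t n = out.size();
    #pragma omp parallel
    #pragma omp single                        // one thread starts the recursion
    process_range(data, 0, n, grain);
}
```

Timing process_all for several grain sizes, from 1 up to the total body count, gives a direct view of the tradeoff described above: grain 1 creates roughly one task per body, while a grain equal to the body count creates no tasks at all.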
Problem 3: Analysis of N-Body Algorithms
In this section, you will analyze the performance of your three N-Body
implementations.
Part A: Plot the normalized (versus the serial n²
version) speedups of Programs 1a and 1b on the same
graph for N=[1,2,4,8,16] threads and for 512 bodies
and 5000 timesteps. The value of dt is irrelevant to studying
scalability: with the number of time steps held constant, it only
affects the length of time simulated, not the duration of the
simulation itself. Thus you may choose any value you like.
Part B: Plot the normalized (versus the respective serial
version of n-body) speedups of Programs 1a, 1b, and 2 on N=[1,2,4,8,16]
threads for 512 bodies and 5000 time steps.
Part C: Plot the execution time of Programs 1a, 1b, and 2 on
N=[1,2,4,8,16] threads for 512 bodies and 5000 time
steps on the same graph.
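Speedup here means baseline time divided by parallel time. The small helpers below (hypothetical names) show one way to collect the numbers with std::chrono; the baseline is the serial n² program for Part A and the serial version of the same algorithm for Part B.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Wall-clock seconds taken by one call to run(). steady_clock is used
// because it is monotonic.
double time_seconds(const std::function<void()>& run) {
    auto start = std::chrono::steady_clock::now();
    run();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Normalized speedup: baseline (serial) time divided by parallel time.
double speedup(double serial_seconds, double parallel_seconds) {
    assert(serial_seconds > 0.0 && parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}
```

Timing the whole 5000-step run, rather than a single step, keeps timer resolution from dominating the measurement.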
Problem 4: Questions (Submission Credit)
- Comment on the TBB programming environment. Specifically compare and contrast it to pthreads. Which do you like better, and why?
- Did you like OpenMP or TBB better? In problem 1, which had better performance? Why do you think that is?
- Describe your parallelization strategy for both problems 1 and 2. Which was easier? Which scaled better?
- Comment on the relative speedups between parallelization strategies. Was the speedup worth the additional effort of the more difficult program?
Source Code
We will provide you with working implementations of both n²
and n log n n-body simulations. They are stored in a Mercurial
repository. You can check it out with the following command:
hg clone /p/course/cs758-david/public/repo/hw3
This assignment can be completed by implementing new subclasses of the
NbodySimulator class. See the existing code/Makefile for more
direction.
It is your responsibility to get TBB installed and working in your
working directory. All students should use TBB version 3.0, available
on the TBB website.
Tips and Tricks
- Start early.
- Make use of the demo programs provided.
- Read TBB's Tutorial.
- Don't forget to add -ltbb and other useful switches to the provided Makefile.
- Don't forget to source TBB's environment variables!
What to Hand In
Please turn this homework in on paper at the beginning of lecture. You
must include:
- A printout of your parallel implementation of Programs 1a and 1b. Only include relevant code.
- A printout of the parallel implementation of Program 2. Only include relevant code.
- Arguments for the correctness of Programs 1 and 2.
- The plots as described in Problem 3a and 3b, including labels describing your data.
- Answers to the questions in Problem 4.
Important: Include your name on EVERY page.