Homework 1 // Due at Lecture Monday Feb 7
You should do this assignment alone. No late assignments.
The purpose of the assignment is to give you experience writing simple
parallel programs using OpenMP and MPI. This exercise is intended to provide
a gentle introduction to parallel programming and will
provide a foundation for writing and/or running much more complex programs on
various parallel programming environments.
You will do this assignment on malbec.cs.wisc.edu - a Sun Fire T2000
Server containing a 64-thread Sun UltraSparc-T2 processor. We have given you individual
accounts on this machine. Use them only for 757 homework assignments and, with the instructor's permission,
for your course project. The accounts and storage will be deleted at the end of the
semester unless you obtain the instructor's permission to extend them.
Run /usr/platform/sun4v/sbin/prtdiag -v on malbec, if you wish to learn more
about the machine.
The original version of this assignment and the reference OpenMP programs were developed by Dan Gibson for CS838 offered in Fall 2005.
OpenMP
OpenMP is an API for shared-memory parallel programming for C/C++ and Fortran. It consists of
preprocessor (compiler) directives, library routines and environment variables that determine the
parallel execution of a program.
For this assignment, you will use the Sun Studio implementation of OpenMP that is already installed
on malbec. A set of sample OpenMP programs is available here.
A presentation on using OpenMP by Dan Gibson is available here.
It is strongly recommended that you download and run the sample programs to get hands-on
experience compiling, linking, and running OpenMP programs.
Remember to
- Use Sun Studio cc compiler to compile your code.
- Include omp.h in all the source files that use OpenMP directives or library calls.
- Use the flag -xopenmp for compilation and linking of your source files.
You can also use this sample Makefile provided by Somayeh Sardashti.
MPI
MPI is a library specification for message-passing, proposed as a standard for use on parallel computers, clusters and
heterogeneous networks. The MPI standard and related information is available at
http://www.mcs.anl.gov/mpi. An online reference to MPI can be found at
http://www-unix.mcs.anl.gov/mpi/www/. An MPI example program is available for review here.
For MPI programs
- Include mpi.h in all the source files that use MPI library calls.
- Compile with the mpicc compiler located at /scratch/mpi/mpich2-1.0.8/bin/.
- Set your LD_LIBRARY_PATH to point to /scratch/mpi/mpich2-1.0.8/lib/.
- To run your MPI program, use mpiexec located at /scratch/mpi/mpich2-1.0.8/bin/.
- You may have to start the MPI process manager by running mpd & (also located at /scratch/mpi/mpich2-1.0.8/bin/).
If mpd asks you to create a .mpd.conf file, do so, set the file's permissions exactly as mpd requests, and
rerun mpd & after creating the file.
Programming Task: Ocean Simulation
OCEAN is a simulation of large-scale sea conditions from the SPLASH
benchmark suite. It is a scientific workload used for performance
evaluation of parallel machines. For this assignment, you will write
three scaled-down versions of a variant of the Ocean benchmark.
Ocean is briefly described in Woo et al. on the Reading List. The
scaled-down version you will implement is described below.
Our version of Ocean will simulate water temperatures using a large
grid of integer values over a fixed number of time steps.
At each time step, the value of a given grid location will be averaged with the values of its
immediate north, south, east, and west neighbors
to determine the value of that grid location in the next time step (total of
five grid locations averaged to produce the next value for a given location).
As illustrated in Figure 1, value calculations for two adjacent grid
locations are not independent. Specifically, the value of the grid
location number 6 is calculated using the values of the grid locations
6, 2, 7, 10, and 5, and the value of the grid location number 10 is calculated
using the values of the grid locations 10, 6, 11, 14, and 9.
Because 6 depends on 10 and 10 depends on 6, the resulting values for 6 and 10
will depend on the order in which the values for 6 and 10 were calculated.
Figure 1: Calculation Dependence
Figure 2: Red and Black Independent Sets
The grid points in the ocean grid can be separated into two independent
subsets shown in red and black in Figure 2. Instead of calculating a new
value for each grid location at every time step, the values for the grid points
in the red subset are updated on even time steps and the values for the
grid points in the black subset are updated on odd time steps.
The edges of the grid shown in green in Figure 2 do not participate in the averaging process (they
contribute a value, but their value does not change). Thus, Ocean will
converge (given sufficient runtime) to a gradient of the water temperatures
on the perimeter of the grid.
Problem 1: Write Sequential Ocean (5 points)
Write a single-threaded (sequential) version of Ocean as described
above. This version of Ocean must take three arguments: the
x-dimension of the grid, the y-dimension of the grid, and the number
of time steps. You may assume for simplicity that all grid sizes
will be powers of two plus two (i.e. (2^n)+2); therefore the area of
the grid that will be modified will be sized to powers of two
(+2 takes care of the edges that are not modified).
You are required to make an argument that your implementation of
Ocean is correct. A good way to do this is to initialize the grid
to a special-case starting condition, and then show that after a number
of time steps the state of the grid exhibits symmetry or some other
expected property. You need not prove your implementation's
correctness in the literal sense. However, please annotate any
simulation outputs clearly.
Your final sequential version of Ocean should randomly initialize a
grid of the requested size, then perform simulation for the specified
number of time steps.
Problem 2: Write Parallel Ocean using OpenMP (5 points)
For this problem, you will use OpenMP directives to parallelize your
sequential program. You are required to use the
schedule(dynamic) clause on loops that you will parallelize
with OpenMP. This will cause loop iterations to be dynamically
allocated to threads. Please be sure to explicitly label all appropriate
variables as either shared or private.
Make an argument for the correctness of your implementation (it is
acceptable to use the same argument as problem 1, provided it is still
applicable).
The program should take an additional command-line argument: the
number of threads to use in parallel sections.
It is only required that you parallelize the main portion of the
simulation, but parallelizing the initialization phase of Ocean is
also worthwhile. You will not be penalized if you choose not to
parallelize the initialization phase.
For simplicity, you may assume that the dimensions of the grid are
powers of two plus two as before, and that
only N=[1,2,4,8,16,32] will be passed as the number of threads.
Problem 3: Write Parallel Ocean using MPI (10 points)
For this problem, you will use MPI primitives to parallelize your sequential program.
Note that the processes in message-passing programs do not share
a single address space. So you will have to explicitly split your data structures between
the various processes. Again, make an argument for the correctness of your implementation.
For simplicity, you may assume that the dimensions of the grid are
powers of two plus two, and that only N=[1,2,4,8,16,32] will be passed as the
number of processes.
Problem 4: Analysis of Ocean (10 points)
Modify your programs to measure the execution time of the
parallel phase of execution. Solaris's gethrtime() is
recommended for the OpenMP version; use MPI_Wtime() for timing your MPI program.
Compare the performance of your three Ocean
implementations for a fixed number of time steps (100).
Plot the normalized (versus the Sequential version of Ocean) speedups of
your implementations on N=[1,2,4,8,16,32] threads for a 514x514 ocean.
Note that the N=1 case should be the Sequential
version of Ocean, not the parallel version using only 1 thread.
Repeat for an ocean sized to 1026x1026.
What to Hand In
This assignment will be peer reviewed. Please bring two copies of your homework to lecture; you will give these to the two NEWLY ASSIGNED peer review members.
Your answers to the discussion questions should be typed; handwritten
notes are not acceptable.
A tarball of your entire source code, including a Makefile and a README file,
should be emailed to your peer group members before the beginning of lecture.
The README should include 1) directions for compiling your code, 2) directions for
running your code, 3) any other comments.
Use the subject line [CS/ECE 757] Homework 1,
so that email filters work properly.
Each hard copy should include:
- A printout of the source code for the simulation phase of Sequential Ocean
(this is probably just a for loop).
- A printout of the source code for the parallel phase of OpenMP Ocean
(this is the code that is parallelized using schedule(dynamic)) and a concise description
of how you parallelized the program.
- Arguments for correctness of the two programs.
- The plots as described in Problem 4, including a detailed explanation of the observed trends.
Specifically, explain 1) differences between the slopes for the two grid sizes, 2) any changes in the
slope for each of the grid sizes separately, 3) sources of superlinear speedups, 4) sources of sub-linear
speedup.
Tips and Tricks
- Start early. The machine will get very heavily loaded the two days before the assignment is due.
- Set up RSA authentication on malbec to save yourself some keystrokes; see the HowTo.
- Make use of the demo programs provided.
- You can use /usr/platform/sun4v/sbin/prtdiag -v on malbec to learn
many useful characteristics of your host machine.
- Run your programs multiple times to get accurate time measurements. This will help
avoid misleading results due to interference from other users' programs.