Homework 5 // Due at Lecture Monday, October 25
You will perform this assignment on the x86-64 Nehalem-based systems you used in previous homeworks:
ale-01.cs.wisc.edu and ale-02.cs.wisc.edu.
You should do this assignment alone. No late assignments.
Purpose
The purpose of this assignment is to give you some experience
converting lock-based synchronization into transactions, using
Intel's prototype C++ STM Compiler.
Programming Environment: POSIX Threads + Software
Transactional Memory (STM)
In this assignment you will be using the POSIX threads (pthreads)
environment that you know and love combined with a software
transactional memory infrastructure.
You will be using
Intel's prototype C++ STM compiler. The programming interface for
Intel's STM is described in this
document.
Programming Task: Work Stealing Task Queue
You are to design three implementations of a work-stealing task queue
(see the Cilk paper for a refresher). The first implementation will use
coarse-grained locks (i.e., a single global lock). The second will
attempt to get more concurrency by using fine-grained
locking. Finally, you will use the STM to create a task queue that
utilizes transactional memory.
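As a rough illustration of the coarse-grained variant, the sketch below guards every queue operation with one global pthread mutex. It assumes the usual Cilk-style convention that the owner pops the newest task while a thief steals the oldest. The names Task, CoarseQueue, push, pop, steal, and coarse_queue_demo are hypothetical; the real interface is whatever AbstractQueue declares in the tarball.

```cpp
#include <cassert>
#include <deque>
#include <pthread.h>

// Hypothetical task type; the real TaskMan task class lives in the tarball.
typedef int Task;

// A single lock shared by every queue in the program (coarse-grained).
static pthread_mutex_t g_queue_lock = PTHREAD_MUTEX_INITIALIZER;

// Hypothetical queue; it does not derive from the real AbstractQueue.
class CoarseQueue {
public:
    // Owner adds new work at the bottom of the queue.
    void push(Task t) {
        pthread_mutex_lock(&g_queue_lock);
        tasks_.push_back(t);
        pthread_mutex_unlock(&g_queue_lock);
    }

    // Owner removes the most recently pushed task (LIFO end).
    bool pop(Task &out) {
        pthread_mutex_lock(&g_queue_lock);
        bool ok = !tasks_.empty();
        if (ok) {
            out = tasks_.back();
            tasks_.pop_back();
        }
        pthread_mutex_unlock(&g_queue_lock);
        return ok;
    }

    // A thief removes the oldest task from the opposite end.
    bool steal(Task &out) {
        pthread_mutex_lock(&g_queue_lock);
        bool ok = !tasks_.empty();
        if (ok) {
            out = tasks_.front();
            tasks_.pop_front();
        }
        pthread_mutex_unlock(&g_queue_lock);
        return ok;
    }

private:
    std::deque<Task> tasks_;
};

// Usage: the owner pops newest-first while a thief steals oldest-first.
inline void coarse_queue_demo() {
    CoarseQueue q;
    Task t = 0;
    q.push(1);
    q.push(2);
    q.push(3);
    bool stole = q.steal(t);
    assert(stole && t == 1);
    bool popped = q.pop(t);
    assert(popped && t == 3);
    (void)stole;
    (void)popped;
}
```

Because every operation on every queue takes the same lock, this version is easy to reason about but serializes all workers, which is the cost the other two implementations try to reduce.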
The task queue you design will be used inside the TaskMan runtime
library. TaskMan is a Cilk-like programming environment that supports
task-based parallelism in C++. Unlike Cilk, TaskMan utilizes futures
to eliminate the need to call sync() after spawning a task. Instead,
the TaskMan library automatically ensures that the result of a task
will be ready the first time it is read. More information on TaskMan
can be found in this report. The code you
will be using is in a downloadable
tarball here.
Your specific assignment is to create three new subclasses of the
AbstractQueue class inside of the task library. When your
implementation is correct, the programs in the test directory should
run to completion without error.
Problem 1: Compiler Setup
Download and install the Intel
C++ STM Compiler. (You probably want to store the tarball in
/scratch or /tmp and install to AFS from there.) At the first prompt,
you should choose "Install as current user to limit access to user
level", rather than the default, and specify somewhere in your CS
account to install the compiler. To save storage space, you may want
to choose a custom installation. You will not need the Math Kernel
Library, TBB, or the IPP library.
When you are prompted for a license file, specify the following location:
/s/intel_cc-11.0/common/licenses/NCOM_L_CMP_CPP_NRGF-WJJ5F3HH.lic
If anyone has trouble getting the license to work, please contact the instructor.
Next, modify the Makefile of the task library to use the
Intel C++ STM compiler, which should be located at:
<install_dir>/intel/Compiler/11.0/606/bin/intel64/icpc
Note that to use TM, you will need to specify the -Qtm_enabled flag for both the compile and link steps.
To execute the resulting program, you will also need to set the LD_LIBRARY_PATH environment variable to point to:
<install_dir>/intel/Compiler/11.0/606/lib/intel64
NOTE: Students have reported problems installing the compiler on the ale machines, and possibly on other machines, while installations performed on the clover machines have worked correctly. Please contact the TA if you have an issue with an installation performed on a clover node.
Problem 2: Three Queue Implementations
The goal of Problem 2 is to make three versions of a work-stealing task queue: one using coarse-grained locks, one using fine-grained locks, and one that uses transactions instead of locks to provide atomic operations.
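One simple way to get finer granularity than a single global lock is to give each worker's queue its own mutex; finer schemes are certainly possible, and this is only one reading of "fine grained". The sketch below assumes one queue per worker and a round-robin choice of victim (real schedulers often pick victims at random). The names Task, LockedQueue, and find_work are hypothetical and are not part of TaskMan.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <pthread.h>

// Hypothetical task type; the real TaskMan task class lives in the tarball.
typedef int Task;

// Hypothetical queue with its own lock, so workers operating on
// different queues never contend with each other.
class LockedQueue {
public:
    LockedQueue() { pthread_mutex_init(&lock_, NULL); }
    ~LockedQueue() { pthread_mutex_destroy(&lock_); }

    // Owner adds new work at the bottom of its queue.
    void push(Task t) {
        pthread_mutex_lock(&lock_);
        tasks_.push_back(t);
        pthread_mutex_unlock(&lock_);
    }

    // Owner removes the most recently pushed task.
    bool pop(Task &out) {
        pthread_mutex_lock(&lock_);
        bool ok = !tasks_.empty();
        if (ok) {
            out = tasks_.back();
            tasks_.pop_back();
        }
        pthread_mutex_unlock(&lock_);
        return ok;
    }

    // A thief removes the oldest task.
    bool steal(Task &out) {
        pthread_mutex_lock(&lock_);
        bool ok = !tasks_.empty();
        if (ok) {
            out = tasks_.front();
            tasks_.pop_front();
        }
        pthread_mutex_unlock(&lock_);
        return ok;
    }

private:
    LockedQueue(const LockedQueue &);             // non-copyable
    LockedQueue &operator=(const LockedQueue &);  // non-copyable
    pthread_mutex_t lock_;
    std::deque<Task> tasks_;
};

// Worker `self` tries its own queue first, then each other queue in
// round-robin order. Only one lock is held at a time, so the lock
// acquisition order cannot cause deadlock.
inline bool find_work(LockedQueue *queues, std::size_t n,
                      std::size_t self, Task &out) {
    assert(self < n);
    if (queues[self].pop(out)) {
        return true;
    }
    for (std::size_t i = 1; i < n; ++i) {
        if (queues[(self + i) % n].steal(out)) {
            return true;
        }
    }
    return false;
}
```

For example, with three queues where only queue 1 holds tasks, a call such as find_work(queues, 3, 0, t) would steal the oldest task from queue 1, while find_work(queues, 3, 1, t) would pop the newest one.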
As described in the Intel
documentation, use atomic blocks to synchronize the
implementations of Lookup(), Set(), and Remove(). This is done with
the __tm_atomic construct and by
annotating any method or function called within a transaction with
__attribute__((tm_callable)). You may
find other useful constructs in the documentation. Feel free to
experiment.
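The sketch below shows roughly how an atomic block and a tm_callable helper might fit together in a bounded queue. Several things here are assumptions: USE_INTEL_STM is a hypothetical macro you would define yourself (for example with -DUSE_INTEL_STM) when building with the Intel STM compiler and -Qtm_enabled; TM_CALLABLE, Task, TmQueue, and the *_raw helpers are hypothetical names; and the STM branch has not been checked against the prototype compiler, so consult the Intel documentation for the exact syntax. Without the macro, the code falls back to one global mutex so that it still builds with an ordinary compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

#ifdef USE_INTEL_STM
// Assumed spelling, taken from the assignment text.
#define TM_CALLABLE __attribute__((tm_callable))
#else
#define TM_CALLABLE
static pthread_mutex_t g_fallback_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

// Hypothetical task type; the real TaskMan task class lives in the tarball.
typedef int Task;

// Hypothetical bounded queue backed by a plain circular array, so the
// transactional code touches only ordinary loads and stores.
class TmQueue {
public:
    TmQueue() : head_(0), count_(0) {}

    // Owner adds a task; returns false if the queue is full.
    bool push(Task t) {
        bool ok = false;
#ifdef USE_INTEL_STM
        __tm_atomic { ok = push_raw(t); }
#else
        pthread_mutex_lock(&g_fallback_lock);
        ok = push_raw(t);
        pthread_mutex_unlock(&g_fallback_lock);
#endif
        return ok;
    }

    // Owner removes the newest task; returns false if the queue is empty.
    bool pop(Task &out) {
        bool ok = false;
#ifdef USE_INTEL_STM
        __tm_atomic { ok = pop_raw(out); }
#else
        pthread_mutex_lock(&g_fallback_lock);
        ok = pop_raw(out);
        pthread_mutex_unlock(&g_fallback_lock);
#endif
        return ok;
    }

    // A thief removes the oldest task; returns false if the queue is empty.
    bool steal(Task &out) {
        bool ok = false;
#ifdef USE_INTEL_STM
        __tm_atomic { ok = steal_raw(out); }
#else
        pthread_mutex_lock(&g_fallback_lock);
        ok = steal_raw(out);
        pthread_mutex_unlock(&g_fallback_lock);
#endif
        return ok;
    }

private:
    enum { kCapacity = 64 };

    // Unsynchronized helpers; call them only inside an atomic block
    // (or, in the fallback build, while holding the lock).
    TM_CALLABLE bool push_raw(Task t) {
        if (count_ == kCapacity) return false;
        buf_[(head_ + count_) % kCapacity] = t;
        ++count_;
        return true;
    }
    TM_CALLABLE bool pop_raw(Task &out) {
        if (count_ == 0) return false;
        --count_;
        out = buf_[(head_ + count_) % kCapacity];
        return true;
    }
    TM_CALLABLE bool steal_raw(Task &out) {
        if (count_ == 0) return false;
        out = buf_[head_];
        head_ = (head_ + 1) % kCapacity;
        --count_;
        return true;
    }

    Task buf_[kCapacity];
    std::size_t head_;   // index of the oldest task
    std::size_t count_;  // number of tasks currently stored
};
```

Keeping the raw helpers free of library calls is a deliberate choice in this sketch: anything called inside a transaction must itself be usable transactionally, which is easier to guarantee for plain array accesses than for library containers.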
You can get statistics on your transactions by setting the environment
variable ITM_STATISTICS to "simple" or "verbose" (without the quotes). The
statistics will be written out to a file called itm.log.
Problem 3: Description of Synchronization Strategies
Describe where and how you used transactions to synchronize your
code. Describe where you felt transactions weren't appropriate.
Problem 4: Evaluation
Evaluate the performance of your three implementations using the test programs in the distributed tarball. You should use the following input sizes:
| Program | Input |
| fib | 50 (numthreads) |
| heat | -benchmark long -nproc (numthreads) |
| matmul | -n 1000 -nproc (numthreads) |
| plu | -n 4096 -nproc (numthreads) |
You should run evaluation experiments for each queue implementation.
Also, for the transactional queue, set the ITM_STATISTICS variable to
verbose. Include a print-out of the "GRAND TOTAL" section of each
resulting itm.log with your report.
Problem 5: Questions (Submission Credit)
- Is transactional memory the best thing since sliced bread?
- What did you observe in the transactional statistics? What sort of differences did you see between the fine-grained parallelism and the more coarse-grained parallelism of the transactional tests?
- Describe any restructuring you did to the program to improve performance.
- Which implementation -- lock-based or transaction-based -- performed best?
- Which was easier overall -- writing the transactional version, or writing the fine-grained locking version? Include
estimates of coding and debugging time.
Tips and Tricks
Start early.
What to Hand In
Please turn this homework in on paper at the beginning of
lecture. Please STAPLE the pages together (paper clips are
better than nothing, but staples are preferred).
- Your code from the queue implementations.
- Your results from Problem 4.
- Answers to questions in Problem 5.
Important: Include your name on EVERY page.