UW-Madison
Computer Sciences Dept.

CS 758 Advanced Topics in Computer Architecture

Programming Current and Future Multicore Processors

Fall 2010 Section 1
Instructor David A. Wood and T. A. Derek Hower
URL: http://www.cs.wisc.edu/~david/courses/cs758/Fall2010/

Homework 5 // Due at Lecture Monday, October 25

You will perform this assignment on the x86-64 Nehalem-based systems you used for previous homeworks: ale-01.cs.wisc.edu and ale-02.cs.wisc.edu.

You should do this assignment alone. No late assignments.

Purpose

The purpose of this assignment is to give you some experience converting lock-based synchronization into transactions, using Intel's prototype C++ STM Compiler.

Programming Environment: POSIX Threads + Software Transactional Memory (STM)

In this assignment you will be using the POSIX threads (pthreads) environment that you know and love, combined with a software transactional memory infrastructure.

You will be using Intel's prototype C++ STM compiler. The programming interface for Intel's STM is described in this document.

Programming Task: Work Stealing Task Queue

You are to design three implementations of a work-stealing task queue (see the Cilk paper for a refresher). The first implementation will use coarse-grained locking (i.e., a single global lock). The second will attempt to get more concurrency by using fine-grained locking. Finally, you will use the STM to create a task queue that relies on transactions instead of locks.
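To make the locking terminology concrete, here is a minimal sketch of a lock-protected work-stealing deque. It is not the TaskMan interface: the class name LockedDeque, the Task typedef, and the method names push_bottom, pop_bottom, and steal_top are all hypothetical, and your AbstractQueue subclasses will need to match whatever the library actually declares. The owner works at the bottom of the deque and thieves take from the top, as in Cilk.

```cpp
#include <pthread.h>
#include <deque>

// Hypothetical task type; the real TaskMan task class will differ.
typedef int Task;

// A work-stealing deque guarded by a caller-supplied mutex (illustrative only).
// Passing the SAME mutex to every queue gives coarse-grained locking;
// passing a DISTINCT mutex to each queue is a first step toward finer granularity.
class LockedDeque {
public:
    explicit LockedDeque(pthread_mutex_t* lock) : lock_(lock) {}

    // Owner adds new work at the bottom.
    void push_bottom(Task t) {
        pthread_mutex_lock(lock_);
        tasks_.push_back(t);
        pthread_mutex_unlock(lock_);
    }

    // Owner takes the newest task (LIFO). Returns false if the deque is empty.
    bool pop_bottom(Task* out) {
        pthread_mutex_lock(lock_);
        bool ok = !tasks_.empty();
        if (ok) {
            *out = tasks_.back();
            tasks_.pop_back();
        }
        pthread_mutex_unlock(lock_);
        return ok;
    }

    // A thief takes the oldest task (FIFO). Returns false if the deque is empty.
    bool steal_top(Task* out) {
        pthread_mutex_lock(lock_);
        bool ok = !tasks_.empty();
        if (ok) {
            *out = tasks_.front();
            tasks_.pop_front();
        }
        pthread_mutex_unlock(lock_);
        return ok;
    }

private:
    pthread_mutex_t* lock_;    // not owned; shared or per-queue, caller's choice
    std::deque<Task> tasks_;
};
```

With one global pthread_mutex_t shared by all workers, every push, pop, and steal in the program serializes. Giving each worker's deque its own mutex lets operations on different deques proceed in parallel, though a truly fine-grained design may go further than this sketch does.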

The task queue you design will be used inside the TaskMan runtime library. TaskMan is a Cilk-like programming environment that supports task-based parallelism in C++. Unlike Cilk, TaskMan utilizes futures to eliminate the need to call sync() after spawning a task. Instead, the TaskMan library automatically ensures that the result of a task will be ready the first time it is read. More information on TaskMan can be found in this report. The code you will be using is in a downloadable tarball here.

Your specific assignment is to create three new subclasses of the AbstractQueue class inside the task library. When your implementations are correct, the programs in the test directory should run to completion without error.

Problem 1: Compiler Setup

Download and install the Intel C++ STM Compiler. (You probably want to store the tarball in /scratch or /tmp and install to AFS from there.) At the first prompt, you should choose "Install as current user to limit access to user level", rather than the default, and specify somewhere in your CS account to install the compiler. To save storage space, you may want to choose a custom installation. You will not need the Math Kernel Library, TBB, or the IPP library.

When you are prompted for a license file, specify the following location:
/s/intel_cc-11.0/common/licenses/NCOM_L_CMP_CPP_NRGF-WJJ5F3HH.lic
If anyone has trouble getting the license to work, please contact the instructor.

Next, modify the Makefile of the task library to use the Intel C++ STM compiler, which should be located at:
<install_dir>/intel/Compiler/11.0/606/bin/intel64/icpc
Note that to use TM, you will need to specify the -Qtm_enabled flag for both the compile and link steps.

To execute the resulting program, you will also need to set the LD_LIBRARY_PATH environment variable to point to:
<install_dir>/intel/Compiler/11.0/606/lib/intel64

NOTE: Students have reported issues installing the compiler on the ale machines, and possibly other machines. Students have reported that installations performed on the clover machines have worked correctly. Please contact the TA if you have an issue with an installation performed on a clover node.

Problem 2: Three Queue Implementations

The goal of Problem 2 is to make three versions of a work-stealing task queue: one using coarse-grained locks, one using fine-grained locks, and one that uses transactions instead of locks to provide atomic operations.

As described in the Intel documentation, use atomic blocks to synchronize the implementations of Lookup(), Set(), and Remove(). This is done using the __tm_atomic construct, as well as annotating any method or function called within a transaction with __attribute__((tm_callable)). You may find other useful constructs in the documentation. Feel free to experiment.
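As a rough illustration of the two constructs named above, the following sketch wraps each operation of a small fixed-capacity deque in an atomic block and annotates a helper that is called from inside a transaction. Several things here are assumptions, not part of the assignment: the class name TxDeque, the Task typedef, the method names, and the USE_INTEL_STM macro, which you would define yourself when building with the Intel STM compiler. Without that macro the TM_ATOMIC and TM_CALLABLE macros expand to nothing, so the code compiles with an ordinary compiler but provides no atomicity at all.

```cpp
#include <cstddef>

// Assumption: USE_INTEL_STM is a hypothetical macro, defined by you
// (e.g. -DUSE_INTEL_STM) when compiling with the Intel STM compiler.
#ifdef USE_INTEL_STM
  #define TM_CALLABLE __attribute__((tm_callable))
  #define TM_ATOMIC   __tm_atomic
#else
  // Fallback so the sketch compiles elsewhere; blocks are NOT atomic here.
  #define TM_CALLABLE
  #define TM_ATOMIC
#endif

typedef int Task;  // hypothetical task type

// Fixed-capacity deque whose operations are plain loads and stores,
// so each transaction body stays small and calls no library code.
class TxDeque {
public:
    TxDeque() : top_(0), bottom_(0) {}

    // Owner adds work at the bottom. Returns false if the deque is full.
    bool push_bottom(Task t) {
        bool ok = false;
        TM_ATOMIC {
            if (bottom_ - top_ < kCapacity) {
                slots_[bottom_ % kCapacity] = t;
                ++bottom_;
                ok = true;
            }
        }
        return ok;
    }

    // Owner takes the newest task. Returns false if the deque is empty.
    bool pop_bottom(Task* out) {
        bool ok = false;
        TM_ATOMIC {
            if (!empty()) {
                --bottom_;
                *out = slots_[bottom_ % kCapacity];
                ok = true;
            }
        }
        return ok;
    }

    // A thief takes the oldest task. Returns false if the deque is empty.
    bool steal_top(Task* out) {
        bool ok = false;
        TM_ATOMIC {
            if (!empty()) {
                *out = slots_[top_ % kCapacity];
                ++top_;
                ok = true;
            }
        }
        return ok;
    }

private:
    // Called from inside transactions, hence the annotation.
    TM_CALLABLE bool empty() const { return top_ == bottom_; }

    static const std::size_t kCapacity = 1024;
    std::size_t top_;      // index of the oldest task
    std::size_t bottom_;   // index one past the newest task
    Task slots_[kCapacity];
};
```

Each method records its result in a local variable and returns only after the atomic block has ended, which keeps control flow out of the block simple. Check the Intel documentation for the exact rules on what may appear inside an atomic block before relying on this pattern.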

You can get statistics on your transactions by setting the environment variable ITM_STATISTICS to "simple" or "verbose" (remove quotes). The statistics will be written out to a file called itm.log.

Problem 3: Description of Synchronization Strategies

Describe where and how you used transactions to synchronize your code. Describe where you felt transactions weren't appropriate.

Problem 4: Evaluation

Evaluate the performance of your three implementations using the test programs in the distributed tarball. You should use the following input sizes:
Program   Input
fib       50 (numthreads)
heat      -benchmark long -nproc (numthreads)
matmul    -n 1000 -nproc (numthreads)
plu       -n 4096 -nproc (numthreads)

You should run evaluation experiments for each queue implementation.
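If you choose to time a region of code from inside a program rather than with an external timer, a small helper along these lines may be convenient. This is only a sketch: best_seconds and speedup are hypothetical names, it requires a C++11 compiler, and reporting the best of several repetitions as well as speedup relative to one thread are assumptions about methodology, not requirements of the assignment.

```cpp
#include <chrono>

// Run `work` `reps` times and return the best wall-clock time in seconds.
// Returns -1.0 if reps is zero or negative.
template <typename F>
double best_seconds(F work, int reps) {
    double best = -1.0;
    for (int i = 0; i < reps; ++i) {
        std::chrono::steady_clock::time_point start =
            std::chrono::steady_clock::now();
        work();
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (best < 0.0 || elapsed.count() < best) {
            best = elapsed.count();
        }
    }
    return best;
}

// Speedup of an n-thread run relative to the one-thread run of the same queue.
inline double speedup(double t_one_thread, double t_n_threads) {
    return t_one_thread / t_n_threads;
}
```

For each queue implementation you could then record the time at each thread count and report speedup(t1, tn) alongside the raw times.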

Also, for the transactional queue, set the ITM_STATISTICS variable to verbose. Include a printout of the "GRAND TOTAL" section from each run with your report.

Problem 5: Questions (Submission Credit)

  1. Is transactional memory the best thing since sliced bread?
  2. What did you observe in the transactional statistics? What differences did you see between the fine-grained and the more coarse-grained parallelism of the transactional tests?
  3. Describe any restructuring you did to the program to improve performance.
  4. Which implementation -- lock-based or transaction-based -- performed best?
  5. Which was easier overall -- writing the transactional version, or writing the fine-grained locking version? Include estimates of coding and debugging time.

Tips and Tricks

Start early.

What to Hand In

Please turn this homework in on paper at the beginning of lecture. Please STAPLE the pages together (paper clips are better than nothing, but staples are preferred).

Your code from the queue implementations.

Your results from Problem 4.

Answers to questions in Problem 5.

Important: Include your name on EVERY page.

 