UW-Madison
Computer Sciences Dept.

CS 758 Advanced Topics in Computer Architecture

Programming Current and Future Multicore Processors

Fall 2009 Section 1
Instructor: David A. Wood. TA: Matthew D. Allen
URL: http://www.cs.wisc.edu/~david/courses/cs758/Fall2009/

Homework 6 // Due at Lecture Wednesday, October 28, 2009

You will perform this assignment on the x86-64 Nehalem-based systems you used for previous homeworks: ale-01.cs.wisc.edu and ale-02.cs.wisc.edu.

You should do this assignment alone. No late assignments.

Purpose

The purpose of this assignment is to give you some experience converting lock-based synchronization into transactions, using Intel's prototype C++ STM Compiler.

Programming Environment: POSIX Threads + Software Transactional Memory (STM)

Once again, threads in this homework are of the POSIX flavor.

As in HW2, thread creation, orchestration, and destruction have been done for you, and you will re-use most of your code from HW2.

You will be modifying your concurrent tree code to use transactions to perform the synchronization. You will be using Intel's prototype C++ STM compiler. The programming interface for Intel's STM is described in this document.

Programming Task: Concurrent Binary Tree, Reloaded

This homework re-uses your lock-based implementation of a concurrent binary tree from HW2. You will modify this code to use STM instead of locks for synchronization in many cases.

The previous code had several bugs which have been corrected. An updated version of the code is available here.

Problem 1: Compiler Setup

Download and install the Intel C++ STM Compiler. (You probably want to store the tarball in /scratch or /tmp and install to AFS from there.) At the first prompt, you should choose "Install as current user to limit access to user level", rather than the default, and specify somewhere in your CS account to install the compiler. To save storage space, you may want to choose a custom installation. You will not need the Math Kernel Library, TBB, or the IPP library.

When you are prompted for a license file, specify the following location:
/s/intel_cc-11.0/common/licenses/NCOM_L_CMP_CPP_NRGF-WJJ5F3HH.lic
If anyone has trouble getting the license to work, please contact the TA.

Next, modify the Makefile of the ctree benchmark to use the Intel C++ STM compiler, which should be located at:
<install_dir>/intel/Compiler/11.0/606/bin/intel64/icpc
Note that to use TM, you will need to specify the -Qtm_enabled flag for both the compile and link steps.

To execute the resulting program, you will also need to set the LD_LIBRARY_PATH environment variable to point to:
<install_dir>/intel/Compiler/11.0/606/lib/intel64

NOTE: Students have reported issues installing the compiler on the ale machines, and possibly other machines. Students have reported that installations performed on the clover machines have worked correctly. Please contact the TA if you have an issue with an installation performed on a clover node.

Problem 2: Transactionalize your Concurrent Tree Operations

The goal of problem 2 is to make a version of the concurrent tree microbenchmark that uses transactions instead of locks to provide atomic tree operations.

As described in the Intel documentation, use atomic blocks to synchronize the implementations of Lookup(), Set(), and Remove(). To do this, wrap the synchronized code in the __tm_atomic construct and annotate any method or function called within a transaction with __attribute__((tm_callable)). You may find other useful constructs in the documentation. Feel free to experiment.
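As a rough illustration of the pattern described above, here is a minimal sketch on a toy tree. It is not the ctree code: ToyTree, Node, FindSlot, and the Lookup/Set signatures are hypothetical, and USE_INTEL_STM is an assumed macro you would define yourself when building with icpc and -Qtm_enabled. Without that macro, a global recursive lock stands in for the atomic block so the sketch builds with an ordinary C++11 compiler. Check the Intel documentation for the exact placement rules of the annotations.

```cpp
#include <cassert>
#include <cstddef>

#ifdef USE_INTEL_STM  // hypothetical macro: define it when building with icpc -Qtm_enabled
  #define TM_CALLABLE __attribute__((tm_callable))
  #define TM_ATOMIC   __tm_atomic
#else
  // Fallback so the sketch builds elsewhere: one global recursive lock
  // plays the role of the atomic block (no rollback, no real STM).
  #include <mutex>
  static std::recursive_mutex g_fallback;
  struct FallbackGuard {
    std::lock_guard<std::recursive_mutex> hold;
    bool armed;
    FallbackGuard() : hold(g_fallback), armed(true) {}
  };
  #define TM_CALLABLE
  #define TM_ATOMIC for (FallbackGuard fg; fg.armed; fg.armed = false)
#endif

// Hypothetical stand-in for the homework's tree node.
struct Node {
  int key, value;
  Node* left;
  Node* right;
  Node(int k, int v) : key(k), value(v), left(NULL), right(NULL) {}
};

struct ToyTree {
  Node* root;
  ToyTree() : root(NULL) {}
};

// Helper called from inside transactions, so it carries the callable annotation.
// Returns the link that holds (or would hold) `key`.
TM_CALLABLE
static Node** FindSlot(ToyTree* t, int key) {
  Node** slot = &t->root;
  while (*slot && (*slot)->key != key)
    slot = key < (*slot)->key ? &(*slot)->left : &(*slot)->right;
  return slot;
}

bool Lookup(ToyTree* t, int key, int* out) {
  assert(t != NULL);
  bool found = false;
  TM_ATOMIC {
    Node* n = *FindSlot(t, key);
    if (n) { *out = n->value; found = true; }
  }
  return found;
}

void Set(ToyTree* t, int key, int value) {
  assert(t != NULL);
  Node* fresh = new Node(key, value);  // allocate outside the atomic block
  bool used = false;
  TM_ATOMIC {
    Node** slot = FindSlot(t, key);
    if (*slot) {
      (*slot)->value = value;          // key exists: overwrite in place
    } else {
      *slot = fresh;                   // key missing: link the new node
      used = true;
    }
  }
  if (!used) delete fresh;
}
```

The whole search-and-update sits inside one atomic block, so no per-node locks appear anywhere. Allocating the node before entering the block is a design choice made here to keep the transaction body free of library calls; it is not something the assignment requires.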

You can get statistics on your transactions by setting the environment variable ITM_STATISTICS to "simple" or "verbose" (without the quotes). The statistics will be written to a file called itm.log.

For this step, you should disable the transactional and throughput tests by commenting out the defines at the top of main.C. When your implementation passes the parallel tests, proceed to the next problem.

Problem 3: Use STM for Concurrent Tree Transactions

The next step is to synchronize the transactional calls to the concurrent tree. You will do this by modifying the Transactions.C file to use atomic blocks and your implementations of Lookup, Set, and Remove (not the TransactionalLookup, TransactionalSet, and TransactionalRemove). Using atomic blocks will greatly simplify this code. For example, you can remove the logging and undo functionality, since this is provided by the STM.
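To make the idea of replacing hand-written logging and undo with an enclosing atomic block concrete, here is a minimal sketch. ToyMap, MoveValue, and the Lookup/Set/Remove signatures are hypothetical stand-ins, not the code in Transactions.C, and USE_INTEL_STM is an assumed macro you would define yourself when building with icpc and -Qtm_enabled. The sketch also assumes that atomic blocks may nest, which should be confirmed against the Intel documentation. Without the macro, a global recursive lock stands in for the atomic block.

```cpp
#include <cassert>

#ifdef USE_INTEL_STM  // hypothetical macro: define it when building with icpc -Qtm_enabled
  #define TM_CALLABLE __attribute__((tm_callable))
  #define TM_ATOMIC   __tm_atomic
#else
  // Fallback: a global recursive lock, so nested blocks do not self-deadlock.
  #include <mutex>
  static std::recursive_mutex g_fallback;
  struct FallbackGuard {
    std::lock_guard<std::recursive_mutex> hold;
    bool armed;
    FallbackGuard() : hold(g_fallback), armed(true) {}
  };
  #define TM_CALLABLE
  #define TM_ATOMIC for (FallbackGuard fg; fg.armed; fg.armed = false)
#endif

// Hypothetical stand-in for the concurrent tree: a tiny fixed-size map.
// Keys must lie in [0, kSlots).
const int kSlots = 8;
struct ToyMap {
  int  value[kSlots];
  bool present[kSlots];
  ToyMap() {
    for (int i = 0; i < kSlots; ++i) { value[i] = 0; present[i] = false; }
  }
};

// Single operations: each is atomic on its own and callable from a transaction.
TM_CALLABLE bool Lookup(ToyMap* m, int key, int* out) {
  bool found = false;
  TM_ATOMIC {
    if (m->present[key]) { *out = m->value[key]; found = true; }
  }
  return found;
}

TM_CALLABLE void Set(ToyMap* m, int key, int value) {
  TM_ATOMIC { m->value[key] = value; m->present[key] = true; }
}

TM_CALLABLE void Remove(ToyMap* m, int key) {
  TM_ATOMIC { m->present[key] = false; }
}

// One multi-operation transaction: move the value stored under `from` to `to`.
// No log of old values is kept; the enclosing atomic block makes the three
// calls appear as a single step to other threads.
bool MoveValue(ToyMap* m, int from, int to) {
  assert(from >= 0 && from < kSlots && to >= 0 && to < kSlots);
  bool moved = false;
  TM_ATOMIC {
    int v = 0;
    if (Lookup(m, from, &v)) {
      Set(m, to, v);
      Remove(m, from);
      moved = true;
    }
  }
  return moved;
}
```

Note that the lock-based fallback only provides mutual exclusion. It does not roll anything back, so it illustrates the structure of the code rather than the STM's abort-and-retry behavior.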

You should leave the calls to usleep exactly as they are for the throughput tests. The compiler complains about them, but you can indicate that they do not affect the transaction by using the __tm_waiver annotation (this was your TA's best-guess approach to this problem, but if you think of something better, let me know).
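One possible shape for the usleep workaround is sketched below. It assumes that __tm_waiver can be written as a block around the call; the Intel documentation is the authority on where the annotation may appear. The names g_counter and IncrementSlowly are hypothetical, and USE_INTEL_STM is an assumed macro you would define yourself when building with icpc and -Qtm_enabled. Without it, a global recursive lock stands in for the atomic block and the waiver becomes a plain block.

```cpp
#include <cassert>
#include <unistd.h>  // usleep (POSIX)

#ifdef USE_INTEL_STM  // hypothetical macro: define it when building with icpc -Qtm_enabled
  #define TM_ATOMIC __tm_atomic
  #define TM_WAIVER __tm_waiver   // assumption: block form, see lead-in
#else
  // Fallback so the sketch builds elsewhere.
  #include <mutex>
  static std::recursive_mutex g_fallback;
  struct FallbackGuard {
    std::lock_guard<std::recursive_mutex> hold;
    bool armed;
    FallbackGuard() : hold(g_fallback), armed(true) {}
  };
  #define TM_ATOMIC for (FallbackGuard fg; fg.armed; fg.armed = false)
  #define TM_WAIVER
#endif

static int g_counter = 0;  // hypothetical shared state

// Read, pause, then write, all inside one atomic block. The pause touches no
// shared data, so it is marked as not affecting the transaction.
int IncrementSlowly(unsigned delay_us) {
  assert(delay_us < 1000000);  // usleep expects a value below one second
  int seen = 0;
  TM_ATOMIC {
    seen = g_counter;
    TM_WAIVER { usleep(delay_us); }
    g_counter = seen + 1;
  }
  return seen + 1;
}
```

Because the waived code is not tracked by the STM, it runs again each time the transaction retries, so a long sleep inside a frequently aborted transaction multiplies the delay.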

Problem 4: Description of Synchronization Strategies

Describe where you used transactions to synchronize your code. Describe where you felt transactions weren't appropriate.

Problem 5: Evaluation

Evaluate the throughput of both the transactional and torture tests on your implementation, and compare them with the single-lock implementation and, if you got it working, your program from HW2.

Also, set the ITM_STATISTICS variable to verbose and collect the transaction statistics for the parallel test, and for the transactional tests separately. Include a print-out of the "GRAND TOTAL" section of each with your report.

Problem 6: Questions (Submission Credit)

  1. Is transactional memory the best thing since sliced bread?
  2. Describe how you used atomic blocks and the purpose of each.
  3. What did you observe in the transactional statistics? What sort of differences did you see between the fine-grained parallelism of the parallel tree operations (parallel tests), and the more coarse-grained parallelism of the transactional tests?
  4. Describe any restructuring you did to the program to improve performance.
  5. Which implementation -- lock-based or transaction-based -- performed best?
  6. Which was easier overall -- writing the transactional version, or writing the fine-grained locking version? Include estimates of coding and debugging time.

Tips and Tricks

Start early.

What to Hand In

Please turn this homework in on paper at the beginning of lecture. Please STAPLE the pages together (paper clips are better than nothing, but staples are preferred).

Your code from CTree.[hC] and Transactions.C, as well as any other modifications you make.

Your results from Problem 5.

Answers to questions in Problem 6.

Important: Include your name on EVERY page.

 