Homework 3 // Due at Lecture Tuesday, October 2, 2007
You will perform this assignment on the x86-64 Clovertown-based systems you used in Homework 2:
clover-01.cs.wisc.edu and clover-02.cs.wisc.edu.
You should do this assignment alone. No late assignments.
Purpose
The purpose of this assignment is to further explore the features of the Intel(R) Threading Building Blocks (TBB) multithreading package,
including its task management, synchronization, and (optionally) concurrent data structures.
Programming Environment: TBB
Intel's Threading Building Blocks (TBB) package
provides a host of useful services to the parallel programmer, including some of the same
loop parallelization options provided by OpenMP (with different syntax, of course). Intel provides
a handy Getting Started Guide that is available at the link above under the Documentation
tab, which will show you everything you need to know about TBB for the purposes of this assignment. You will find the
Tutorial document very useful for this assignment.
A tutorial on TBB's loop parallelization is available
here.
It will guide you through TBB setup and a brief illustrative example.
Programming Task: Othello AI
Othello (a.k.a. Reversi) is a two-player strategy board game played on an
8x8 board. A move consists of placing a token on the board to flip your opponent's tokens,
subject to placement rules. You are not required to learn the rules of Othello, but it is probably a good idea nonetheless,
since you will be parallelizing an Othello AI.
In this assignment, you are given a serial implementation of a recursive lookahead-based Othello AI (you can download the code
here). Your task is to parallelize and improve the code
in a variety of ways. You will do so using the features of Intel's Threading Building Blocks package.
The serial version you are given starts with the current Othello board position, enumerates all possible moves, and then
recursively evaluates all potential move combinations, up to the desired lookahead, aka the depth of the search. The serial
version is not very intelligent in its approach to Othello AI. A complete evaluation of the search space is not necessary, and the
fitness criterion used by the implementation is very basic. You can improve on these shortcomings if it suits your interests. In
general, you may modify any part of the provided code.
The AI plays the game against itself. It uses a greedy algorithm to select the white player's moves, and uses
its depth-search capability only to discover the black player's moves. This is done to limit overall runtime, but you may
easily change this if you prefer to use the depth-search algorithm for both players.
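The recursive lookahead described above can be illustrated with a minimal sketch. This is not the provided Othello code: ToyPosition, enumerate_moves, fitness, and lookahead are hypothetical names, and the game is a toy "take 1 to 3 tokens, last taker wins" pile game chosen only because it is small enough to check by hand. The structure (enumerate the moves, recurse to a fixed depth, score the leaves) is the part that carries over.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an Othello board: a single pile of tokens.
struct ToyPosition { int tokens; };

// Enumerate every legal move: take 1, 2, or 3 tokens from the pile.
std::vector<ToyPosition> enumerate_moves(const ToyPosition& p) {
    std::vector<ToyPosition> next;
    for (int take = 1; take <= 3 && take <= p.tokens; ++take)
        next.push_back(ToyPosition{p.tokens - take});
    return next;
}

// Very basic fitness, from the point of view of the player to move:
// an empty pile means the opponent took the last token, so we lost.
int fitness(const ToyPosition& p) { return p.tokens == 0 ? -1 : 0; }

// Serial depth-limited search (negamax form): the value of a position
// is the best of the negated values of its children.
int lookahead(const ToyPosition& p, int depth) {
    assert(depth >= 0);
    std::vector<ToyPosition> moves = enumerate_moves(p);
    if (depth == 0 || moves.empty()) return fitness(p);
    int best = -2;  // lower than any reachable score
    for (const ToyPosition& child : moves)
        best = std::max(best, -lookahead(child, depth - 1));
    return best;
}
```

In this sketch, lookahead(ToyPosition{4}, 4) evaluates to -1 (every move from a pile of four hands the opponent a win), while lookahead(ToyPosition{3}, 3) evaluates to 1. The loop over child positions is the kind of loop Problem 1 asks about, and the recursive call is what Problem 2 turns into tasks.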
Problem 1: Parallel Othello AI using parallel_while
The provided code has plenty of candidate loops that could be parallelized. To fulfill the requirements of this problem, you
must parallelize one or more of these loops using TBB's parallel_while construct, and any other
elements of the TBB package that you may desire (parallel_while is the only requirement).
Your choice of which loop(s) to parallelize will determine the difficulty of this problem, as well as your potential for speedup.
You are encouraged to experiment with many different options -- once you have learned the parallel_while
syntax, it is relatively easy to try several different loops. You will be expected to explain your choice of loop(s) in Problem 4.
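As a rough illustration of the pattern behind parallel_while, the sketch below separates a loop into a stream object that hands out work items and a body object that processes one item. It is written with standard C++ threads rather than TBB, so run_while is a hypothetical stand-in and not the TBB API. The method name pop_if_present and the argument_type typedef follow the shape the TBB documentation describes for stream and body classes, but check the Tutorial for the exact requirements. MoveItem, MoveStream, ScoreBody, and score_all are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical work item: the index of one candidate move.
struct MoveItem { int id; };

// Stream: hands out items one at a time; safe to call from many threads.
class MoveStream {
public:
    explicit MoveStream(std::vector<MoveItem> items) : items_(std::move(items)) {}
    bool pop_if_present(MoveItem& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (next_ == items_.size()) return false;  // stream exhausted
        out = items_[next_++];
        return true;
    }
private:
    std::vector<MoveItem> items_;
    std::size_t next_ = 0;
    std::mutex mutex_;
};

// Body: processes one item. Each item writes only its own slot,
// so no extra locking is needed here.
class ScoreBody {
public:
    typedef MoveItem argument_type;
    explicit ScoreBody(std::vector<int>& scores) : scores_(scores) {}
    void operator()(MoveItem item) const { scores_[item.id] = item.id * item.id; }
private:
    std::vector<int>& scores_;
};

// Stand-in for the parallel loop: each worker pulls from the stream
// until it is empty. (Not TBB; plain std::thread for illustration.)
template <typename Stream, typename Body>
void run_while(Stream& stream, const Body& body, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
        workers.emplace_back([&stream, &body] {
            typename Body::argument_type item;
            while (stream.pop_if_present(item)) body(item);
        });
    for (std::thread& w : workers) w.join();
}

// Usage: score `count` dummy moves with `num_threads` workers.
std::vector<int> score_all(int count, unsigned num_threads) {
    std::vector<MoveItem> items;
    for (int i = 0; i < count; ++i) items.push_back(MoveItem{i});
    std::vector<int> scores(count, -1);
    MoveStream stream(items);
    run_while(stream, ScoreBody(scores), num_threads);
    return scores;
}
```

The point of the split is that the stream is the only serialized part, so the loops most worth trying are those whose body does far more work than fetching the next item.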
Problem 2: Parallel Othello AI using Task Recursion
Everything in TBB is a task -- up to now, we have used TBB's loop parallelization capability to implicitly form tasks to
run in parallel. Now, instead of parallelizing a loop, we will parallelize the recursive operation of the original algorithm.
Starting again with the serial code, parallelize the recursive search algorithm using TBB's spawnable tasks (similar to
the Fibonacci example of section 10.2 of Intel's Tutorial document). To fulfill the requirements of this problem, your
revised code must include at least one C++ class that inherits from class task
in the TBB package (and, naturally, this task must be called recursively, in parallel).
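The idea of recursive task parallelism can be sketched with the same Fibonacci computation the Tutorial uses, here written with std::async from the C++ standard library instead of TBB's task class. This is only an illustration of the spawn-then-wait structure under that substitution; it does not satisfy the requirement above, which calls for a class derived from TBB's task. The names serial_fib, parallel_fib, and the cutoff parameter are assumptions made for this example.

```cpp
#include <cassert>
#include <future>

// Plain serial recursion, used below the cutoff where spawning
// more parallel work would cost more than it saves.
long serial_fib(int n) {
    return n < 2 ? n : serial_fib(n - 1) + serial_fib(n - 2);
}

// Recursive parallel version: one child runs asynchronously,
// the other runs in the current thread, then the results are combined.
long parallel_fib(int n, int cutoff) {
    assert(n >= 0);
    if (n < 2 || n < cutoff) return serial_fib(n);
    std::future<long> left =
        std::async(std::launch::async, parallel_fib, n - 1, cutoff);
    long right = parallel_fib(n - 2, cutoff);
    return left.get() + right;  // wait for the spawned child
}
```

For example, parallel_fib(20, 16) returns 6765 while creating only a handful of threads. The cutoff plays the role that a serial fallback plays in a task-based search: near the leaves of the tree, it is usually cheaper to recurse serially than to create another unit of parallel work.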
Optional: Explore the multitude of tweaking options available in TBB -- continuations, scheduler bypass, and task recycling,
for instance. You might also want to try some other TBB features along the way (concurrent data structures, locks of various flavors,
atomic operations, scalable allocators, etc.) -- this will be the last TBB assignment.
Problem 3: Evaluation
Evaluate your code on the Clovertown platform. Provide a table showing total execution time, in seconds, for the original
algorithm, your parallel_while parallelization from Problem 1, and your recursive task
parallelization from Problem 2 as rows of the table. Include columns for lookahead = [3,4,5,6,7]. An example follows (the numbers
are 100% fictitious). Note on the table the number of threads used to attain the best performance (or whether you used the
TBB default value). This value need not be constant across all implementations (e.g. N=6 for Problem 1 and N=8 for Problem 2
is acceptable).
Implementation | Lookahead 3 | Lookahead 4 | Lookahead 5 | Lookahead 6 | Lookahead 7 |
Serial         | 1s          | 2s          | 5s          | 60s         | 10000s      |
PWhile         | 1s          | 1s          | 2s          | 10s         | 100s        |
PTask          | 1s          | 1s          | 2s          | 10s         | 100s        |
Problem 4: Questions (Submission Credit)
- Which loop(s) did you select for parallelization in Problem 1? Why did you choose that/those loop(s)? Why were other loops unsuitable?
  Use specific examples.
- Which implementation (parallel_while or task-based recursion) performs better for lookaheads of 6
  and 7? Which implementation did you prefer?
- How many total board positions are examined in your implementations for Lookahead = 6? Lookahead = 7? What is the
  rate at which boards are examined in each of the above implementations? You may calculate or measure
  the value.
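One way to measure the board count and examination rate is to increment a shared counter at every position visited and divide by the elapsed wall-clock time. The sketch below shows this on a toy search with a fixed branching factor. toy_search, boards_examined, SearchStats, and measure are hypothetical names, and a real Othello search has a branching factor that varies from position to position.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Shared counter; atomic so parallel searches could also increment it safely.
std::atomic<long> boards_examined{0};

// Toy search: every position has exactly `branching` children.
void toy_search(int branching, int depth) {
    boards_examined.fetch_add(1, std::memory_order_relaxed);
    if (depth == 0) return;
    for (int i = 0; i < branching; ++i) toy_search(branching, depth - 1);
}

struct SearchStats {
    long boards;               // total positions visited
    double seconds;            // elapsed wall-clock time
    double boards_per_second;  // 0 if the run was too fast to time
};

// Run the toy search once and report the count and rate.
SearchStats measure(int branching, int depth) {
    assert(branching > 0 && depth >= 0);
    boards_examined = 0;
    auto start = std::chrono::steady_clock::now();
    toy_search(branching, depth);
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    long boards = boards_examined.load();
    double rate = elapsed.count() > 0 ? boards / elapsed.count() : 0.0;
    return SearchStats{boards, elapsed.count(), rate};
}
```

With a fixed branching factor b and depth d, the count can also be calculated as 1 + b + b^2 + ... + b^d, so measure(3, 4) visits 121 positions. A single shared atomic counter can itself become a point of contention in a parallel run, so per-thread counters summed at the end are a reasonable alternative.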
Tips and Tricks
Start early.
Read Intel's TBB Tutorial (Select Documentation tab).
Don't forget to add -ltbb and other useful switches to the provided Makefile.
Don't forget to source TBB's environment variables!
What to Hand In
Please turn this homework in on paper at the beginning of lecture.
A printout of your parallel_while code from Problem 1, annotated to indicate which
loop from the original program you parallelized.
A printout of your task class and stream class from Problem 2, annotated to indicate your strategy for task
recursion.
If you improved the AI in any way, a brief description of the improvements (keep your code!).
The table from Problem 3.
Answers to questions in Problem 4.
Important: Include your name on EVERY page.