Homework 4 // Due at Lecture Monday, October 12, 2009
You will perform this assignment on the x86-64 Nehalem-based systems you used in Homework 3:
ale-01.cs.wisc.edu and ale-02.cs.wisc.edu.
You should do this assignment alone. No late assignments.
Purpose
The purpose of this assignment is to further explore the features of
Intel(R) Threading Building Blocks (TBB), a multithreading package,
including its task management, synchronization, and (optionally)
concurrent data structures.
Programming Environment: TBB
Intel's Threading Building Blocks (TBB) package provides a host of useful
services to the parallel programmer, including some of the same loop
parallelization options provided by OpenMP (with different syntax, of
course). Intel provides a handy Getting Started Guide, available at the
link above under the Documentation tab, which will show you everything
you need to know about TBB for the purposes of this assignment. You will
also find the Tutorial document very useful.
Programming Task: Othello AI
Othello (aka Reversi) is a two-player strategy board game played on an
8x8 board. A move consists of placing a token on the board to flip your opponent's tokens,
subject to placement rules. You are not required to learn the rules of Othello, but it is probably a good idea nonetheless,
since you will be parallelizing an Othello AI.
In this assignment, you are given a serial implementation of a
recursive lookahead-based Othello AI (you can download the code here). Your
task is to parallelize and improve the code in a variety of ways, using
the features of Intel's Threading Building Blocks package.
The serial version you are given starts with the current
Othello board position, enumerates all possible moves, and then
recursively evaluates all potential move combinations, up to the
desired lookahead, aka the depth of the search. The serial
version is not very intelligent in its approach to Othello AI. A
complete evaluation of the search space is not necessary, and the
fitness criterion used by the implementation is very basic. You can
improve on these shortcomings if it suits your interests. In general,
you may modify any part of the provided code.
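The recursive lookahead described above can be sketched on a toy game. This is only an illustration, not the provided code: Position, legal_moves, fitness, and lookahead are hypothetical names, the "board" is a single integer, and every position is assumed to have exactly three moves. It assumes a minimax-style search in which the two players alternate between maximizing and minimizing the fitness.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical stand-in for an Othello board: just one integer.
struct Position { int value; };

// Hypothetical move generator: every position has exactly three successors.
std::vector<Position> legal_moves(const Position& p) {
    return std::vector<Position>{Position{p.value + 1},
                                 Position{p.value * 2},
                                 Position{p.value - 3}};
}

// Hypothetical (very basic) fitness criterion.
int fitness(const Position& p) { return p.value; }

// Exhaustively evaluate every move combination down to `depth` plies.
int lookahead(const Position& p, int depth, bool maximizing) {
    assert(depth >= 0);
    if (depth == 0) return fitness(p);  // leaf: score the position
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Position& next : legal_moves(p)) {
        int score = lookahead(next, depth - 1, !maximizing);
        best = maximizing ? std::max(best, score) : std::min(best, score);
    }
    return best;
}
```

Calling lookahead(Position{5}, 2, true) searches two plies from the starting value 5. Every call visits all children, which is the "complete evaluation of the search space" that the text notes is not strictly necessary.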
The AI plays the game against itself. It uses a greedy algorithm to
select the white player's moves and uses its depth-search capability
only to discover the black player's moves. This is done to limit
overall runtime, but you may easily change this if you prefer to use
the depth-search algorithm for both players.
Problem 1: Parallel Othello AI using parallel_do
The provided code has plenty of candidate loops that could be
parallelized. To fulfill the requirements of this problem, you must
parallelize one or more of these loops using TBB's parallel_do construct, and any other elements
of the TBB package that you may desire (parallel_do is the only requirement).
Your choice of which loop(s) to parallelize will determine the
difficulty of this problem, as well as your potential for speedup.
You are encouraged to experiment with many different options -- once
you have learned the parallel_do
syntax, it is relatively easy to try several different loops. You will
be expected to explain your choice of loop(s) in Problem 4.
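Before reaching for parallel_do, it can help to reshape a candidate loop into a container of independent work items plus a body functor. The sketch below shows only that reshaping, using standard C++ so it runs without TBB. MoveTask, evaluate_move, EvaluateBody, and best_move are hypothetical names, and the scoring function is a placeholder rather than anything from the provided code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical work item: one candidate move plus a private slot for its
// score, so that no two items write to the same memory.
struct MoveTask {
    int move_id;
    int score;
};

// Placeholder for the per-move evaluation in the real code.
int evaluate_move(int move_id) { return (move_id * move_id) % 7; }

// Body functor: a const operator() that takes one work item by reference.
struct EvaluateBody {
    void operator()(MoveTask& t) const { t.score = evaluate_move(t.move_id); }
};

// Returns the id of the highest-scoring move (first one wins on ties).
int best_move(int num_moves) {
    assert(num_moves > 0);
    std::vector<MoveTask> tasks;
    for (int i = 0; i < num_moves; ++i) tasks.push_back(MoveTask{i, 0});

    // Serial stand-in. With TBB available, this is the call that would become
    // tbb::parallel_do(tasks.begin(), tasks.end(), EvaluateBody());
    std::for_each(tasks.begin(), tasks.end(), EvaluateBody());

    // The final reduction stays serial; it reads only the private slots.
    auto best = std::max_element(
        tasks.begin(), tasks.end(),
        [](const MoveTask& a, const MoveTask& b) { return a.score < b.score; });
    return best->move_id;
}
```

Giving each item its own result slot is one way to avoid locking inside the loop body. Whether it suits a particular loop in the provided code depends on what that loop shares between iterations.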
Problem 2: Parallel Othello AI using Task Recursion
Everything in TBB is a task -- up to now, we have used TBB's
loop parallelization capability to implicitly form tasks to run in
parallel. Now, instead of parallelizing a loop, we will parallelize
the recursive operation of the original algorithm.
Starting again with the serial code, parallelize the recursive search
algorithm using TBB's spawnable tasks (similar to the Fibonacci
example of section 10.2 of Intel's Tutorial document). To
fulfill the requirements of this problem, your revised code must
include at least one C++ class that
inherits from class task in the TBB
package (and, naturally, this task must be called recursively, in
parallel).
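The fork-join shape of a recursive task can be sketched without TBB. The example below is a standard-library stand-in, not TBB code: each std::async call plays the role that a spawned child task would play, and the get() calls play the role of waiting for the children. The toy game (Position, legal_moves, fitness) and the cutoff parameter are hypothetical and chosen only to keep the sketch short.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <future>
#include <vector>

struct Position { int value; };  // hypothetical stand-in for a board

std::vector<Position> legal_moves(const Position& p) {
    return std::vector<Position>{Position{p.value + 1},
                                 Position{p.value * 2},
                                 Position{p.value - 3}};
}

int fitness(const Position& p) { return p.value; }

// Plain recursive search, used below the cutoff where forking is not worth it.
int serial_lookahead(const Position& p, int depth, bool maximizing) {
    if (depth == 0) return fitness(p);
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Position& next : legal_moves(p)) {
        int s = serial_lookahead(next, depth - 1, !maximizing);
        best = maximizing ? std::max(best, s) : std::min(best, s);
    }
    return best;
}

// Fork one child per move, then join and combine the children's results.
int parallel_lookahead(const Position& p, int depth, bool maximizing, int cutoff) {
    assert(depth >= 0 && cutoff >= 0);
    if (depth <= cutoff) return serial_lookahead(p, depth, maximizing);

    std::vector<std::future<int>> children;
    for (const Position& next : legal_moves(p)) {
        // Fork: the analogue of spawning a child task.
        children.push_back(std::async(std::launch::async, parallel_lookahead,
                                      next, depth - 1, !maximizing, cutoff));
    }
    int best = maximizing ? INT_MIN : INT_MAX;
    for (std::future<int>& child : children) {
        int s = child.get();  // Join: the analogue of waiting for children.
        best = maximizing ? std::max(best, s) : std::min(best, s);
    }
    return best;
}
```

The cutoff matters here because std::async with std::launch::async starts a thread per call. A task scheduler such as TBB's is designed to make fine-grained spawning much cheaper, but a serial cutoff near the leaves is still a common choice.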
Optional: Explore the multitude of tweaking options available
in TBB -- continuations, scheduler bypass, and task recycling, for
instance. You might also want to try some other TBB features along the
way (concurrent data structures, locks of various flavors, atomic
operations, scalable allocators, etc.) -- this will be the last TBB
assignment.
Problem 3: Evaluation
Evaluate your code on the Nehalem (ale) platform. Provide a
table showing total execution time, in seconds, for the original
algorithm, your parallel_do
parallelization from Problem 1, and your recursive task
parallelization from Problem 2 as rows of the table. Include columns
for lookahead = [3,4,5,6,7]. Note alongside the table the number of
threads used to attain the best performance (or state that you used the
TBB default value). This value need not be constant across all
implementations (e.g. N=6 for Problem 1 and N=8 for Problem 2 is
acceptable). For example (the numbers below are 100% fictitious):
Implementation | Lookahead 3 | Lookahead 4 | Lookahead 5 | Lookahead 6 | Lookahead 7 |
Serial | 1s | 2s | 5s | 60s | 10000s |
PDo | 1s | 1s | 2s | 10s | 100s |
PTask | 1s | 1s | 2s | 10s | 100s |
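One way to collect the execution times for such a table is a small wall-clock helper. The sketch below uses std::chrono from the C++ standard library; seconds_to_run is a hypothetical name, and the callable passed to it is assumed to run one complete game at a given lookahead.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename Fn>
double seconds_to_run(Fn&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

steady_clock is used because it is monotonic, so the measurement is not disturbed by system clock adjustments. It may be worth repeating each measurement a few times, since timings on a shared machine can be noisy.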
Problem 4: Questions (Submission Credit)
- Which loop(s) did you select for parallelization in Problem 1? Why did you choose that/those loop(s)? Why were other loops unsuitable?
Use specific examples.
- Which implementation (parallel_do or Task-based Recursion) performs better for lookaheads 6
and 7? Which implementation did you prefer?
- How many total board positions are examined in your implementations for Lookahead = 6? Lookahead = 7? What is the
rate at which boards are examined in each of the above implementations? You may calculate or measure
the value.
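If you choose to measure rather than calculate the number of boards examined, one possible approach is a shared counter that is safe to increment from several threads. The sketch below uses std::atomic from the C++ standard library. boards_examined, visit, and boards_per_second are hypothetical names, and visit walks a toy tree with a fixed branching factor instead of real Othello positions.

```cpp
#include <atomic>
#include <cassert>

// Shared counter; atomic so that concurrent increments are not lost.
std::atomic<unsigned long long> boards_examined{0};

// Toy search: counts this position, then recurses into `branching` children.
void visit(int depth, int branching) {
    assert(depth >= 0 && branching > 0);
    // Relaxed ordering is enough for a pure counter read after the search ends.
    boards_examined.fetch_add(1, std::memory_order_relaxed);
    if (depth == 0) return;
    for (int i = 0; i < branching; ++i) visit(depth - 1, branching);
}

// Rate at which boards were examined, given a measured elapsed time.
double boards_per_second(unsigned long long boards, double seconds) {
    return seconds > 0.0 ? boards / seconds : 0.0;
}
```

For a fixed branching factor b and depth d, the count can also be calculated as 1 + b + b^2 + ... + b^d. Real Othello positions have a varying number of legal moves, so a measured count is likely to differ from that estimate. A single hot atomic counter can also slow a parallel run, so per-thread counts summed at the end are a reasonable alternative.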
Tips and Tricks
Start early.
Read Intel's TBB Tutorial (Select Documentation tab).
Don't forget to add -ltbb and other useful switches to the provided Makefile.
Don't forget to source TBB's environment variables!
What to Hand In
Please turn this homework in on paper at the beginning of lecture.
A printout of your parallel_do code from Problem 1, annotated to indicate which
loop from the original program you parallelized.
A printout of your task class and stream class from Problem 2, annotated to indicate your strategy for task
recursion.
If you improved the AI in any way, a brief description of the improvements (keep your code!).
The table from Problem 3.
Answers to questions in Problem 4.
Important: Include your name on EVERY page.