UW-Madison
Computer Sciences Dept.

CS 758 Advanced Topics in Computer Architecture

Programming Current and Future Multicore Processors

Fall 2010 Section 1
Instructor David A. Wood and T. A. Derek Hower
URL: http://www.cs.wisc.edu/~david/courses/cs758/Fall2010/

Homework 6 // Due at Lecture Wednesday, November 3, 2010

You will perform this assignment on two architectures, a SPARC Niagara 2 (gamay.cs.wisc.edu) and an Intel Nehalem (ale-01.cs.wisc.edu).

You should do this assignment alone. No late assignments.

Purpose

The purpose of this assignment is to think about a non-threaded approach to concurrency, namely MapReduce, and how we can use it to parallelize a familiar application.

Programming Environment: MapReduce

MapReduce is an environment pioneered by Google to facilitate parallelism in searches. While not originally intended for shared-memory systems, implementations exist that run on commodity CMPs. A tutorial from a previous year can be found here. The Phoenix release you will be using also comes with documentation and examples.

In this assignment you will be using version 2.0 of the Phoenix MapReduce implementation. Visit the website for source code and documentation.

Programming Task: N-Bodies (Again!)

In this assignment you will again be writing two implementations of the N-body program. The first version will use an n^2 (pairwise) algorithm, and the second version will use an n log n algorithm. See assignment #3 for more details on N-Bodies.

Problem 1: Pairwise N-Body MapReduce

You are to implement an N^2 version of parallel N-Bodies. You may reuse any existing code from assignment #3. Using MapReduce will demand a new approach to finding the parallelism in N-Bodies. You are encouraged to think carefully before implementing your algorithm. You are free to adopt any strategy you wish for this assignment.

The MapReduce scheduler accepts quite a few tuning parameters -- be sure to experiment with these parameters to achieve optimal performance for your implementation.
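As an illustration only, here is one possible way to expose the parallelism in the pairwise computation, written in plain Python rather than against the Phoenix API. It assumes 2-D bodies stored as (mass, x, y) tuples, a gravitational constant of 1, and distinct positions (no softening term). The names map_pair, reduce_forces, and pairwise_forces are hypothetical. Each map task handles one pair of bodies and emits two records keyed by body index; each reduce task sums the partial forces for one body. Other decompositions are equally valid.

```python
from collections import defaultdict
from itertools import combinations

G = 1.0  # gravitational constant in arbitrary units (assumption)

def pair_force(bi, bj):
    """Force on body bi due to body bj; each body is (mass, x, y)."""
    mi, xi, yi = bi
    mj, xj, yj = bj
    dx, dy = xj - xi, yj - yi
    r2 = dx * dx + dy * dy          # assumes the two positions differ
    r = r2 ** 0.5
    f = G * mi * mj / r2
    return (f * dx / r, f * dy / r)

def map_pair(bodies, i, j):
    """Map task: one pair -> two (key, value) records (Newton's third law)."""
    fx, fy = pair_force(bodies[i], bodies[j])
    return [(i, (fx, fy)), (j, (-fx, -fy))]

def reduce_forces(values):
    """Reduce task: sum all partial forces emitted for one body."""
    return (sum(v[0] for v in values), sum(v[1] for v in values))

def pairwise_forces(bodies):
    """Serial stand-in for the runtime: run maps, group by key, run reduces."""
    groups = defaultdict(list)
    for i, j in combinations(range(len(bodies)), 2):
        for key, value in map_pair(bodies, i, j):
            groups[key].append(value)
    return {key: reduce_forces(vals) for key, vals in groups.items()}
```

Keying by body index means the reduce step needs no locking, at the cost of emitting 2 records per pair. A coarser map task (for example, one block of pairs per task) would trade fewer records for less parallelism.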

Problem 2: Logarithmic N-Body MapReduce

Find a way to implement parallel N-Bodies with an n log n algorithm using MapReduce. This may take some thought, so begin early.
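The handout does not name the n log n algorithm. The sketch below assumes a Barnes-Hut style quadtree, which is one common choice, and is written in plain Python rather than against the Phoenix API. It assumes 2-D bodies stored as (mass, x, y) tuples with positive mass and distinct positions, a gravitational constant of 1, and an opening threshold THETA of 0.5. All names are hypothetical. The tree is built serially here; once it exists, the force on each body is an independent read-only computation, which is what a map task could perform.

```python
import math

G = 1.0      # gravitational constant in arbitrary units (assumption)
THETA = 0.5  # opening threshold (assumed value)

class Node:
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # square cell: center, half-width
        self.mass = 0.0
        self.mx = self.my = 0.0   # mass-weighted position sums
        self.body = None          # the single body held by a leaf
        self.kids = None          # four child cells once subdivided

    def _child(self, b):
        # Children are ordered (left,low), (left,high), (right,low), (right,high).
        return self.kids[2 * (b[1] >= self.cx) + (b[2] >= self.cy)]

    def insert(self, b):
        if self.body is None and self.kids is None:
            self.body = b                      # empty leaf: store the body
        else:
            if self.kids is None:              # occupied leaf: subdivide
                h = self.half / 2
                self.kids = [Node(self.cx + sx * h, self.cy + sy * h, h)
                             for sx in (-1, 1) for sy in (-1, 1)]
                old, self.body = self.body, None
                self._child(old).insert(old)
            self._child(b).insert(b)
        self.mass += b[0]
        self.mx += b[0] * b[1]
        self.my += b[0] * b[2]

def force_on(b, node, theta=THETA):
    """Map task body: approximate force on b by walking the shared tree."""
    if node.mass == 0.0 or node.body is b:
        return (0.0, 0.0)
    dx = node.mx / node.mass - b[1]
    dy = node.my / node.mass - b[2]
    r = math.hypot(dx, dy)
    # Use the cell's center of mass if it is a leaf or looks small from b.
    if node.kids is None or 2 * node.half < theta * r:
        f = G * b[0] * node.mass / (r * r)
        return (f * dx / r, f * dy / r)
    fx = fy = 0.0
    for kid in node.kids:
        kx, ky = force_on(b, kid, theta)
        fx, fy = fx + kx, fy + ky
    return (fx, fy)

def barnes_hut_forces(bodies, theta=THETA):
    xs = [b[1] for b in bodies]
    ys = [b[2] for b in bodies]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = Node((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    for b in bodies:
        root.insert(b)
    return [force_on(b, root, theta) for b in bodies]  # one map task per body
```

With theta set to 0 no cell is ever approximated, so the result matches the direct pairwise sum, which gives a convenient correctness check. Parallelizing the tree construction itself is a separate design question that this sketch does not address.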

Problem 3: Evaluation

Evaluate your code on gamay and ale. Show speedups on ale for N=[1,2,4,8,16] threads and on gamay for N=[1,2,4,8,16,32,64] threads. You are not required to bind threads, though you may choose to do so. You can also choose your own input for this assignment. Choose the smallest body count and timestep that is able to show the interesting parts of your program's execution characteristics.

Also compare your MapReduce implementation to the TBB and OpenMP implementations you created in assignment #3. You do not need to include gamay runs in this comparison.
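As a small aid for the evaluation, speedup at each thread count can be computed from measured wall-clock times as the 1-thread time divided by the N-thread time. The sketch below assumes timings are collected into a dictionary keyed by thread count; the function names are hypothetical, and the numbers in the example are placeholders, not measurements from ale or gamay.

```python
def speedups(times):
    """Map thread count -> speedup relative to the 1-thread run."""
    base = times[1]
    return {n: base / t for n, t in sorted(times.items())}

def efficiencies(times):
    """Map thread count -> parallel efficiency (speedup / thread count)."""
    return {n: s / n for n, s in speedups(times).items()}

# Placeholder timings in seconds, NOT real measurements.
placeholder = {1: 8.0, 2: 4.0, 4: 2.5}
for n, s in speedups(placeholder).items():
    print(n, round(s, 2))
```

The same helper works for the MapReduce, TBB, and OpenMP runs, provided each uses its own 1-thread time as the baseline or all share one stated baseline.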

Problem 4: Questions (Submission Credit)

  1. Describe your parallelization strategies for both the N^2 and n log n implementations. Did your strategy include considerations for data locality, communication, etc.?
  2. Is MapReduce a good model for the N-bodies problem? Why or why not? Did you prefer MapReduce, TBB, or OpenMP?
  3. What performance tuning parameters did you pass to the MapReduce runtime?
  4. Describe any interesting points in your evaluation graph.
  5. Which performed best, MapReduce, TBB, or OpenMP?

Tips and Tricks

Start early.

Do the MapReduce Tutorial.

Make a plan before you start writing code.

What to Hand In

Please turn this homework in on paper at the beginning of lecture.

A printout of your MapReduce-specific portions of code. This must include the map function, and may optionally include splitter, reduce, partition and any other non-trivial code. Please do not include code that is trivial.

Evaluation graphs.

Answers to questions in Problem 4.

Important: Include your name on EVERY page.

 