This paper discusses the challenges and advantages of implementing Google's MapReduce on the IBM Cell architecture. The 
implementation is referred to as MR_on_Cell henceforth.

The IBM Cell is powerful but challenging to program: the programmer must manage the local memories of the 8 SPEs and 1 PPE 
and exploit the SIMD capability. MapReduce (for large-scale processing in DCE) was originally developed by Google for 
distributed clusters and has previously been ported to shared-memory processing (Phoenix).

The methodology used in Phoenix, a hashed binary search tree for distributing work in a shared-memory architecture, is not 
viable on Cell because the software-managed memory makes it difficult to traverse complex data structures recursively. 
Instead, this model uses multithreading: the PPE does the memory management and schedules work for the SPEs, which run the 
user code in parallel.
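
The division of labor described above can be illustrated with a minimal sketch. It is not the paper's code: ordinary Python 
threads stand in for the SPEs, the main thread plays the PPE, and the names run_scheduled, user_map and NUM_WORKERS are 
hypothetical.

```python
import queue
import threading

NUM_WORKERS = 8  # mirrors the 8 SPEs; plain threads are only a stand-in


def run_scheduled(chunks, user_map, num_workers=NUM_WORKERS):
    """Scheduler (PPE role) hands out chunks; workers (SPE role) run user_map."""
    work = queue.Queue()
    results = [None] * len(chunks)  # one result slot per chunk, owned by the scheduler

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                break
            index, chunk = item
            results[index] = user_map(chunk)  # user code runs on the worker

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for index, chunk in enumerate(chunks):
        work.put((index, chunk))  # scheduler assigns work
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Calling run_scheduled([[1, 2], [3], [4, 5, 6]], sum) returns one result per chunk, in chunk order, regardless of which worker 
processed which chunk.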

This model also preallocates map and reduce output regions for bulk DMA transfers.
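
A rough sketch of the idea behind a preallocated output region follows. It assumes a fixed-size (key, value) record and 
models the bulk transfer as a single copy into a bytearray standing in for main memory. The class OutputRegion, the RECORD 
layout and the sink argument are illustrative assumptions, not the paper's interface.

```python
import struct

RECORD = struct.Struct("<ii")  # hypothetical fixed-size (key, value) record


class OutputRegion:
    """Fixed-capacity buffer that is allocated once and flushed in bulk."""

    def __init__(self, capacity_records, sink):
        self.buf = bytearray(capacity_records * RECORD.size)  # allocated once
        self.used = 0      # bytes currently filled
        self.sink = sink   # bytearray standing in for main memory
        self.flushes = 0   # number of bulk transfers performed

    def emit(self, key, value):
        if self.used == len(self.buf):  # region full: transfer it first
            self.flush()
        RECORD.pack_into(self.buf, self.used, key, value)
        self.used += RECORD.size

    def flush(self):
        if self.used:
            self.sink.extend(self.buf[:self.used])  # one bulk copy, not one per record
            self.flushes += 1
            self.used = 0
```

With a capacity of 4 records, emitting 10 records and then calling flush() once results in three bulk copies instead of ten 
per-record transfers.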

This model also handles dynamic allocation for the Map and Reduce functions, freeing the programmer of this burden.

The implementation is split into five stages: Map, Partition, Quick Sort, Merge Sort and Reduce. Various optimization 
strategies are implemented at each stage.
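
The five stages can be sketched sequentially as follows. This is a minimal illustration under stated assumptions, not the 
paper's implementation: Python's built-in sorted() stands in for the quick sort stage, heapq.merge stands in for the merge 
sort stage, partitioning uses a CRC32 hash of string keys, and the names map_reduce, run_size, word_count_map and 
word_count_reduce are hypothetical.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter


def map_reduce(records, map_fn, reduce_fn, num_partitions=4, run_size=3):
    # 1. Map: user code turns each record into (key, value) pairs.
    pairs = [kv for record in records for kv in map_fn(record)]

    # 2. Partition: equal keys always land in the same partition.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[zlib.crc32(key.encode()) % num_partitions].append((key, value))

    output = {}
    for part in partitions:
        # 3. Sort small runs (sorted() stands in for the quick sort stage).
        runs = [sorted(part[i:i + run_size], key=itemgetter(0))
                for i in range(0, len(part), run_size)]
        # 4. Merge the sorted runs into one key-ordered stream.
        merged = heapq.merge(*runs, key=itemgetter(0))
        # 5. Reduce: each group of equal keys goes to the user reduce function.
        for key, group in groupby(merged, key=itemgetter(0)):
            output[key] = reduce_fn(key, [value for _, value in group])
    return output


# Usage: word count, the usual MapReduce example.
def word_count_map(line):
    return [(word, 1) for word in line.split()]


def word_count_reduce(key, values):
    return sum(values)
```

Sorting in small runs and then merging mirrors the split between the two sort stages: each run is small enough to be handled 
in a limited local memory, and the merge only needs to stream through already sorted data.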

The paper further describes the implementation and benchmarking and analyzes the stages in detail. It also discusses the 
application types: map-dominated (data-parallel applications where the reduce stage is just result coalescing) and 
partition/sort-dominated (where partitioning is more intensive than the other stages).

The implementation is also naturally scalable.