--------------------------------------------------------------------
CS 758  Programming Multicore Processors
Fall 2012  Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

Outline

* OpenMP (see 838 notes?)
* Amdahl's Law, etc.
* Dividing Up Work

--------------------------------
OpenMP

Simple use (see documentation for richer use):

1. Write a sequential program (with an eye to parallelization)
2. Add OMP directives to parallelize key loops

Example from addmatrix.c (a complete, runnable version appears at
the end of these notes):

  // does A = B + C (write without pragma first)
  // handle the outer loop in parallel
  #pragma omp parallel for \
          shared(A,B,C,xdim,ydim) \
          private(i,j) \
          schedule(dynamic)
  for (j = 0; j < ydim; j++) {
      for (i = 0; i < xdim; i++) {
          A[i][j] = B[i][j] + C[i][j];
      }
  }

------------------------------------------
Amdahl's Law, etc.

Speedup(C) = Time(1 CPU) / Time(C CPUs)

Amdahl's Law: if fraction F of execution is parallelizable, then
  Speedup(C) = 1 / ((1-F) + F/C)
so no matter how large C gets, Speedup(C) <= 1/(1-F).

Cost-effectiveness [Wood & Hill, "Cost-Effective Parallel
Computing," IEEE Computer, 1995]:
  Costup(C) = Cost(C CPUs) / Cost(1 CPU)
  Parallelism pays off when Speedup(C) > Costup(C)

1995 SGI PowerChallenge w/ 500MB: Costup(32) = 8.6
  ==> on 32 CPUs, any speedup above 8.6 (a parallel efficiency of
      only 8.6/32, about 27%) already pays for the extra CPUs.

Multicores have even lower costups!!!

------------------------------------------
Dividing Up Work

**** DID NOT "WORK" -- TOO ABSTRACT ****

* Can think of dividing up a monolithic execution, or
* do a fine-grain breakup and then re-join things.

Let's try the latter.... Use Ocean as a running example?

Think of a dynamic program execution as a graph:
  nodes -- "chunks" of computational work
  edges -- dependences

Usually:
* some nodes do "different" work on "the same" data
* some nodes do "the same" work on "different" data

Putting nodes together:
  nodes with "different" work on "the same" data
    ==> functional, pipelined parallelism
  nodes with "the same" work on "different" data
    ==> data parallelism
(sketch contrasting the two at the end of these notes)

Usually there is more data parallelism, especially as the data
gets larger.

Also, overheads arise mostly between nodes that are NOT put
together.  Thus, we want to join nodes that would otherwise have a
lot of overhead between them.

Overheads:
  Dependences
  Synchronization
  Communication

IF the graph structure and node computation times are predictable
THEN statically group (a.k.a. statically partition)
ELSE dynamically group (a.k.a. dynamically partition)
(scheduling sketch at the end of these notes)

Consider a work queue for dynamically partitioning work:
* small vs. large elements
* centralized vs. distributed
(work-queue sketch at the end of these notes)
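------------------------------------------
Sketch: addmatrix.c, end to end

A minimal, self-contained version of the addmatrix.c example above,
assuming gcc (any OpenMP-capable compiler works); the array sizes
and the main() driver are made up for illustration.

  /* Build:  gcc -fopenmp addmatrix.c
   * Run:    OMP_NUM_THREADS=4 ./a.out
   */
  #include <stdio.h>

  #define XDIM 1000
  #define YDIM 1000

  static double A[XDIM][YDIM], B[XDIM][YDIM], C[XDIM][YDIM];

  int main(void)
  {
      int i, j;
      int xdim = XDIM, ydim = YDIM;

      /* Step 1: write the sequential program */
      for (j = 0; j < ydim; j++)
          for (i = 0; i < xdim; i++) {
              B[i][j] = i;
              C[i][j] = j;
          }

      /* Step 2: add an OMP directive to the key loop;
         does A = B + C with the outer loop in parallel */
      #pragma omp parallel for \
              shared(A,B,C,xdim,ydim) \
              private(i,j) \
              schedule(dynamic)
      for (j = 0; j < ydim; j++)
          for (i = 0; i < xdim; i++)
              A[i][j] = B[i][j] + C[i][j];

      printf("A[1][2] = %.1f (expect 3.0)\n", A[1][2]);
      return 0;
  }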
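------------------------------------------
Sketch: data vs. functional parallelism

A sketch of the two groupings, again with OpenMP.  stage1() and
stage2() are hypothetical stand-ins for node "work", and the file
name is illustrative.  The sections run different work concurrently
on independent data; a full pipeline would additionally stream each
datum through stage1 and then stage2, overlapping stages across
items.

  /* Build:  gcc -fopenmp parallel_kinds.c */
  #include <stdio.h>

  #define N 1000

  static double a[N], b[N];

  static void stage1(double *x) { *x = *x * 2.0; }  /* hypothetical work */
  static void stage2(double *x) { *x = *x + 1.0; }  /* hypothetical work */

  int main(void)
  {
      int i;

      for (i = 0; i < N; i++)       /* sequential init */
          a[i] = b[i] = i;

      /* Data parallelism: the SAME work on DIFFERENT data.
         Independent iterations split across threads. */
      #pragma omp parallel for private(i)
      for (i = 0; i < N; i++)
          stage1(&a[i]);

      /* Functional parallelism: DIFFERENT work running concurrently.
         The two sections touch disjoint arrays, so they are
         independent nodes of the execution graph. */
      #pragma omp parallel sections
      {
          #pragma omp section
          { int k; for (k = 0; k < N; k++) stage1(&b[k]); }

          #pragma omp section
          { int k; for (k = 0; k < N; k++) stage2(&a[k]); }
      }

      printf("a[1] = %.1f, b[1] = %.1f\n", a[1], b[1]);
      return 0;
  }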
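------------------------------------------
Sketch: static vs. dynamic grouping

The IF/THEN/ELSE rule above maps directly onto OpenMP's schedule
clause.  work() is a hypothetical node whose cost varies
unpredictably per iteration; the file name is illustrative.

  /* Build:  gcc -fopenmp scheduling.c -lm */
  #include <stdio.h>
  #include <math.h>

  #define N 100000

  static double cell[N];

  static double work(int i)         /* per-node cost depends on i */
  {
      double x = 0.0;
      int k;
      for (k = 0; k < i % 997; k++)
          x += sin((double)k);
      return x;
  }

  int main(void)
  {
      int i;

      /* Predictable, uniform nodes ==> group statically.  Each
         thread gets one contiguous chunk up front; near-zero
         scheduling overhead at run time. */
      #pragma omp parallel for schedule(static)
      for (i = 0; i < N; i++)
          cell[i] = (double)i;

      /* Unpredictable node times ==> group dynamically.  Threads
         grab 64-iteration chunks as they finish, trading scheduling
         overhead for load balance. */
      #pragma omp parallel for schedule(dynamic, 64)
      for (i = 0; i < N; i++)
          cell[i] += work(i);

      printf("cell[N-1] = %f\n", cell[N - 1]);
      return 0;
  }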
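------------------------------------------
Sketch: a centralized work queue

A minimal centralized work queue for dynamic partitioning, using
OpenMP's lock API.  Each queue element is a chunk of CHUNK
iterations, which is the "small vs. large elements" knob: large
chunks reduce contention on the shared queue, small chunks balance
load better.  A distributed variant would instead give each thread
its own queue and let idle threads steal from the others.  The file
name and CHUNK value are illustrative.

  /* Build:  gcc -fopenmp workqueue.c */
  #include <stdio.h>
  #include <omp.h>

  #define N     100000
  #define CHUNK 256              /* small-vs-large elements knob  */

  static double cell[N];
  static int next = 0;           /* head of the centralized queue */
  static omp_lock_t qlock;       /* protects the queue            */

  /* Dequeue the next chunk [*lo, *hi); return 0 when the queue is empty. */
  static int dequeue(int *lo, int *hi)
  {
      int got = 0;
      omp_set_lock(&qlock);
      if (next < N) {
          *lo  = next;
          *hi  = (next + CHUNK < N) ? next + CHUNK : N;
          next = *hi;
          got  = 1;
      }
      omp_unset_lock(&qlock);
      return got;
  }

  int main(void)
  {
      omp_init_lock(&qlock);

      #pragma omp parallel
      {
          int lo, hi, i;
          while (dequeue(&lo, &hi))          /* run until queue drains */
              for (i = lo; i < hi; i++)
                  cell[i] = (double)i * 2.0; /* the node's "work" */
      }

      omp_destroy_lock(&qlock);
      printf("cell[N-1] = %f\n", cell[N - 1]);
      return 0;
  }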