--------------------------------------------------------------------
CS 758 Programming Multicore Processors
Fall 2012 Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

------------
Introduction
------------

Write Outline on Board
* PRAM
* Class Structure
* Models
* Sutter/Larus

Next Lecture
  Architecture & 757 Review
  Niagara (new questions)

------------------------------------
PRAM
------------------------------------

Sequential
  Time to sum n numbers?  O(n)
  Time to sort n numbers? O(n log n)
  What model? RAM

Parallel
  Time to sum?  Tree for O(log n)
  Time to sort? Non-trivially O(log n)
  What model? PRAM [Fortune & Wyllie STOC78]
    P processors in lock-step
    One memory (e.g., CREW for concurrent read, exclusive write)

Why not realistic?
  Asynchrony means synchronization needed
  Latencies grow as the system size grows
  Bandwidths are restricted by memory organizations and interconnection networks
  Dealing with reality leads to a division between
    UMA: Uniform Memory Access and
    NUMA: Non-Uniform Memory Access

How build?

------------------------------------
Class Structure
------------------------------------

How build? Show:

  P  P  P     P  P  P     P--P--P
  |  |  |     |  |  |     |  |  |
  M  M  M     M--M--M     M--M--M
  |  |  |     |  |  |     |  |  |
  N--N--N     N--N--N     N--N--N

  Shared      Shared      All Shared
  I/O         Memory      (SIMD?)

Much of this course -- shared memory
Later -- clusters for shared I/O
SIMD

Flynn's Taxonomy
  SISD SIMD MIMD (MISD)

Definitions
  Thread  -- PC
  Process -- address space

Basic Programming Models (draw pictures)
  Multitasking:    n x (1 thread/process) w/o communication
  Message-Passing: n x (1 thread/process) w/ messages
  Shared Memory:   n threads in 1 process (how communicate?)
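The "n threads in 1 process" model above can be sketched in a few lines (a minimal Python sketch, assuming CPython's threading module; the lecture names no particular language or API). It answers "how communicate?" directly: the threads read and write the same variable, and a lock supplies the synchronization that asynchrony forces.

```python
# Sketch: n threads in one process, communicating through shared memory.
# The shared variable and worker function are illustrative, not from the notes.
import threading

counter = 0                 # lives in the single shared address space
lock = threading.Lock()     # threads run asynchronously, so writes need a lock

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # exclusive write (the "EW" in CREW)
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)              # 40000: every thread saw the same memory
```

Contrast with message passing: here nothing is sent anywhere; communication happens implicitly because all threads share one address space.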
  Shared Memory': n x (1 thread/process) w/ "System V" shared memory
  Sequential:     1 thread/process with parallelizing software
  Data Parallel:  1 thread/process with data-parallel ops
                  (generalization of SIMD)
                  or n threads in lock-step w/ shared memory

GPU SIMT (Single Instruction, Multiple Threads)
  to first order, n threads in lock-step w/ shared memory
  best performance: program as data parallel
  but threads can diverge -- dividing performance

------------------------------
Software and the Concurrency Revolution
Herb Sutter & Jim Larus
ACM Queue 09/2005
------------------------------

Concurrency more disruptive than OO
* sole path to high performance
* concurrent programming hard (e.g., can't look at just one context)

Where concurrency?
* Easy to FIND on servers or cloud
  (if small requests mediated by shared store)
  (but still a challenge to exploit)
* Hard to even find on client
  (Modern apps split across both)

High-order issues
* granularity of operations -- from single instructions to large executions
* degree of coupling -- communication and synchronization -- small to
  embarrassingly parallel

Types of parallelism
* Independent   matrix A = matrix A * 2
* Regular       A[i,j] = avg of neighbors
* Unstructured

How coordinate?
* Locks are the default but not composable
  (must peek into implementation of abstraction)
  subtle deadlock
  optional conventions
* Synchronized methods -- too strong and too weak
* Lock-free programming -- too hard
* TM -- not here yet

What from PLs?
* Automatic parallelism would be nice
* Functional programming to the rescue -- probably not, due to side effects
* High-level abstraction (from functional programming) promising
  (e.g., map-reduce)
* Also futures & active objects
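The map-reduce abstraction named above can be sketched as a toy word count (a Python sketch under assumed names -- map_phase, reduce_phase, and the three-line corpus are illustrative, not from Sutter & Larus). The point is the one the article makes: the programmer writes side-effect-free map and reduce functions, which leaves the framework free to run the map phase in parallel.

```python
# Sketch of the map-reduce style: side-effect-free phases, parallel map.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_phase(line):
    """Map: one line -> per-line word counts (touches no shared state)."""
    return Counter(line.split())

def reduce_phase(a, b):
    """Reduce: merge two partial counts (associative, so tree-reducible)."""
    a.update(b)
    return a

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Because map_phase has no side effects, these calls may run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(map_phase, lines))

totals = reduce(reduce_phase, partials, Counter())
print(totals["the"], totals["fox"])   # prints "3 2"
```

Note that reduce_phase is associative, so the combine step could itself be done as an O(log n) tree -- the same shape as the PRAM sum at the start of the lecture.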