--------------------------------------------------------------------
CS 757 Parallel Computer Architecture Spring 2012 Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

Outline
* PRAM
* Models
* (Dark Silicon)

--------------------
PRAM

Sequential
  Time to sum n numbers? O(n)
  Time to sort n numbers? O(n log n)
  What model? RAM

Parallel
  Time to sum? Tree for O(log n)
  Time to sort? Non-trivially O(log n)
  What model? PRAM [Fortune & Wyllie STOC78]
    P processors in lock-step
    One memory (e.g., CREW for concurrent read, exclusive write)

Why not realistic?
  Asynchrony means synchronization is needed
  Latencies grow as the system size grows
  Bandwidths are restricted by memory organizations
    and interconnection networks
  Dealing with reality leads to division between
    UMA: Uniform Memory Access and
    NUMA: Non-Uniform Memory Access

How to build? Show:

  P  P  P        P  P  P        P--P--P
  |  |  |        |  |  |        |  |  |
  M  M  M        M--M--M        M--M--M
  |  |  |        |  |  |        |  |  |
  N--N--N        N--N--N        N--N--N

  Shared I/O     Shared Memory  All Shared (SIMD?)

  Much of this course -- shared memory
  Later -- clusters for shared I/O
           SIMD

Flynn's Taxonomy
  SISD
  SIMD
  MIMD
  (MISD)

(stopped here)

Definitions
  Thread  -- PC
  Process -- address space

Basic Programming Models (draw pictures)
  Multitasking:     n x (1 thread/process) w/o communication
  Message-Passing:  n x (1 thread/process) w/ messages
  Shared Memory:    n threads in 1 process (how to communicate?)
  Shared Memory':   n x (1 thread/process) w/ "System V" shared memory
  Sequential:       1 thread/process with parallelizing software
  Data Parallel:    1 thread/process with data parallel ops
                    (generalization of SIMD)
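
Sketch: tree sum in O(log n) rounds

Going back to the PRAM question "Time to sum? Tree for O(log n)": the following is a minimal sequential simulation of that tree reduction, not code from the course. The function name pram_tree_sum and the returned round count are illustrative assumptions. Each pass of the outer loop stands for one lock-step PRAM round. The inner loop stands for the processors, which all touch disjoint memory cells within a round and so could run concurrently on a real PRAM.

```python
def pram_tree_sum(values):
    """Simulate a PRAM tree reduction; return (total, rounds).

    Hypothetical helper for illustration only. The outer loop models
    lock-step rounds; the inner loop models the processors that would
    run concurrently in one round.
    """
    data = list(values)          # the single shared memory
    n = len(data)
    if n == 0:
        return 0, 0

    rounds = 0
    stride = 1
    while stride < n:
        # One round: "processor" i adds the cell stride away into cell i.
        # No two processors read or write the same cell in a round.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
        rounds += 1

    return data[0], rounds


if __name__ == "__main__":
    total, rounds = pram_tree_sum(range(1, 9))
    print(total, rounds)
```

For n = 8 the simulation takes 3 rounds, and in general ceil(log2 n) rounds, compared with the n - 1 dependent additions of the sequential RAM loop. The total work is still n - 1 additions; only the depth shrinks.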