--------------------------------------------------------------------
CS 757 Parallel Computer Architecture
Spring 2012 Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

Outline
* PRAM
* Models
* (Dark Silicon)


--------------------

PRAM

Sequential
 Time to sum n numbers? O(n)
 Time to sort n numbers? O(n log n)
 What model? RAM

Parallel
 Time to sum? Tree for O(log n)
 Time to sort? Non-trivially O(log n)
 What model?
 PRAM [Fortune & Wyllie, STOC 1978]
 P processors in lock-step
 One memory (e.g., CREW for concurrent read, exclusive write)
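
The O(log n) tree sum can be sketched as a small simulation. This is a
minimal illustration, not code from the course: the function name
pram_tree_sum and the step counter are assumptions made here, and the
"processors" are simulated one after another inside each lock-step round.

```python
def pram_tree_sum(values):
    """Simulate a lock-step tree reduction; return (total, parallel_steps)."""
    cells = list(values)      # the single shared memory
    n = len(cells)
    if n == 0:
        return 0, 0
    steps = 0
    stride = 1
    while stride < n:
        # One PRAM step: conceptually, all of these adds happen at once.
        # The processor for cell i reads cells i and i + stride and writes
        # only cell i, so no two processors touch the same cell in a step.
        for i in range(0, n - stride, 2 * stride):
            cells[i] += cells[i + stride]
        steps += 1
        stride *= 2
    return cells[0], steps


if __name__ == "__main__":
    total, steps = pram_tree_sum(range(1, 9))
    print(total, steps)   # 8 inputs: 7 adds in total, but only 3 rounds
```

With n inputs the loop runs ceil(log2 n) rounds, versus n - 1 adds in the
sequential case. Because no cell is read or written by two processors in
the same round, this particular reduction would not even need concurrent
reads.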


 <Show picture>

Why not realistic?

 Asynchrony means synchronization needed

 Latencies grow as the system size grows

 Bandwidths are restricted by memory organizations and interconnection networks

 Dealing with reality leads to division between
    UMA: Uniform Memory Access
    and 
    NUMA: Non-Uniform Memory Access

How to build?

  Show:

  P  P  P     P  P  P     P--P--P
  |  |  |     |  |  |     |  |  |
  M  M  M     M--M--M     M--M--M
  |  |  |     |  |  |     |  |  |
  N--N--N     N--N--N     N--N--N  

  Shared I/O  Shared      All Shared
              Memory      (SIMD?)

Much of this course -- shared memory
Later -- clusters for shared I/O
SIMD

Flynn's Taxonomy
    SISD
    SIMD
    MIMD
    (MISD)


(stopped here)

Definitions

   Thread -- PC
   Process -- address space

Basic Programming Models (draw pictures)
  Multitasking:      n x (1 thread/process) w/o communication
  Message Passing:   n x (1 thread/process) w/ messages
  Shared Memory:     n threads in 1 process (how communicate?)
  Shared Memory':    n x (1 thread/process) w/ "System V" shared memory
  Sequential:        1 thread/process with parallelizing software
  Data Parallel:     1 thread/process with data-parallel ops
                     (generalization of SIMD)
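
The difference between the shared-memory and message-passing models can
be sketched with a parallel sum. This is a rough illustration under
stated assumptions: the function names are hypothetical, and the
message-passing version uses threads plus a queue.Queue as a stand-in
for separate processes exchanging messages, so that the example stays
self-contained.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """n threads in one address space, communicating via a shared variable."""
    total = [0]                  # shared location visible to every thread
    lock = threading.Lock()      # synchronization is the programmer's job

    def worker(chunk):
        partial = sum(chunk)     # private work, no communication
        with lock:               # protect the read-modify-write
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(chunks):
    """Workers update no shared data; each sends its partial sum as a message."""
    mailbox = queue.Queue()      # stand-in for a channel between processes

    def worker(chunk):
        mailbox.put(sum(chunk))  # send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    total = sum(mailbox.get() for _ in threads)   # one receive per worker
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(shared_memory_sum(data), message_passing_sum(data))
```

In the shared-memory version communication is implicit (loads and stores
to a common location) and synchronization is explicit (the lock). In the
message-passing version communication is explicit (put/get), and the
receive also serves as the synchronization.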