--------------------------------------------------------------------
CS 757 Parallel Computer Architecture
Spring 2012 Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

Outline
* Shared Memory & Pthreads
* Synchronization
* MCS

----------------------------------

Shared Memory

Operations
  "Normal" read, write, etc.
  * Reads are both local and for communication
  * Usually one or more atomic read-modify-write primitives
    to ease synchronization (e.g., test-and-set) (more later)

Ordering
  * Reads must return "last" write, but now last write can be from
    + same thread
    + different thread
  * Defined by "Memory Consistency Model" (discussed later)

Pthreads
  stack --> per-thread stack
  text
  data
  heap

  Go over eg_pthread.c (see Word file).

----------------------------------

Synchronization

(See Synchronization from 757 SMP1 notes.)

Primitives

* test-and-set(L)
    atomic {
      tmp = L
      L = 1
      return(tmp)
    }

* compare-and-swap(new,old,W)
    atomic {
      if (old == W) {
        W = new
        return(true)
      } else {
        return(false)
      }
    }

* load-linked / store-conditional
    load-linked r_tmp, W
    ...
    store-conditional(r_tmp2,W)
    atomic {
      if "atomic" since load-linked {
        store r_tmp2, W
        return(success)
      } else {
        return(failure)
      }
    }

  E.g., Fetch-and-add(W, offset)
    start: load-linked r_tmp, W
           r_tmp = r_tmp + offset
           store-conditional(r_tmp,W)
           if failure goto start

Lock Goals
* Fast when not held
* Small state
* Simple
* Scales with many waiters (but this is bad performance anyway)

Must
* Acquire
* Wait
* Release (and notify?)

Simple Lock
  while (test-and-set(L)) {}  /* spin if already locked */

Test&test-and-set
  repeat until priorstate==0 {
    while (L != 0) {}  /* spin reading while lock is held */
    priorstate = test-and-set(L)
  }

Ticket Lock -- like at Bagels Forever -- buys FIFO

Array Based Lock
  reduce spinning per physical lock
  fewer threads awoken on release
  more space (how much space?)
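
The ticket lock above can be sketched with C11 atomics and Pthreads.
This is a minimal illustration under stated assumptions, not code from
the course: the names ticket_lock_t, ticket_acquire, ticket_release,
and the run_ticket_demo harness are hypothetical, and the sched_yield()
call in the spin loop is an added courtesy for machines with fewer
cores than threads (a pure spin would omit it).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Ticket lock: next_ticket is the dispenser, now_serving is the display. */
typedef struct {
    atomic_uint next_ticket;  /* next ticket number to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to hold the lock */
} ticket_lock_t;

static void ticket_acquire(ticket_lock_t *l) {
    /* fetch-and-add gives each arriving thread a unique ticket */
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);
    /* spin reading until our number comes up: waiters enter in FIFO order */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        sched_yield();  /* assumption: be polite when oversubscribed */
}

static void ticket_release(ticket_lock_t *l) {
    /* only the holder writes now_serving, so a plain load + store suffices */
    unsigned next = atomic_load_explicit(&l->now_serving,
                                         memory_order_relaxed) + 1;
    atomic_store_explicit(&l->now_serving, next, memory_order_release);
}

/* Hypothetical demo harness: threads increment a shared counter. */
enum { TICKET_MAX_THREADS = 16 };
static ticket_lock_t ticket_demo_lock;
static long ticket_demo_counter;  /* protected by ticket_demo_lock */
static long ticket_demo_iters;

static void *ticket_demo_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < ticket_demo_iters; i++) {
        ticket_acquire(&ticket_demo_lock);
        ticket_demo_counter++;  /* critical section */
        ticket_release(&ticket_demo_lock);
    }
    return NULL;
}

/* Runs nthreads workers for iters increments each; returns the final count. */
long run_ticket_demo(int nthreads, long iters) {
    pthread_t tid[TICKET_MAX_THREADS];
    assert(nthreads > 0 && nthreads <= TICKET_MAX_THREADS);
    atomic_init(&ticket_demo_lock.next_ticket, 0);
    atomic_init(&ticket_demo_lock.now_serving, 0);
    ticket_demo_counter = 0;
    ticket_demo_iters = iters;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, ticket_demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return ticket_demo_counter;
}
```

Compile with -pthread. Note the state is two words no matter how many
threads wait, but every waiter spins on the same now_serving word, which
is the spinning that the array-based lock reduces.
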
MCS Lock
  instead of array, every waiter provides own record
  lock can be unheld (null)
  if held, points to tail of waiters
  join list, wait until waiter in front of you releases lock in your record
  release next waiter or set lock null
  many race conditions

Barriers -- everyone waits until everyone arrives.
  Counter protected by lock & spin on flag (SMP1 slide 55)
    o Sense Reverse (SMP1 slide 56)
  Tournament (SMP1 slide 57)
  HW?

Empty/Full bits

Condition Variables

Reviews:
  Tony: Best implementation dependent on architecture. Add: size.
        Coherence vs. dance hall.
  Daniel: Tree and tournament barriers
  Cong: FIFO vs. starvation in locks

Reviews:
* Later: coherence & ICNs
* Brian: fetch-and-phi, whither modern processors
* Syed: Dissemination barrier
* Eric: Lock idea
* Guoliang: sense-reverse barriers
* Daniel: Critical path for barrier
* Andrew N.: SW combining tree important?
* Andrew E: HW?
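
As a companion to the MCS lock outline in these notes, here is a minimal
sketch in C11 atomics and Pthreads. It is an illustration under stated
assumptions, not the paper's or the course's code: the names mcs_node_t,
mcs_lock_t, mcs_acquire, mcs_release, and run_mcs_demo are hypothetical,
and sched_yield() in the spin loops is an added courtesy for
oversubscribed machines. The release path shows one of the races the
notes allude to: a new waiter may have swapped itself in as the tail but
not yet linked behind the releaser.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Each waiter supplies its own queue record. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;  /* successor in the waiter list */
    atomic_bool locked;               /* true while this waiter must spin */
} mcs_node_t;

/* The lock points to the tail of the waiter list; NULL means unheld. */
typedef _Atomic(mcs_node_t *) mcs_lock_t;

static void mcs_acquire(mcs_lock_t *lock, mcs_node_t *me) {
    atomic_store_explicit(&me->next, (mcs_node_t *)NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    /* join the list: atomically swap ourselves in as the new tail */
    mcs_node_t *pred = atomic_exchange_explicit(lock, me, memory_order_acq_rel);
    if (pred == NULL)
        return;  /* lock was unheld; we hold it now */
    /* link behind the predecessor, then spin on the flag in our own record */
    atomic_store_explicit(&pred->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        sched_yield();  /* assumption: be polite when oversubscribed */
}

static void mcs_release(mcs_lock_t *lock, mcs_node_t *me) {
    mcs_node_t *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* no visible successor: try to set the lock back to NULL */
        mcs_node_t *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                lock, &expected, (mcs_node_t *)NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* race: a waiter swapped the tail but has not linked in yet */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            sched_yield();
    }
    /* release the next waiter by clearing the flag in its record */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

/* Hypothetical demo harness: threads increment a shared counter. */
enum { MCS_MAX_THREADS = 16 };
static mcs_lock_t mcs_demo_lock;
static long mcs_demo_counter;  /* protected by mcs_demo_lock */
static long mcs_demo_iters;

static void *mcs_demo_worker(void *arg) {
    mcs_node_t node;  /* this thread's record, on its own stack */
    (void)arg;
    for (long i = 0; i < mcs_demo_iters; i++) {
        mcs_acquire(&mcs_demo_lock, &node);
        mcs_demo_counter++;  /* critical section */
        mcs_release(&mcs_demo_lock, &node);
    }
    return NULL;
}

/* Runs nthreads workers for iters increments each; returns the final count. */
long run_mcs_demo(int nthreads, long iters) {
    pthread_t tid[MCS_MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MCS_MAX_THREADS);
    atomic_store(&mcs_demo_lock, (mcs_node_t *)NULL);
    mcs_demo_counter = 0;
    mcs_demo_iters = iters;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, mcs_demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return mcs_demo_counter;
}
```

Compile with -pthread. Each waiter spins only on the locked flag in its
own record, so a release touches exactly one waiter's record; the shared
state is a single tail pointer plus one record per waiting thread.
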