--------------------------------------------------------------------
CS 757 Parallel Computer Architecture
Spring 2012 Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

------------
Consistency I
------------

Outline
  System Model & Coherence
  Memory Consistency & SC
  (TSO forecast)

------------------------------
System Model & Coherence

System model -- Figure 2.1
  cores w/ private caches w/ controllers
  icn
  LLC w/ memory w/ controller
  Offchip DRAM

Not:
* Not multisocket
* Not hierarchical caches

EZ:
* Private L2
* Banked LLC
* Multiple DRAM channels

Incoherence example 2.2

Coherence Invariants

1. Single-Writer, Multiple-Read (SWMR) Invariant. For any memory
   location A, at any given (logical) time, there exists only a single
   core that may write to A (and can also read it) or some number of
   cores that may only read A.

2. Data-Value Invariant. The value of the memory location at the start
   of an epoch is the same as the value of the memory location at the
   end of its last read–write epoch.

Show Figure 2.3 timeline
  read-write at 1
  read-only at 2, 3
  read-write at 2
  ..

Maintain invariants
* Use blocks (64B)
* FSM at caches & LLC
* Communication with messages/bus

Goal:
* Make caches invisible, as in uniprocessors
* Once invisible, what does memory do? (can't refer to caches)

Memory Consistency
-------------------

  Problem
  SC
  Formalize SC
  Implement SC
  TSO
  Formalize TSO

Reviews
* Aditya: Simple examples
* Eric: SC strict?
* Syed: read-modify-write & order?
* Marc: safety net?
* Andrew N: Who uses relaxed?
* David: Myth v. reality

For next time:
* Brian, Daniel: Performance differences?
* Asim: SC & optimizations
* Guoliang: What's best?

Coherence vs.
Consistency
* Coherence concerns only one memory location
* Consistency concerns apparent ordering for all locations

Example

  /* initial A = B = flag = 0 */

  P1                P2
  A = 1;            while (flag == 0); /* spin */
  B = 1;            print A;
  flag = 1;         print B;

Intuition says printed A = B = 1
Coherence doesn't say anything, why?
Consider coalescing write buffer

  /* initial A = B = 0 */

  P1                P2
  A = 1;            B = 1;
  r1 = B;           r2 = A;

Three outcomes are possible, but not the fourth (r1 = r2 = 0).

How to screw up?
* write buffer
* ooo loads
* directory protocol that doesn't wait for acks

Define Memory Consistency
* What SW can expect of HW
* What HW must provide to SW

Want 4Ps: (DID NOT COVER)
* Programmability: EZ to program
* Performance: facilitates good performance (or low cost)
* Portable
* Precision

Sequential Consistency

  A multiprocessor is sequentially consistent if the result of any
  execution is the same as if the operations of all the processors
  were executed in some sequential order, and the operations of each
  individual processor appear in this sequence in the order specified
  by its program.

Show two processors po and global mo
Show "railroad switch" picture
Same as multi-threaded uniprocessor

Formalize SC (NOT DONE IN 2012)
* Program order

L(a) L(a) S(a) S(a)
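
Sketch (not from lecture): the two-processor example above (A = 1; r1 = B
vs. B = 1; r2 = A) is small enough to check by brute force. The Python
below enumerates every total order that respects each processor's program
order, which is one way to read the SC definition, and collects the
(r1, r2) results. The names P1, P2, interleavings, run and outcomes are
made up for this sketch. The "buffered" case is a crude assumption, not a
real write-buffer model: it just lets a store reach memory after the same
processor's later load to a different address.

```python
from itertools import combinations

# Each operation is (kind, address, register); register is None for stores.
P1 = [("store", "A", None), ("load", "B", "r1")]   # A = 1; r1 = B;
P2 = [("store", "B", None), ("load", "A", "r2")]   # B = 1; r2 = A;


def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each thread's own order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]


def run(order):
    """Execute one total memory order against a single shared memory."""
    memory = {"A": 0, "B": 0}          # initial A = B = 0
    regs = {}
    for kind, addr, reg in order:
        if kind == "store":
            memory[addr] = 1           # every store in this example writes 1
        else:
            regs[reg] = memory[addr]
    return regs["r1"], regs["r2"]


def outcomes(p1_orders, p2_orders):
    """Collect (r1, r2) over all merges of the allowed per-thread orders."""
    return {run(o)
            for a in p1_orders
            for b in p2_orders
            for o in interleavings(a, b)}


# SC: memory order must respect program order, so one order per thread.
sc = outcomes([P1], [P2])

# Crude write-buffer assumption: the store may also become visible
# after the thread's later load (the two operations swap in memory order).
buffered = outcomes([P1, P1[::-1]], [P2, P2[::-1]])

print("SC:      ", sorted(sc))
print("buffered:", sorted(buffered))
```

Under these assumptions the SC set has three results and omits (0, 0);
letting stores slip past later loads adds (0, 0) as the fourth.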