--------------------------------------------------------------------
CS 758 Programming Multicore Processors 
Fall 2012 Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

------------
Synchronization, etc.
------------

OUTLINE
REDO
* Atomicity & Condition Synchronization
* Safety and Liveness
* Memory Consistency 

Michael Scott: Shared-Memory Synchronization
Synthesis Lecture
Chapters 1-3


Atomicity & Condition Synchronization

Atomicity

thread 1: 	thread 2:
ctr++ 		ctr++

thread 1: 	thread 2:
1: r := ctr 	1: r := ctr
2: inc r 	2: inc r
3: ctr := r 	3: ctr := r

(6 choose 3) = 6! / (3! 3!) = 20 interleavings

Most don't increment the counter by 2: only the 2 fully serial interleavings do; the other 18 lose an update.
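
A brute-force check of the interleaving count, as a minimal Python sketch. The names run and all_schedules are illustrative, not from the lecture; each thread is assumed to have its own private register r.

```python
from itertools import combinations

def run(schedule):
    """Run one interleaving; schedule[k] names the thread (0 or 1) taking step k."""
    ctr = 0
    reg = [0, 0]   # each thread's private register r
    pc = [0, 0]    # next step of each thread (0: load, 1: inc, 2: store)
    for t in schedule:
        if pc[t] == 0:
            reg[t] = ctr        # 1: r := ctr
        elif pc[t] == 1:
            reg[t] += 1         # 2: inc r
        else:
            ctr = reg[t]        # 3: ctr := r
        pc[t] += 1
    return ctr

def all_schedules():
    """Yield every way to choose which 3 of the 6 slots belong to thread 0."""
    for slots in combinations(range(6), 3):
        yield [0 if k in slots else 1 for k in range(6)]

results = [run(s) for s in all_schedules()]
print(len(results), results.count(2), results.count(1))   # 20 2 18
```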

lock L
thread 1: 	thread 2:
L.acquire() 	L.acquire()
ctr++ 		ctr++
L.release()	L.release()


What if many counters? (think: hash table or tree)

Coarse-grain locking

lock L
thread 1: 	thread 2:
L.acquire() 	L.acquire()
ctr[i]++ 	ctr[j]++
L.release()	L.release()

But L only needed if i==j.

Fine-grain locking

lock L[n]
thread 1: 	thread 2:
L[i].acquire() 	L[j].acquire()
ctr[i]++ 	ctr[j]++
L[i].release()	L[j].release()
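
Fine-grain locking as a runnable Python sketch, with one threading.Lock per counter. The thread count, N, and repetition count are arbitrary choices for illustration.

```python
import threading

N = 4
ctr = [0] * N
L = [threading.Lock() for _ in range(N)]   # lock L[n]: one lock per counter

def worker(i, reps):
    for _ in range(reps):
        with L[i]:          # L[i].acquire() ... L[i].release()
            ctr[i] += 1

# Eight threads, two per counter; threads on different counters never contend.
threads = [threading.Thread(target=worker, args=(k % N, 10000)) for k in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ctr)   # [20000, 20000, 20000, 20000]
```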


More parallelism, but risk of deadlock

move(n, i, j):
L[i].acquire()
L[j].acquire() // (theres a bug here)
acct[i] -= n
acct[j] += n
L[i].release()
L[j].release()

thread 1: 	thread 2:
move(100,2,3)	move(50,3,2)
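
One common fix is to always acquire the two locks in a fixed (here: index) order, so no circular wait can form. A minimal Python sketch, assuming starting balances of 100000 and a hypothetical helper repeat to generate contention:

```python
import threading

acct = [100000, 100000, 100000, 100000]
L = [threading.Lock() for _ in acct]

def move(n, i, j):
    """Transfer n from acct[i] to acct[j], taking locks in index order."""
    if i == j:
        return
    first, second = min(i, j), max(i, j)   # static lock order breaks circularity
    with L[first]:
        with L[second]:
            acct[i] -= n
            acct[j] += n

def repeat(n, i, j, reps):
    for _ in range(reps):
        move(n, i, j)

t1 = threading.Thread(target=repeat, args=(100, 2, 3, 1000))
t2 = threading.Thread(target=repeat, args=(50, 3, 2, 1000))
t1.start(); t2.start()
t1.join(); t2.join()
print(acct)   # [100000, 100000, 50000, 150000]
```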

Spinning vs. Blocking

while (!condition) {} // spin: do nothing until condition holds


Condition Synchronization

Not just any order (as with atomicity), but some specific order

Q.remove(): 				Q.insert(d):
atomic 					atomic
await !Q.empty() 			await !Q.full()
// return data from next full slot 	// put d in next empty slot
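
A blocking version of this queue, sketched in Python with two condition variables sharing one lock. The class name BoundedQueue and the capacity of 4 are assumptions for illustration.

```python
import threading
from collections import deque

class BoundedQueue:
    def __init__(self, capacity):
        self.items = deque()
        self.capacity = capacity
        self.lock = threading.Lock()
        self.not_empty = threading.Condition(self.lock)
        self.not_full = threading.Condition(self.lock)

    def insert(self, d):
        with self.lock:                                # atomic
            while len(self.items) == self.capacity:
                self.not_full.wait()                   # await !Q.full()
            self.items.append(d)                       # put d in next empty slot
            self.not_empty.notify()

    def remove(self):
        with self.lock:                                # atomic
            while not self.items:
                self.not_empty.wait()                  # await !Q.empty()
            d = self.items.popleft()                   # data from next full slot
            self.not_full.notify()
            return d

Q = BoundedQueue(capacity=4)
producer = threading.Thread(target=lambda: [Q.insert(d) for d in range(100)])
producer.start()
received = [Q.remove() for _ in range(100)]
producer.join()
print(received == list(range(100)))   # True
```

The wait calls sit in while loops, so the condition is rechecked after every wakeup.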

Point-to-Point : flag
All-to-All : barrier 


Safety and Liveness

Safety means that bad things never happen.
E.g., we never have two threads in a critical section for the same lock at the same time.

Liveness means that good things eventually happen.
E.g., if lock L is free and at least one thread is waiting for it, some thread eventually acquires it.

For predicates P and Q on reachable system states S and T,

Safety: FOR-ALL S [P(S)]
Liveness: FOR-ALL S [P(S) --> THERE-EXISTS T [Q(T)]], where T is a later state in the same execution as S


Safety (3.1)
------------

DEADLOCK FREEDOM

As noted in Section 1.4, deadlock freedom is a safety property: no reachable state may have a group of threads all waiting for each other. Deadlock requires four simultaneous conditions:

exclusive use: threads require access to some sort of non-sharable resources

hold and wait: threads wait for unavailable resources while continuing to hold resources they have already acquired

irrevocability: resources cannot be forcibly taken from threads that hold them

circularity: there exists a circular chain of threads in which each is holding a resource needed by the next


Most common approach -- break the circularity condition by acquiring locks in a static order

Emerging, e.g., transactional memory -- break the irrevocability condition


Liveness (3.2)
--------------

Liveness means that good things eventually happen.

A method is said to be wait free (the strongest variant of nonblocking progress) if it is guaranteed to complete in some bounded number of its own program steps.  (This bound need not be statically known.)

A method M is said to be lock free (a somewhat weaker variant) if SOME thread is guaranteed to make progress (complete an operation on the same object) in some bounded number of M's program steps.

A method is said to be obstruction free (the weakest variant of nonblocking progress) if it is guaranteed to complete in some bounded number of program steps if no other thread executes any steps during that same interval.


State-of-the-Art

Much theoretical work on guaranteed liveness, and some gurus apply it in
practice.

On the other hand, many systems engineer liveness solutions that work in
practice -- e.g., exponential backoff.
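
Exponential backoff on a spin lock, as a hedged Python sketch. A non-blocking threading.Lock acquire stands in for a hardware test-and-set; the class name BackoffLock and the base and cap delays are assumptions, not tuned values.

```python
import random
import threading
import time

class BackoffLock:
    """Spin lock sketch: on each failed attempt, wait a random, doubling delay."""
    def __init__(self, base=1e-6, cap=1e-3):
        self._flag = threading.Lock()   # stand-in for a test-and-set bit
        self.base, self.cap = base, cap

    def acquire(self):
        limit = self.base
        while not self._flag.acquire(blocking=False):   # attempt failed: lock held
            time.sleep(random.uniform(0, limit))        # back off, with jitter
            limit = min(2 * limit, self.cap)            # double the window, capped

    def release(self):
        self._flag.release()

lock = BackoffLock()
count = [0]

def worker():
    for _ in range(500):
        lock.acquire()
        count[0] += 1
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count[0])   # 2000
```

Nothing here guarantees that a given thread ever gets the lock; the random, growing delays just make repeated collisions unlikely in practice.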


------------------------------

Memory Consistency  (too much)
-------------------

Coherence's Goal:
* Make caches invisible, as in uniprocessors
* Once invisible, what does memory do?  (can't refer to caches)


Example

/* initial A = B = flag = 0 */
		  P1		    P2
		A = 1;		while (flag == 0); /* spin */
		B = 1; 		print A;
		flag = 1; 	print B;

Intuition says printed A = B = 1
(OMIT) Coherence doesn't say anything, why?
(OMIT) Consider coalescing write buffer


               /* initial A = B = 0 */
              P1                        P2
          A = 1;                  B = 1;
          r1 = B;                 r2 = A;

Three outcomes are intuitive -- (r1, r2) = (0, 1), (1, 0), (1, 1) -- but not the fourth, (0, 0).

(OMIT) How can hardware screw this up?
* write buffer
* out-of-order (OoO) loads
* directory protocol that doesn't wait for acks


Define Memory Consistency
* What SW can expect of HW
* What HW must provide to SW

Want 4Ps: (DID NOT COVER)

* Programmability: easy to program
* Performance: facilitates good performance (or low cost)
* Portability
* Precision
								 

Sequential Consistency

A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

Show two processors' program orders (po) and the global memory order (mo)

Show "railroad switch" picture

Same as multi-threaded uniprocessor
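
The SC definition can be checked by brute force on the two-processor example (A = 1; r1 = B versus B = 1; r2 = A): enumerate every merge of the two programs that preserves program order and run each against a single memory. A Python sketch; the tuple encoding of operations is an assumption for illustration.

```python
def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that preserves each program's order."""
    if not p1 or not p2:
        yield list(p1) + list(p2)
        return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest

def execute(ops):
    """Apply ops one at a time to a single shared memory; return (r1, r2)."""
    mem = {"A": 0, "B": 0}
    regs = {}
    for kind, dst, src in ops:
        if kind == "store":
            mem[dst] = src          # dst is an address, src a value
        else:
            regs[dst] = mem[src]    # dst is a register, src an address
    return regs["r1"], regs["r2"]

P1 = [("store", "A", 1), ("load", "r1", "B")]
P2 = [("store", "B", 1), ("load", "r2", "A")]
sc_outcomes = {execute(s) for s in interleavings(P1, P2)}
print(sorted(sc_outcomes))   # [(0, 1), (1, 0), (1, 1)]
```

No sequential order produces (r1, r2) = (0, 0).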


Total Store Ordering (TSO) -- x86 and SPARC
------------------------------------------

               /* initial A = B = 0 */
              P1                        P2
          A = 1;                  B = 1;
          r1 = B;                 r2 = A;

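Allows r1 == r2 == 0? Yes.
(Why? Write buffers: each store can sit in its processor's FIFO write buffer while the following load reads the old value from memory.)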
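
A toy model of the write-buffer behavior, as a Python sketch: each processor has a FIFO write buffer, loads check the processor's own buffer before memory, and buffered stores drain to memory at arbitrary times. This is a simplification for illustration, not a full TSO specification; the function name tso_outcomes is hypothetical.

```python
def tso_outcomes(programs):
    """Explore every schedule of the toy machine; return the set of (r1, r2)."""
    outcomes = set()

    def step(pcs, bufs, mem, regs):
        done = all(pc == len(p) for pc, p in zip(pcs, programs))
        if done and not any(bufs):
            outcomes.add((regs["r1"], regs["r2"]))
            return
        for t, prog in enumerate(programs):
            if bufs[t]:                                  # drain oldest buffered store
                (addr, val), rest = bufs[t][0], bufs[t][1:]
                step(pcs, bufs[:t] + (rest,) + bufs[t + 1:], {**mem, addr: val}, regs)
            if pcs[t] < len(prog):                       # execute next instruction
                kind, dst, src = prog[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if kind == "store":                      # store enters the write buffer
                    new_buf = bufs[t] + ((dst, src),)
                    step(new_pcs, bufs[:t] + (new_buf,) + bufs[t + 1:], mem, regs)
                else:                                    # load: own buffer, then memory
                    hits = [v for a, v in bufs[t] if a == src]
                    val = hits[-1] if hits else mem[src]
                    step(new_pcs, bufs, mem, {**regs, dst: val})

    step((0, 0), ((), ()), {"A": 0, "B": 0}, {})
    return outcomes

P1 = [("store", "A", 1), ("load", "r1", "B")]
P2 = [("store", "B", 1), ("load", "r2", "A")]
tso = tso_outcomes([P1, P2])
print(sorted(tso))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```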


Relaxed (Weak) Ordering -- ARM, IBM Power
-----------------------------------------


/* initially all 0 */
		  P1			    P2
		A = 1;			while (flag == 0); /* spin */
		B = 1; 			r1 = A;
		flag = 1; 		r2 = B;


But many (most) of these orders are NOT necessary! E.g., between "A = 1" and "B = 1".


Why not just enforce necessary orders? 

Relaxed models
* Unordered by default
* Use FENCEs to get necessary order

/* initially all 0 */
		  P1			    P2
		A = 1;			while (flag == 0); /* spin */
		                        FENCE
		B = 1; 			r1 = A;
		FENCE
		flag = 1; 		r2 = B;
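
A toy check of what the FENCEs buy, as a Python sketch. The model is an assumption for illustration: operations between fences may be performed in any order, fences are never crossed, and the spin loop is reduced to one load of flag, keeping only executions where it reads 1.

```python
from itertools import permutations, product

def orders(segments):
    """Per-thread orders: any permutation within a segment, segments kept in order."""
    for choice in product(*(permutations(seg) for seg in segments)):
        yield [op for seg in choice for op in seg]

def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that preserves each list's order."""
    if not p1 or not p2:
        yield list(p1) + list(p2)
        return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest

def outcomes(seg1, seg2):
    """Set of (r1, r2) over executions where P2's spin read saw flag == 1."""
    seen = set()
    for p1, p2 in product(orders(seg1), orders(seg2)):
        for sched in interleavings(p1, p2):
            mem, regs = {"A": 0, "B": 0, "flag": 0}, {}
            for kind, dst, src in sched:
                if kind == "store":
                    mem[dst] = src
                else:
                    regs[dst] = mem[src]
            if regs["rf"] == 1:
                seen.add((regs["r1"], regs["r2"]))
    return seen

stA, stB, stF = ("store", "A", 1), ("store", "B", 1), ("store", "flag", 1)
ldF, ldA, ldB = ("load", "rf", "flag"), ("load", "r1", "A"), ("load", "r2", "B")

no_fence = outcomes([[stA, stB, stF]], [[ldF, ldA, ldB]])   # one segment per thread
fenced = outcomes([[stA, stB], [stF]], [[ldF], [ldA, ldB]]) # FENCE splits segments
print(sorted(no_fence))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(fenced))     # [(1, 1)]
```

"A = 1" and "B = 1" stay unordered with respect to each other in the fenced version, and the result is still always (1, 1).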


---- STOPPED HERE

Sequential Consistency for Data-Race-Free (SC for DRF) Programs

Have your cake and eat it too.


	FENCE
	lock(L)
	FENCE
	A = 1		r2 = B
	B = 1		r1 = A
	FENCE
	unlock(L)
	FENCE

If only the writer uses the lock, all four outcomes are possible: (r1,r2) = (0,0), (0,1), (1,0), (1,1)

But if both use locks, then only two outcomes: (r1,r2) = (0,0), (1,1)


			FENCE
			lock(L)
			FENCE
			r2 = B
			r1 = A
			FENCE
			unlock(L)
			FENCE
	FENCE
	lock(L)
	FENCE
	A = 1
	B = 1
	FENCE
	unlock(L)
	FENCE

Or
	FENCE
	lock(L)
	FENCE
	A = 1
	B = 1
	FENCE
	unlock(L)
	FENCE

			FENCE
			lock(L)
			FENCE
			r2 = B
			r1 = A
			FENCE
			unlock(L)
			FENCE


But can't "see" intra-critical-section reordering
* Philosophy:  If a tree falls in the woods, does it make a sound? Y or N but probably Y
* SC for DRF:  If references reorder w/i C.S., does anyone see? N for DRF
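
A small Python sketch of the claim: treat each critical section as atomic (mutual exclusion), try both lock orders, and allow every reordering inside each section. The tuple encoding of operations is an assumption for illustration.

```python
from itertools import permutations

def run(ops):
    """Apply ops in order to one shared memory; return (r1, r2)."""
    mem, regs = {"A": 0, "B": 0}, {}
    for kind, dst, src in ops:
        if kind == "store":
            mem[dst] = src
        else:
            regs[dst] = mem[src]
    return regs["r1"], regs["r2"]

writer = [("store", "A", 1), ("store", "B", 1)]      # critical section of P1
reader = [("load", "r2", "B"), ("load", "r1", "A")]  # critical section of P2

locked = set()
for w in permutations(writer):          # any reordering inside P1's section
    for r in permutations(reader):      # any reordering inside P2's section
        locked.add(run(list(w) + list(r)))   # P1 gets the lock first
        locked.add(run(list(r) + list(w)))   # P2 gets the lock first
print(sorted(locked))   # [(0, 0), (1, 1)]
```

Every reordering gives the same two outcomes, so the reordering is invisible to the program.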

SC for DRF Implications
* Most programmers can reason with SC
* HW implementor can implement XC (a relaxed consistency model)
* (Compiler/runtime can also reorder some)


Hardware Synchronization Primitives

cover only test-and-set and compare-and-swap

test and set
Boolean TAS(Boolean *a): atomic { t := *a; *a := true; return t }

swap
word Swap(word *a, word w): atomic { t := *a; *a := w; return t }

fetch and increment
int FAI(int *a): atomic { t := *a; *a := t + 1; return t }

fetch and add
int FAA(int *a, int n): atomic { t := *a; *a := t + n; return t }


compare and swap
Boolean CAS(word *a, word old, word new):
atomic { t := (*a == old); if (t) *a := new; return t }

load linked / store conditional
word LL(word *a): atomic { remember a; return *a }
Boolean SC(word *a, word w):
atomic { t := (a is remembered, and has not been evicted since LL)
if (t) *a := w; return t }
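
How CAS can build another primitive: fetch-and-increment as a CAS retry loop. Python has no user-level CAS instruction, so in this sketch a lock inside a hypothetical Word class stands in for the hardware's atomicity.

```python
import threading

class Word:
    """One memory word; the internal lock only emulates hardware atomicity."""
    def __init__(self, value=0):
        self.value = value
        self._atomic = threading.Lock()

    def load(self):
        with self._atomic:
            return self.value

    def cas(self, old, new):
        """Boolean CAS(word *a, word old, word new) from the notes."""
        with self._atomic:
            ok = (self.value == old)
            if ok:
                self.value = new
            return ok

def fetch_and_increment(a):
    """FAI built from CAS: retry until no other thread got in between."""
    while True:
        t = a.load()
        if a.cas(t, t + 1):
            return t

ctr = Word(0)

def worker():
    for _ in range(1000):
        fetch_and_increment(ctr)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ctr.value)   # 4000
```

A failed CAS means some other thread's CAS succeeded, so the loop is lock free in the sense defined earlier, though not wait free.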

ABA Problem -- super subtle!!
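
A deterministic replay of one ABA scenario on a CAS-based stack, as a Python sketch. The interleaving is written out sequentially rather than run on real threads, and the Node and Stack classes are illustrative, not from the lecture.

```python
class Node:
    def __init__(self, name, nxt=None):
        self.name, self.next = name, nxt

class Stack:
    def __init__(self):
        self.top = None

    def cas_top(self, old, new):
        """Sequential stand-in for CAS(&top, old, new)."""
        if self.top is old:
            self.top = new
            return True
        return False

    def push(self, node):
        while True:
            old = self.top
            node.next = old
            if self.cas_top(old, node):
                return

    def pop(self):
        while True:
            old = self.top
            if old is None:
                return None
            if self.cas_top(old, old.next):
                return old

S = Stack()
C, B, A = Node("C"), Node("B"), Node("A")
for n in (C, B, A):
    S.push(n)                       # stack is A -> B -> C

# Thread 1 begins a pop: reads top and top.next, then is delayed.
old, new = S.top, S.top.next        # old = A, new = B

# Thread 2 runs meanwhile: pop A, pop B, push A back. Stack is now A -> C.
a = S.pop()
b = S.pop()
S.push(a)

# Thread 1 resumes: top is A again, so its CAS succeeds...
succeeded = S.cas_top(old, new)
# ...but it installs B, which thread 2 already removed.
print(succeeded, S.top.name)        # True B
```

The CAS compares only the value of top (A, then B, then A again), so it cannot tell that the stack changed underneath it.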