--------------------------------------------------------------------
CS 757 Parallel Computer Architecture
Spring 2012 Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

------------
Coherence 1
------------

Outline
 Review System Model & Coherence Invariants
 Specifying Coherence Protocols
 MOESI States
 Simple Snooping
 Non-Atomic Requests
 Exclusive (E) and Owned (O) States (now or next time)


Review System Model & Coherence Invariants
------------------------------

System model -- Figure 2.1

cores w/ private caches w/ controllers
icn
LLC w/ memory w/ controller
Offchip DRAM


Coherence Invariants
------------------------------

1. Single-Writer, Multiple-Reader (SWMR) Invariant. For any memory location A,
at any given (logical) time, there exists either a single core that may write
to A (and can also read it) or some number of cores that may only read A.
2. Data-Value Invariant. The value of the memory location at the start of an
epoch is the same as the value of the memory location at the end of its last
read-write epoch.
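
A minimal sketch of checking the SWMR invariant for one block, assuming each
epoch is written as (start, end, core, mode) with an exclusive end time. The
function name and the tuple layout are illustrative assumptions, not from the
text. The "ok" list encodes a timeline where core 1 is read-write, then cores
2 and 3 are read-only, then core 2 is read-write.

```python
def check_swmr(epochs):
    """Return True if no read-write epoch overlaps an epoch of another core.

    epochs: list of (start, end, core, mode), mode is "RW" or "RO",
    end is exclusive. All epochs are for the same memory location.
    """
    for i, (s1, e1, c1, m1) in enumerate(epochs):
        for s2, e2, c2, m2 in epochs[i + 1:]:
            overlap = s1 < e2 and s2 < e1
            # Overlapping read-only epochs are fine; anything overlapping
            # a read-write epoch of a different core violates SWMR.
            if overlap and c1 != c2 and "RW" in (m1, m2):
                return False
    return True


# Hypothetical timelines (times are logical, not cycles).
ok = [(0, 3, 1, "RW"), (3, 6, 2, "RO"), (3, 6, 3, "RO"), (6, 9, 2, "RW")]
bad = [(0, 3, 1, "RW"), (2, 6, 2, "RO")]  # core 2 reads while core 1 may write
```

Here check_swmr(ok) holds and check_swmr(bad) does not. The Data-Value
invariant would need the values carried between epochs and is not checked.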


Show Figure 2.3 timeline
  read-write at 1
  read-only at 2, 3
  read-write at 2
..

Maintain invariants
* Use block granularity (64B blocks)
* FSM at caches & LLC
* communication with message/bus

Goal:
* Make caches invisible, as in uniprocessors
* Once invisible, what does memory do?  (can't refer to caches)


Specifying Coherence Protocols
------------------------------

FSMs communicating via messages.  Use Table:

Row for state and transient states
Columns for events -- core request and incoming messages

VI:

I-->V Own-Get_DataResp
V-->I Own-Put or Other-Get

Go over Table 6.2
Transient state IV[D]

Note: 

P1's cache has a virtual FSM per block
Pi's cache has the same FSM (but may be in a different state)
Memory (LLC) also has a virtual FSM per block (different from the cache FSM)  --
See Table 6.3
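
A rough sketch of the table-driven style for the VI cache controller: rows
are (state, event) pairs, entries are (actions, next state). The event names,
action strings, and the IV_D spelling of IV[D] are assumptions chosen for
illustration and may not match Table 6.2 entry for entry. One dict per cache
stands in for the "virtual FSM per block".

```python
# (state, event) -> (actions, next_state). A missing entry means the
# event is not handled in that state (e.g., stall or not possible).
VI_CACHE_TABLE = {
    ("I", "load_or_store"): (["issue Get"], "IV_D"),
    ("IV_D", "data_resp"): (["fill block", "perform load/store"], "V"),
    ("V", "load_or_store"): (["perform load/store"], "V"),
    ("V", "evict"): (["issue Put"], "I"),
    ("V", "other_get"): ([], "I"),
}


class VICache:
    """One cache: a per-block state map, all blocks start in I."""

    def __init__(self):
        self.state = {}  # block address -> state name

    def handle(self, block, event):
        cur = self.state.get(block, "I")
        actions, nxt = VI_CACHE_TABLE[(cur, event)]
        self.state[block] = nxt
        return actions


p1 = VICache()
p1.handle(0x40, "load_or_store")  # I -> IV_D, Get issued
p1.handle(0x40, "data_resp")      # IV_D -> V
p1.handle(0x40, "other_get")      # V -> I
```

Blocks not yet touched (e.g., 0x80 above) stay implicitly in I, so each block
follows the same table independently.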

MOESI
-----
              Validity   Dirtiness   Exclusivity   Owned
Modified         X           X            X          X
(Owned)          X                                   X
(Exclusive)      X                        X
Shared           X
Invalid


Stable states stored in cache, e.g., ceiling(log2(5)) = 3 bits
Transient states in MSHRs
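
The bit count above as a quick calculation, a sketch only. state_bits is a
hypothetical helper; it uses integer bit_length instead of a floating-point
log2 so exact powers of two are handled cleanly.

```python
def state_bits(num_states):
    """Bits needed to encode num_states distinct stable states,
    i.e., ceiling(log2(num_states)) for num_states >= 1."""
    return (num_states - 1).bit_length()


moesi_bits = state_bits(5)  # M, O, E, S, I
mesi_bits = state_bits(4)   # M, E, S, I
msi_bits = state_bits(3)    # M, S, I

# Per-block overhead relative to a 64B (512-bit) data block.
overhead = moesi_bits / (64 * 8)
```

So MOESI needs one more state bit per block than MSI or MESI, which is small
next to 512 data bits.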

Common Transactions: GetS GetM Upgrade (PutS) (PutE) PutO PutM

Common Requests: load, store, RMW, i-fetch, RO-prefetch, RW-prefetch, replace

Protocol Taxonomy (simplified)

* Snooping: totally-ordered broadcast (Chapter 7)

* Directory: point-to-point message with level of indirection (Chapter 8)


Write Invalidate vs. Write Update
* assumed write invalidate & this is more common
* write update -- hard to implement memory consistency models and too much traffic


=====================
Snooping (Chapter 7)
=====================

Simple -- 
--------------------
* Atomic Requests (request ordered same cycle it is issued)
* Atomic Transactions (no other request to SAME block until transaction done)
Show
* $ FSM Figure 7.1
* mem FSM Figure 7.1
* system

Go over FSMs in Figures 7.5 and 7.6
shaded -- not possible
blank -- no action
Mem: IorS[D] -- memory waiting for writeback as part of a core's M to S transition
store in S is GetM/SM[D] -- sends data redundantly -- could have Upgrade


Figure 7.8-7.9  -- Non-atomic request (e.g., queue to get on bus), Atomic Transactions
Store in I, send GetM ==> IM[AD], see own GetM (ordering point), could "do" store
==> IM[D], gets data ==> M, finishes store


Consider "window of vulnerability"
Store in S, send GetM/Upgrade ==> SM[AD], see OTHER GetM so invalidate ==> IM[AD], own GetM
==> IM[D], gets data ==> M, finishes store
Makes upgrade transaction trickier
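
The two store sequences above, sketched as a small transition table. State
names use IM_AD for IM[AD], and so on. The SM_AD/SM_D entries for a store in S
with no race are an assumption added to complete the table, and the action
strings are illustrative only.

```python
# (state, event) -> (action, next_state) for a store with non-atomic requests.
STORE_TABLE = {
    ("I", "store"): ("send GetM", "IM_AD"),
    ("S", "store"): ("send GetM", "SM_AD"),
    # Window of vulnerability: another core's GetM is ordered first,
    # so the local S copy must be invalidated before our own GetM appears.
    ("SM_AD", "other_GetM"): ("invalidate local copy", "IM_AD"),
    ("IM_AD", "own_GetM"): ("request ordered", "IM_D"),
    ("SM_AD", "own_GetM"): ("request ordered", "SM_D"),  # assumed no-race path
    ("IM_D", "data"): ("fill block, finish store", "M"),
    ("SM_D", "data"): ("finish store", "M"),             # assumed no-race path
}


def trace(start, events):
    """Return the list of states visited from start for the given events."""
    states = [start]
    for ev in events:
        _, nxt = STORE_TABLE[(states[-1], ev)]
        states.append(nxt)
    return states


from_i = trace("I", ["store", "own_GetM", "data"])
raced = trace("S", ["store", "other_GetM", "own_GetM", "data"])
```

from_i visits I, IM_AD, IM_D, M; raced visits S, SM_AD, IM_AD, IM_D, M, which
is why an Upgrade that carries no data cannot be used blindly after the race.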


Normal Writeback
writeback in M, send PutM ==> MI[A], see own PutM, send data ==> I

Writeback racing other GetM
writeback in M, send PutM ==> MI[A], see other GetM, send data ==> II[A], see own PutM ==> I
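
Both writeback cases as a sketch in the same table style. MI_A and II_A stand
for MI[A] and II[A]; the action strings (in particular "nothing to send" in
II_A) are assumptions for illustration.

```python
# (state, event) -> (action, next_state) for replacing a block held in M.
WRITEBACK_TABLE = {
    ("M", "replace"): ("send PutM", "MI_A"),
    ("MI_A", "own_PutM"): ("send data to memory", "I"),
    # Race: another core's GetM is ordered before our PutM, so the data
    # goes to that requestor and our PutM arrives with nothing left to write.
    ("MI_A", "other_GetM"): ("send data to requestor", "II_A"),
    ("II_A", "own_PutM"): ("nothing to send", "I"),
}


def writeback(events):
    """Run events starting in M; return (actions taken, states visited)."""
    state, actions, states = "M", [], ["M"]
    for ev in events:
        action, state = WRITEBACK_TABLE[(state, ev)]
        actions.append(action)
        states.append(state)
    return actions, states


normal = writeback(["replace", "own_PutM"])
raced = writeback(["replace", "other_GetM", "own_PutM"])
```

In both cases the data leaves the cache exactly once; the race only changes
who receives it.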


Exclusive (E) State
--------------------
* Idea: on GetS, if no other sharers, goto E instead of S
* If subsequent store, silently go to M
* If other GetS, silently go to S
* If replace in E, treat like S (silent replacement)

Important to (mostly) private data
* Otherwise read miss (GetS) then write miss (GetM)
* With E read miss (GetS) and then silent upgrade to M
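
A small sketch that counts coherence requests for one block under the
comparison above. The function, its parameters, and the simplification that
"other sharers" is fixed for the whole sequence are assumptions for
illustration.

```python
def count_requests(ops, has_e_state, other_sharers=False):
    """Count GetS/GetM requests one core issues for one block.

    ops: sequence of "load" / "store" starting with the block in I.
    Returns (number of requests, final state).
    """
    state, requests = "I", 0
    for op in ops:
        if op == "load" and state == "I":
            requests += 1  # GetS
            state = "E" if (has_e_state and not other_sharers) else "S"
        elif op == "store":
            if state in ("I", "S"):
                requests += 1  # GetM
            # E -> M is silent: no request is issued.
            state = "M"
    return requests, state


without_e = count_requests(["load", "store"], has_e_state=False)
with_e = count_requests(["load", "store"], has_e_state=True)
```

For private data the read-then-write pattern costs two requests without E and
one with E; with other sharers present the E state gives no savings.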

How to implement "if no other sharers"
* Add state to LLC if there is (was) at least one sharer
* Before LLCs, often had logical "wired" OR of sharers -- shared line
[Before LLCs, memory often found out whether there was an M block with OR as well -- owned line]

Look at FSMs 7.4 and 7.5


Owned (O) State
--------------------
Advantages
Otherwise in M & see other GetS, must send data to BOTH requestor and LLC
O state eliminates extra data message and LLC updates
Historically, O also allowed subsequent GetS to be sourced from cache -- which
could be faster than memory (old days) but probably not LLC
See FSM in Figure 7.6 and 7.7