--------------------------------------------------------------------
CS 758 Programming Multicore Processors 
Fall 2012 Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

------------
Threads & DBMS
------------

OUTLINE
* Relational DBMS 101
* Bayer and Schkolnick
(* Degrees of consistency) NOT USED

------------------------------


Relational DBMS 101
-------------------
Relational DBMSs are a big success story for shared-memory parallel processing
DeWitt and others

Very sophisticated locking example

Data logically stored in tables called relations

Relation 1
name supervisor
bob  jane
joe  jane
..

Say we also want the supervisor's phone number: don't repeat it in Relation 1, but add Relation 2

Relation 2
name phone
bob  0001
joe  1223
jane 4567


If employee joe leaves, delete him from both relations

1. Must find joe fast -- often add index
2. Want both deletions to happen together
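
A minimal Python sketch of the two requirements above. It assumes each relation is held as a dict keyed by name (so the dict stands in for the index) and that one big lock makes the two deletions appear as a single step. The names relation1, relation2, big_lock, and delete_employee are hypothetical, not from the lecture.

```python
import threading

# Each relation is a dict keyed by name; the dict plays the role of the index.
relation1 = {"bob": "jane", "joe": "jane"}                   # name -> supervisor
relation2 = {"bob": "0001", "joe": "1223", "jane": "4567"}   # name -> phone

big_lock = threading.Lock()  # assumption: one lock guards both relations


def delete_employee(name):
    """Delete `name` from both relations together, or from neither."""
    with big_lock:
        if name not in relation1 or name not in relation2:
            return False  # all or nothing: leave both relations untouched
        del relation1[name]
        del relation2[name]
        return True
```

Calling delete_employee("joe") removes joe from both dicts; calling it for jane, who has no row in Relation 1, changes nothing.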

In general: lookup (reader), insert (writer), delete (writer)

1. Index
* Could use a binary tree, but instead use a Bx-tree 
-- all nodes have arity between k and 2k
-- less height means fewer sequential accesses
-- flexibility limits re-balancing
-- an insert or delete is "safe" if it keeps arity in [k,2k]
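
A rough Python sketch of the height argument, assuming a perfectly full tree in which every node has the same fanout. The function name min_levels and the sample sizes are illustrative only.

```python
def min_levels(num_keys, fanout):
    """Smallest number of levels whose fanout**levels reaches num_keys."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


binary_levels = min_levels(1_000_000, 2)     # binary tree
wide_levels = min_levels(1_000_000, 100)     # node arity of 100
```

Under this model a million keys need 20 levels with fanout 2 but only 3 with fanout 100, which is why fewer sequential accesses are needed.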

2. ACID 

Transaction: State transformation that is:
Atomic (all or nothing)
Consistent
Isolated (serializable)
Durable (permanent)

* Implement with Locks/Latches -- oxymoron slide
* Also logs

Bayer and Schkolnick
---------------------

-1. Single "Big Lock" Exclusive Lock
Use a single exclusive lock
-- no concurrency, even for readers

0. Single "Big Lock" Reader/Writers Lock
-- Multiple Readers don't conflict, so let them run concurrently
-- Readers get read/S/rho lock
-- Writers get write/X/xi lock

==> Explain compatibility graphs
	nodes (indicated by circles) are locks
	arcs indicate compatible states
	    --> second circle in rho node is a link to itself
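
The compatibility graph can also be written as a table. This is a small Python sketch, assuming a lock request is granted only if it is compatible with every lock already held by other threads. COMPATIBLE and can_grant are hypothetical names.

```python
# An arc in the graph means True here; rho has a self-arc, xi has none.
COMPATIBLE = {
    ("rho", "rho"): True,
    ("rho", "xi"): False,
    ("xi", "rho"): False,
    ("xi", "xi"): False,
}


def can_grant(requested, held):
    """True if `requested` is compatible with every mode in `held`."""
    return all(COMPATIBLE[(requested, h)] for h in held)
```

For example, can_grant("rho", ["rho", "rho"]) is True, while can_grant("xi", ["rho"]) is False.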

1. Fine-grain Pessimistic Writer

Spider lock for read and simple write -- go over code -- write on board so
can modify
	Get read/write lock on root
	Get lock on child
	Release lock on all "safe" ancestors
		-- Why not just lock "up"? DEADLOCK!

==> What is meant by safety?
	-- Reads are always safe
	-- Writes/updates/deletes are safe if they can't cause a split or a merge
		-- invariant is k <= #keys <= 2k
		-- safe if k < #keys < 2k
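
A minimal Python sketch of the writer side of protocol 1, under several assumptions: a plain mutex stands in for the write lock, an insert needs only #keys < 2k to be safe and a delete needs only #keys > k, and the class and function names (Node, is_safe, child_for, writer_descend) are hypothetical. It is meant for marking up on the board, not as the paper's code.

```python
import threading


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys in this node
        self.children = children or []  # empty list means leaf
        self.lock = threading.Lock()    # stand-in for a write (xi) lock


def is_safe(node, k, op):
    """Safe means this operation cannot split or merge the node."""
    n = len(node.keys)
    return n < 2 * k if op == "insert" else n > k


def child_for(node, key):
    """Pick the child subtree that covers `key`."""
    i = 0
    while i < len(node.keys) and key >= node.keys[i]:
        i += 1
    return node.children[i]


def writer_descend(root, key, k, op):
    """Lock top-down; return (leaf, nodes still locked, in top-down order)."""
    root.lock.acquire()
    held = [root]
    node = root
    while node.children:
        child = child_for(node, key)
        child.lock.acquire()            # lock the child before releasing anything
        if is_safe(child, k, op):
            for ancestor in held:       # a safe child shields all ancestors
                ancestor.lock.release()
            held = []
        held.append(child)
        node = child
    return node, held
```

Locks are always acquired in root-to-leaf order, which is what rules out the deadlock that locking "up" would allow. The caller performs the update and then releases every node left in the returned list.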

2.  Optimistic Writer
unchanged reader
writer
-- tries read locks at non-leaf levels
-- Get write lock on leaf; if safe does update
-- If not, releases all locks and does protocol 1
	==> Why is the release necessary?
		Why not simply get X lock on parent?
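
A small Python sketch of the leaf step and the fallback only. It assumes the read-locked descent has already located the leaf, uses a plain mutex for the leaf's write lock, and takes the protocol 1 insert as a callback. Leaf, optimistic_insert, and pessimistic_insert are hypothetical names.

```python
import threading


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.lock = threading.Lock()  # stand-in for the leaf's write lock


def optimistic_insert(leaf, key, k, pessimistic_insert):
    """Update in place if the leaf is safe; otherwise release and redo."""
    leaf.lock.acquire()
    try:
        if len(leaf.keys) < 2 * k:     # safe: the insert cannot split the leaf
            leaf.keys.append(key)
            leaf.keys.sort()
            return "optimistic"
    finally:
        leaf.lock.release()            # every lock is dropped before falling back
    pessimistic_insert(key)            # restart from the root with protocol 1
    return "fallback"
```

The fallback runs only after the leaf lock is released, matching the "releases all locks" step above.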

2a. Split on the way down.
unchanged reader
writer
-- for each node, get read lock
-- if node is NOT safe, drop read lock and get writer lock and make safe
-- limits propagation of splits

2b. Merge is hard, so do merge off-line.
unchanged reader
writer
-- insert splits on way down
-- delete marks nodes that have become unsafe, logs them in a "to do" list
     ==> descend later to do the merge


3. Balanced Writers
unchanged reader
writer grabs alpha (intent) locks
	alpha locks are compatible with readers, so improves concurrency
converts alpha locks to xi locks as necessary
	conversion waits for conflicting read locks to release
        request is put at the head of the lock queue to prevent starvation
??
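
A sketch of the three-mode compatibility table in Python. The notes state only that alpha is compatible with readers. Treating alpha as incompatible with another alpha is an assumption here, as are the names COMPATIBLE, can_grant, and can_convert_alpha_to_xi.

```python
# For each requested mode, the set of held modes it can coexist with.
COMPATIBLE = {
    "rho": {"rho", "alpha"},   # readers coexist with readers and one intent writer
    "alpha": {"rho"},          # assumption: at most one alpha per node
    "xi": set(),               # exclusive: coexists with nothing
}


def can_grant(requested, held):
    """True if `requested` is compatible with every mode held by others."""
    return all(h in COMPATIBLE[requested] for h in held)


def can_convert_alpha_to_xi(held_by_others):
    """Conversion must wait until no other thread holds any lock on the node."""
    return len(held_by_others) == 0
```

With this table a writer holding alpha does not block readers until it actually converts to xi.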

4. Generalized Solution 
-- tries to increase concurrency between writers (who rarely REALLY conflict on internal nodes)
-- too complex to worry about for this class


---------------------------------------------------------------------------------------------

Could talk about Gray Lorie Putzolu Traiger NOT USED
* Intent locks (alpha locks above)
* Degrees of consistency

http://bnrg.cs.berkeley.edu/~adj/cs262/Lec_10_29.html NOT USED

  
Advanced Topics in Computer Systems 	
Fall 2001
Joe Hellerstein & Anthony Joseph 	
Degrees of Consistency (a/k/a Isolation Levels)
Despite all the discussion of ACID, sometimes it's nice to sacrifice semantic
guarantees for the sake of performance.  The goal is to let individual
transactions choose this WITHOUT messing up the database or the other
transactions that do care.
Gray, et al.: Degrees of Consistency
First, a definition: A write is committed when the transaction is finished;
otherwise, the write is dirty.

A Locking-Based Description of Degrees of Consistency:

This is not actually a description of the degrees, but rather of how to
achieve them via locking. But it's better defined.

    Degree 0: set short write locks on updated items ("short" = length of
action)
    Degree 1: set long write locks on updated items ("long" = EOT)
    Degree 2: set long write locks on updated items, and short read locks on
items read
    Degree 3: set long write and read locks
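
The four locking rules can be tabulated. This is a Python sketch: the dict LOCKING and the helper release_point are hypothetical names, and None marks the cases where the rules above set no lock at all.

```python
# degree -> lock duration for each kind of access ("short", "long", or None)
LOCKING = {
    0: {"write": "short", "read": None},
    1: {"write": "long", "read": None},
    2: {"write": "long", "read": "short"},
    3: {"write": "long", "read": "long"},
}

WHEN_RELEASED = {
    None: "no lock taken",
    "short": "end of action",
    "long": "end of transaction",
}


def release_point(degree, access):
    """Describe when a lock for this access is released at this degree."""
    return WHEN_RELEASED[LOCKING[degree][access]]
```

For instance, release_point(2, "read") gives "end of action", while release_point(3, "read") gives "end of transaction".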

A Dirty-Data Description of Degrees of Consistency

Transaction T sees degree X consistency if...

    Degree 0: T does not overwrite dirty data of other transactions
    Degree 1:

        T sees degree 0 consistency, and
        T does not commit any writes before EOT

    Degree 2:

        T sees degree 1 consistency, and
        T does not read dirty data of other transactions

    Degree 3:

        T sees degree 2 consistency, and
        Other transactions do not dirty any data read by T before T completes.

Examples of Inconsistencies prevented by Various Degrees

    Garbage reads:


    T1: write(X); T2: write(X)

    Who knows what value X will end up being?

    Solution: set short write locks (degree 0)
     
    Lost Updates:


    T1: write(X)

    T2: write(X)

    T1: abort (physical UNDO restores X to pre-T1 value)

    At this point, the update by T2 is lost

    Solution: set long write locks (degree 1)
     
    Dirty Reads:


    T1: write(X)

    T2: read(X)

    T1: abort

    Now T2s read is bogus.

    Solution: set long X locks and short S locks (degree 2)

    Many systems do long-running queries at degree 2.
     
    Unrepeatable reads:


    T1: read(X)

    T2: write(X)

    T2: end transaction

    T1: read(X)

    Now T1 has read two different values for X.

    Solution: long read locks (degree 3)
     
    Phantoms:


    T1: read range [x - y]

    T2: insert z, x < z < y

    T2: end transaction

    T1: read range [x - y]

    z is a "phantom" data item (eek!)

    Solution: ??
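
The lost-update schedule above can be replayed in a few lines of Python. This sketch assumes a one-item database and a per-transaction log of before-images for physical UNDO. The names db, write, and abort are illustrative.

```python
db = {"X": 0}


def write(txn_log, key, value):
    """Save the before-image for physical UNDO, then overwrite the item."""
    txn_log.append((key, db[key]))
    db[key] = value


def abort(txn_log):
    """Physical UNDO: restore before-images in reverse order."""
    for key, old_value in reversed(txn_log):
        db[key] = old_value
    txn_log.clear()


t1_log, t2_log = [], []
write(t1_log, "X", 1)   # T1: write(X)
write(t2_log, "X", 2)   # T2: write(X), possible because T1's lock was short
abort(t1_log)           # T1: abort restores X to its pre-T1 value
```

After the abort, db["X"] is back to 0 and T2's value 2 is gone, even though T2 never aborted. A long write lock held by T1 would have made T2 wait instead.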
     

NOTE: if everybody is at least degree 1, then different transactions can
CHOOSE what degree they wish to "see" without worry.  I.e. can have a mixture
of levels of consistency.
Adya, et al. : Generalized Isolation Levels
Gray et al's definitions (and the resulting ANSI standards) are not
implementation-independent, and semantics are ill-defined.

Want implementation-independent, semantic isolation levels that are as
permissive as possible (most possible schedules allowed).

Key insight: many dependencies are multi-object.  Capture those, and you'll
get the right semantics.

Conflicts in Adya's Serialization Graphs:

    Read dependencies (WR):
        Def'n: Ti changes the matches of Tj for Tj's predicate-based reads if
Ti installs a new version that either adds to or deletes from one of Tj's read
predicates.
        Tj directly read depends on Ti if Ti directly installs some version
that Tj subsequently reads (item-read-depends), or if Ti changes the matches
of Tj.
        A way to think about predicate-based reads or phantoms: imagine that
every object is versioned, there are "ghost versions" of objects before
they're born and after they die.  Predicate-based reads look at all latest
versions of all objects (including ghosts), and what matters is the set of
objects that do or do not match.  See example on page 7 of the paper for
H_{pred-read}
    Anti-dependencies (RW)
        Def'n: Tj overwrites a predicate-based read by Ti if Tj installs a new
version of an object in the read by Ti that changes the matches of Ti.
        Tj directly anti-depends on Ti if Ti reads an object, and Tj installs
the very next version of that object, or if Tj's install of any later version
changes the matches of a read by Ti.
    Write dependencies (WW)
        Tj directly write-depends on Ti if Ti installs a version of an object,
and Tj installs the next version.  (Note there's no predicate-based version of
write dependencies, since database writes are read-predicate/write-tuple).

Direct Serialization Graph:

    nodes are committed xacts
    edges are directed by time, labeled WR, RW, or WW.

Now we can talk about isolation in terms of serialization graphs and
"histories" ("schedules"), NOT implementation.

Adya's Isolation Levels
Try to Generalize Gray's.  PL-x = "Portable Level x".

    PL-1: try to serialize based on writes alone (ignore reads) -- ensure that
updates are not interleaved.
        Specifically, no cycles containing only WW edges are allowed.
        Note: more permissive than Gray's Degree 1: allows concurrent xacts
to modify the same object...just ensure no cycles.
        But obvious locking implementation of PL-1: long write locks.
    PL-2: avoid aborted reads
        Specifically, no aborted reads, no intermediate reads, and no circular
information flow (dependency-edge cycles in serialization graph)
        Note: cascaded aborts and/or commit delays prevent aborted reads
        Note: no intermediate reads means that committed xactions read only
committed data
        More permissive than Degree 2, allows reads from uncommitted xacts
        Obvious locking implementation: long write locks, short read locks
    PL-3: prevent xactions from committing if they perform inconsistent reads
or writes
        Specifically, do PL-2 AND no anti-dependency cycles
        More permissive than Degree 3, since a modifying xact can update an
object previously read by another uncommitted xact
        Obvious locking implementation: 2PL
    PL-2.99: generalize "REPEATABLE READ"
        REPEATABLE READ was long locks for everything except predicate reads
(phantoms can happen)
        PL-2.99 is PL-2 + no cycles with item-anti-dependency edges
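
A Python sketch of the cycle conditions only. It assumes the direct serialization graph is given as (from, to, label) triples over committed transactions, and it ignores the aborted-read and intermediate-read conditions of PL-2 as well as the item versus predicate distinction behind PL-2.99. The names has_cycle and highest_level are hypothetical.

```python
def has_cycle(edges, allowed):
    """True if the graph restricted to labels in `allowed` has a directed cycle."""
    graph = {}
    for src, dst, label in edges:
        if label in allowed:
            graph.setdefault(src, []).append(dst)
            graph.setdefault(dst, [])
    state = {}  # node -> 1 while on the DFS stack, 2 once finished

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1:
                return True            # back edge: cycle found
            if nxt not in state and visit(nxt):
                return True
        state[node] = 2
        return False

    return any(visit(n) for n in graph if n not in state)


def highest_level(edges):
    """Strongest level whose cycle condition this history satisfies."""
    if has_cycle(edges, {"WW"}):
        return None                    # write cycle: not even PL-1
    if has_cycle(edges, {"WW", "WR"}):
        return "PL-1"                  # circular information flow
    if has_cycle(edges, {"WW", "WR", "RW"}):
        return "PL-2"                  # cycle needs an anti-dependency edge
    return "PL-3"
```

For example, the history with edges T1 -WR-> T2 and T2 -RW-> T1 has a cycle only once anti-dependency edges are included, so it passes the PL-2 cycle check but not PL-3.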

Modeling Mixed-Mode Systems

    Mixed serialization graph, only contains dependencies relevant to a
transaction's level (or obligatory dependencies required by other
transactions' modes).