--------------------------------------------------------------------
CS 758 Programming Multicore Processors
Fall 2012 Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

------------
Threads & DBMS
------------

OUTLINE

* Relational DBMS 101
* Bayer and Schkolnick
(* Degrees of consistency) NOT USED

------------------------------
Relational DBMS 101
-------------------

Relational DBMSs: a big success for shared-memory parallel processing
  (DeWitt and others)
Very sophisticated locking example

Data is logically stored in tables called relations

Relation 1
  name    supervisor
  bob     jane
  joe     jane
  ..

Say we also want the supervisor's phone number: don't repeat it in
relation 1, but add relation 2

Relation 2
  name    phone
  bob     0001
  joe     1223
  jane    4567

If employee joe leaves, delete him from both relations
  1. Must find joe fast -- often add an index
  2. Want both deletions to happen together

In general: lookup (reader), insert (writer), delete (writer)

1. Index
* Could use a binary tree, but instead use a B-tree
  -- all nodes have arity between k and 2k
  -- less height means fewer sequential accesses
  -- the flexibility limits re-balancing
  -- an insert or delete is "safe" if it keeps the node within [k, 2k]

2. ACID
Transaction: a state transformation that is:
  Atomic (all or nothing)
  Consistent
  Isolated (serializable)
  Durable (permanent)
* Implement with Locks/Latches -- oxymoron slide
* Also logs

Bayer and Schkolnick
---------------------

-1. Single "Big Lock" Exclusive Lock
  Use a single exclusive lock -- no concurrency, even for readers

0. Single "Big Lock" Readers/Writers Lock
  -- Multiple readers don't conflict, so let them run concurrently
  -- Readers get a read/S/rho lock
  -- Writers get a write/X/xi lock
  ==> Explain compatibility graphs
      nodes (drawn as circles) are lock modes
      arcs connect compatible modes
      --> the second circle on the rho node is a link to itself
          (a reader is compatible with other readers)

1. Fine-grain Pessimistic Writer
  Spider lock for read and simple write -- go over the code
  (write it on the board so it can be modified)
    Get read/write lock on root
    Get lock on child
    If the child is "safe", release the locks on all its ancestors
  -- Why not just lock "up" (leaf toward root)? DEADLOCK!
     Top-down and bottom-up lockers would acquire locks in opposite orders.
  ==> What is meant by safety?
  -- Reads are always safe
  -- Writes/updates/deletes are safe if they can't cause a split or a merge
  -- the invariant is k <= #keys <= 2k
  -- safe if k < #keys < 2k (an insert needs #keys < 2k; a delete needs #keys > k)

2. Optimistic Writer
  unchanged reader
  writer
  -- takes read locks at non-leaf levels
  -- gets a write lock on the leaf; if the leaf is safe, does the update
  -- if not, releases all locks and does protocol 1
  ==> Why is the release necessary? Why not simply get an X lock on the parent?

2a. Split on the way down
  unchanged reader
  writer
  -- for each node, get a read lock
  -- if the node is NOT safe, drop the read lock, get a write lock, and make
     the node safe
  -- limits the propagation of splits

2b. Merge is hard, so do merges off-line
  unchanged reader
  writer
  -- insert splits on the way down
  -- delete marks nodes that have become unsafe and logs them in a "to do" list
  ==> descend later to do the merge

3. Balanced Writers
  unchanged reader
  writer
  -- grabs alpha (intent) locks
  -- alpha locks are compatible with readers, which improves concurrency
  -- converts alpha locks to xi locks as necessary
  -- conversion waits for conflicting read locks to be released
  -- the request is put at the head of the lock queue to prevent starvation ??

4. Generalized Solution
  -- tries to increase concurrency between writers (who rarely REALLY conflict
     on internal nodes)
  -- too complex to worry about for this class

---------------------------------------------------------------------------------------------

Could talk about Gray, Lorie, Putzolu, Traiger (NOT USED)
* Intent locks (alpha locks above)
* Degrees of consistency

http://bnrg.cs.berkeley.edu/~adj/cs262/Lec_10_29.html (NOT USED)
Advanced Topics in Computer Systems
Fall 2001
Joe Hellerstein & Anthony Joseph

Degrees of Consistency (a/k/a Isolation Levels)

Despite all the discussion of ACID, sometimes it's nice to sacrifice semantic
guarantees for the sake of performance. The goal is to let individual
transactions choose this WITHOUT messing up the database or the other
transactions that do care.

Gray, et al.: Degrees of Consistency

First, a definition: a write is committed when its transaction is finished;
otherwise, the write is dirty.

A Locking-Based Description of Degrees of Consistency:
This is not actually a description of the degrees, but rather of how to achieve
them via locking. But it's better defined.
  Degree 0: set short write locks on updated items ("short" = length of action)
  Degree 1: set long write locks on updated items ("long" = EOT)
  Degree 2: set long write locks on updated items, and short read locks on
            items read
  Degree 3: set long write and read locks

A Dirty-Data Description of Degrees of Consistency
Transaction T sees degree X consistency if...
  Degree 0: T does not overwrite dirty data of other transactions
  Degree 1: T sees degree 0 consistency, and T does not commit any writes
            before EOT
  Degree 2: T sees degree 1 consistency, and T does not read dirty data of
            other transactions
  Degree 3: T sees degree 2 consistency, and other transactions do not dirty
            any data read by T before T completes

Examples of Inconsistencies Prevented by Various Degrees

Garbage reads:
  T1: write(X); T2: write(X)
  Who knows what value X will end up being?
  Solution: set short write locks (degree 0)

Lost updates:
  T1: write(X)
  T2: write(X)
  T1: abort (physical UNDO restores X to its pre-T1 value)
  At this point, T2's update is lost.
  Solution: set long write locks (degree 1)

Dirty reads:
  T1: write(X)
  T2: read(X)
  T1: abort
  Now T2's read is bogus.
  Solution: set long X locks and short S locks (degree 2)
  Many systems run long-running queries at degree 2.

Unrepeatable reads:
  T1: read(X)
  T2: write(X)
  T2: end transaction
  T1: read(X)
  Now T1 has read two different values for X.
  Solution: long read locks (degree 3)

Phantoms:
  T1: read range [x - y]
  T2: insert z, x < z < y
  T2: end transaction
  T1: read range [x - y]
  z is a "phantom" data item (eek!)
  Solution: ??

NOTE: if everybody is at least degree 1, then different transactions can CHOOSE
what degree they wish to "see" without worry, i.e., a system can have a mixture
of levels of consistency.

Adya, et al.: Generalized Isolation Levels

Gray et al.'s definitions (and the resulting ANSI standards) are not
implementation-independent, and their semantics are ill-defined. We want
implementation-independent, semantic isolation levels that are as permissive
as possible (allow as many schedules as possible).

Key insight: many dependencies are multi-object. Capture those, and you'll get
the right semantics.

Conflicts in Adya's Serialization Graphs:

Read dependencies (WR):
  Def'n: Ti changes the matches of Tj for Tj's predicate-based reads if Ti
  installs a new version that either adds an object to, or removes an object
  from, the set matching one of Tj's read predicates.
  Tj directly read-depends on Ti if Ti directly installs some version that Tj
  subsequently reads (item-read-depends), or if Ti changes the matches of Tj.

A way to think about predicate-based reads or phantoms: imagine that every
object is versioned, and that there are "ghost versions" of objects before
they're born and after they die. Predicate-based reads look at the latest
versions of all objects (including ghosts), and what matters is the set of
objects that do or do not match.
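The ghost-version view of predicate reads can be made concrete with a small Python sketch. It rests on assumptions of our own: an object's latest version is a dict of field values, a ghost (an object not yet born, or already deleted) is represented as None, and a read predicate is an ordinary Python function. The names matches, matching_set and changes_matches, and the employee "sue", are hypothetical and are not taken from Adya et al.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical encoding: a version is a row (dict of field values) or None,
# where None is a "ghost" standing for an object before birth or after death.
Version = Optional[dict]
Predicate = Callable[[dict], bool]

def matches(version: Version, pred: Predicate) -> bool:
    """Ghost versions never match any predicate."""
    return version is not None and pred(version)

def matching_set(latest: Dict[str, Version], pred: Predicate) -> Set[str]:
    """Objects whose latest version (ghosts included) matches the predicate."""
    return {oid for oid, v in latest.items() if matches(v, pred)}

def changes_matches(latest: Dict[str, Version], oid: str,
                    new_version: Version, pred: Predicate) -> bool:
    """True if installing new_version for oid adds oid to, or removes it
    from, the set of objects matching pred."""
    return matches(latest.get(oid), pred) != matches(new_version, pred)

# Relation 1 from the notes, plus a hypothetical not-yet-born employee "sue".
latest = {
    "bob": {"supervisor": "jane"},
    "joe": {"supervisor": "jane"},
    "sue": None,
}

def supervised_by_jane(row: dict) -> bool:
    return row["supervisor"] == "jane"
```

With these assumptions, matching_set(latest, supervised_by_jane) is {"bob", "joe"}. Inserting sue with supervisor jane changes the matches, and so does deleting joe (installing a ghost). Rewriting bob's row so that he still reports to jane does not. Only the set of matching objects matters, not the field values inside them.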
See the example on page 7 of the paper for H_{pred-read}.

Anti-dependencies (RW):
  Def'n: Tj overwrites a predicate-based read by Ti if Tj installs a new
  version of an object in the read by Ti that changes the matches of Ti.
  Tj directly anti-depends on Ti if Ti reads an object and Tj installs the
  very next version of that object, or if Tj's install of any later version
  changes the matches of a read by Ti.

Write dependencies (WW):
  Tj directly write-depends on Ti if Ti installs a version of an object and Tj
  installs the next version.
  (Note there's no predicate-based version of write dependencies, since
  database writes are read-predicate/write-tuple.)

Direct Serialization Graph (DSG):
  nodes are committed xacts
  edges are directed (from Ti to Tj when Tj depends on Ti) and labeled WR, RW,
  or WW

Now we can talk about isolation in terms of serialization graphs and
"histories" ("schedules"), NOT implementation.

Adya's Isolation Levels Try to Generalize Gray's
PL-x = "Portable Level x"

PL-1: try to serialize based on writes alone (ignore reads) -- ensure that
  updates are not interleaved.
  Specifically, no cycles containing only WW edges are allowed.
  Note: more permissive than Gray's Degree 1: allows concurrent xacts to modify
  the same object... just ensure no cycles.
  But the obvious locking implementation of PL-1 is still long write locks.
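The DSG edge rules and the PL-1 condition can be tried on toy histories with a short Python sketch. The encoding is a hypothetical one of our own, not Adya's notation. A history is a list of (op, txn, obj) tuples. Every transaction commits and writes each object at most once. Versions are ordered by the position of their writes, and a read sees the latest earlier write (or the initial version). The helpers build_dsg, has_cycle and satisfies_pl1 are illustrative names only.

```python
from collections import defaultdict

# Hypothetical toy encoding: a history is a list of (op, txn, obj) tuples,
# op in {"r", "w"}. Assumed: all txns commit, each txn writes an object at
# most once, version order = order of writes in the list, and a read sees
# the most recent earlier write (or the initial version if there is none).

def build_dsg(history):
    """Return the DSG as a set of (Ti, Tj, label) edges, label in WW/WR/RW."""
    last_writer = {}               # obj -> txn that installed current version
    readers = defaultdict(set)     # obj -> txns that read the current version
    edges = set()
    for op, txn, obj in history:
        writer = last_writer.get(obj)  # None stands for the initial version
        if op == "r":
            if writer is not None and writer != txn:
                edges.add((writer, txn, "WR"))      # txn read writer's version
            readers[obj].add(txn)
        else:  # op == "w": txn installs the next version of obj
            if writer is not None and writer != txn:
                edges.add((writer, txn, "WW"))      # txn overwrote writer
            for reader in readers[obj]:
                if reader != txn:
                    edges.add((reader, txn, "RW"))  # txn overwrote what reader read
            last_writer[obj] = txn
            readers[obj] = set()
    return edges

def has_cycle(edges, labels):
    """True if the edges whose label is in `labels` form a directed cycle."""
    adj = defaultdict(list)
    for src, dst, label in edges:
        if label in labels:
            adj[src].append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)       # every node starts WHITE (unvisited)

    def visit(u):
        color[u] = GRAY            # u is on the current DFS path
        for v in adj[u]:
            if color[v] == GRAY or (color[v] == WHITE and visit(v)):
                return True        # back edge: a cycle
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(adj))

def satisfies_pl1(history):
    """PL-1: no cycle made only of WW edges."""
    return not has_cycle(build_dsg(history), {"WW"})

# Interleaved updates: T1 and T2 each overwrite the other, on different objects.
interleaved = [("w", "T1", "x"), ("w", "T2", "x"), ("w", "T2", "y"), ("w", "T1", "y")]
# The same writes, but all of T1's updates precede T2's.
ordered = [("w", "T1", "x"), ("w", "T1", "y"), ("w", "T2", "x"), ("w", "T2", "y")]
```

Under these assumptions, satisfies_pl1(interleaved) is False because the WW edges T1 -> T2 (on x) and T2 -> T1 (on y) form a cycle. satisfies_pl1(ordered) is True. The history [("r", "T1", "x"), ("w", "T2", "x"), ("r", "T3", "x")] yields an RW edge T1 -> T2 and a WR edge T2 -> T3.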
PL-2: avoid aborted reads
  Specifically, no aborted reads, no intermediate reads, and no circular
  information flow (no cycles in the serialization graph made only of
  dependency edges, i.e., WW and WR edges).
  Note: cascaded aborts and/or commit delays prevent aborted reads.
  Note: no intermediate reads means a committed xact reads only the final
  version of an object written by another xact, never one that xact later
  overwrote.
  More permissive than Degree 2: allows reads from uncommitted xacts.
  Obvious locking implementation: long write locks, short read locks.

PL-3: prevent xacts from committing if they perform inconsistent reads or writes
  Specifically, PL-2 AND no cycles containing anti-dependency edges.
  More permissive than Degree 3, since a modifying xact can update an object
  previously read by another uncommitted xact.
  Obvious locking implementation: 2PL

PL-2.99: generalize REPEATABLE READ
  REPEATABLE READ was long locks for everything except predicate reads (so
  phantoms can happen).
  PL-2.99 is PL-2 + no cycles with item-anti-dependency edges.

Modeling Mixed-Mode Systems
  A mixed serialization graph contains only the dependencies relevant to each
  transaction's level (or obligatory dependencies required by other
  transactions' modes).
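The cycle conditions behind PL-1, PL-2, PL-2.99 and PL-3 can be checked in order on a labeled DSG, as in the Python sketch below. The edge labels "WW", "WR", "RW-item" and "RW-pred" are our own hypothetical encoding, and cycle_through and portable_level are illustrative names. The sketch assumes the history has no aborted or intermediate reads, because those PL-2 conditions are not visible from the graph edges alone.

```python
from collections import defaultdict

# Hypothetical labels for Adya's DSG edges: "WW" write-dependency,
# "WR" read-dependency, "RW-item" item anti-dependency, "RW-pred" predicate
# anti-dependency. Edges are (Ti, Tj, label) tuples.
ALL_LABELS = {"WW", "WR", "RW-item", "RW-pred"}

def _reaches(adj, start, goal):
    """Iterative DFS: is `goal` reachable from `start`?"""
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        if u == goal:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u])
    return False

def cycle_through(edges, through, allowed=ALL_LABELS):
    """True if some cycle built only from `allowed` edges contains at least
    one edge whose label is in `through`."""
    adj = defaultdict(list)
    for src, dst, label in edges:
        if label in allowed:
            adj[src].append(dst)
    # An edge src -> dst lies on a cycle iff src is reachable from dst.
    return any(label in through and label in allowed and _reaches(adj, dst, src)
               for src, dst, label in edges)

def portable_level(edges):
    """Strongest PL level whose cycle conditions the DSG meets.
    Assumes no aborted or intermediate reads (not checked here)."""
    if cycle_through(edges, {"WW"}, allowed={"WW"}):
        return "below PL-1"   # a cycle of WW edges only
    if cycle_through(edges, {"WW", "WR"}, allowed={"WW", "WR"}):
        return "PL-1"         # circular information flow breaks PL-2
    if cycle_through(edges, {"RW-item"}):
        return "PL-2"         # item anti-dependency cycle breaks PL-2.99
    if cycle_through(edges, {"RW-pred"}):
        return "PL-2.99"      # remaining anti-dependency cycles are predicate-only
    return "PL-3"

# Each txn overwrites an item the other one read.
mutual_overwrite = {("T1", "T2", "RW-item"), ("T2", "T1", "RW-item")}
# T1's range read misses an object T2 inserts, yet T1 reads something T2 wrote.
phantom = {("T1", "T2", "RW-pred"), ("T2", "T1", "WR")}
# T2 simply follows T1.
serial = {("T1", "T2", "WW"), ("T1", "T2", "WR")}
```

Here portable_level(mutual_overwrite) is "PL-2", portable_level(phantom) is "PL-2.99" and portable_level(serial) is "PL-3". The checks run from weakest to strongest because PL-2.99 and PL-3 both include PL-2. By the time the predicate check runs, any cycle through an item anti-dependency has already been ruled out.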