CS 764, Spring 2013: Topics in Database Management Systems


Coordinates: MWF 9:30-10:45, Psych 107
Instructor: Christopher Ré
Office Hours:TBA or by appointment

Description

This course covers a number of advanced topics in development of database management systems (DBMSs) and the application of DBMSs in modern applications. Topics to be discussed include advanced concurrency control and recovery techniques, query processing and optimization strategies for relational database systems, advanced access methods, parallel and distributed database systems, extensible database systems, data analysis on large databases.

The course material will be drawn from a number of papers in the database literature. We will cover about 2 papers per week. All students in this class are expected to read the papers before coming to the lecture.

Prequesites: CS 564 or equivalent. If you have concerns about meeting the prerequisties, please contact the instructor.

Text: There is no formal textbook for this course. The reading list is a collection of papers, which is posted below.
 
Reference text: The following two sources will be used occassionally in this course. Note you don't need to buy these books.

Grading and Deadlines (Tentative!)

Exam - I 30% Version 1. This is a take-home midterm due on April 22, 2013.
Research Report 35% You will prepare a survey article about a particular research topic of your chosing (in consultation with me). Your grade will be based on the depth in which you go in to this topic. A preliminary version of this paper will be due in the middle of the semester. This paper should be done in teams of two, and there will be a presentation to the class about this research topic.
Implementation Project 35% You will implement of one of the basic algorithms in the course, e.g., joins or indexes, to evaluate (1) to what extent you can replicate their results, and (2) how well this algorithm in some new data processing setting, e.g., joins over key-value stores. It will be up to you to propose your experimental design.

Reading List

Here is the reading list. Links to local copies on this page are password protected. Passwords will be given out on the first day of class.  This list is subject to change during the semester. When changes happen, I will let you know in advance.
Query Processing
I will assume you have covered chapters 12-15 of the Cow book, or its equivalent, in your UG course.
  1. Join Algorithms: Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264 (1986)
  2. Query Optimization-1: Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 
  3. Query Optimization-2: Surajit Chaudhuri: An Overview of Query Optimization in Relational Systems. PODS 1998: 34-43
Buffer Management
  1. Buffer Management: Hong-Tai Chou, David J. DeWitt: An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica 1(3): 311-336 (1986).
  2. (No Review) Elizabeth J. O'Neil, Patrick E. O'Neil, Gerhard Weikum: The LRU-K Page Replacement Algorithm For Database Disk Buffering. SIGMOD Conference 1993: 297-306
  3. (No Review) Goetz Graefe: The five-minute rule 20 years later (and how flash memory changes the rules). Commun. ACM 52(7): 48-59 (2009)
  4. (No Review) Jim Gray, Gianfranco R. Putzolu: The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD Conference 1987: 395-398
Advanced Transaction Management
I will assume you have covered chapters 16-18 of the Cow book, or its equivalent, in your UG course.
  1. Granularity of Locks: Jim Gray, Raymond A. Lorie, Gianfranco R. Putzolu, Irving L. Traiger: Granularity of Locks and Degrees of Consistency in a Shared Data Base. IFIP Working Conference on Modelling in Data Base Management Systems 1976.
  2. Optimistic CC: H. T. Kung, John T. Robinson: On Optimistic Methods for Concurrency Control. ACM Trans. Database Syst. 6(2): 213-226 (1981).
  3. Isolation Levels: Hal Berenson, Philip A. Bernstein, Jim Gray, Jim Melton, Elizabeth J. O'Neil, Patrick E. O'Neil: A Critique of ANSI SQL Isolation Levels. SIGMOD Conference 1995: 1-10.
  4. Aries Recovery: C. Mohan, Donald J. Haderle, Bruce G. Lindsay, Hamid Pirahesh, Peter M. Schwarz: ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Trans. Database Syst. 17(1): 94-162 (1992)
  5. 2-Phase Commit: C. Mohan, Bruce G. Lindsay, Ron Obermarck: Transaction Management in the R* Distributed Database Management System. ACM Trans. Database Syst. 11(4): 378-396 (1986)
  6. (No Review) Eventually Consistent: Werner Vogels: Eventually consistent. Commun. ACM 52(1): 40-44 (2009).
  7. (No Review) Eric Brewer: CAP twelve years later: How the "rules" have changed. Computer 45 (2): 23-29.
  8. B-tree Locking: Philip L. Lehman, S. Bing Yao: Efficient Locking for Concurrent Operations on B-Trees. ACM Trans. Database Syst. 6(4): 650-670 (1981)
  9. (No Review) B-tree Locking survey. Goetz Graefe: A survey of B-tree locking techniques. ACM Trans. Database Syst. 35(3): (2010) (or TR)
  10. (No Review) Paxos Commit Jim Gray and Leslie Lamport. Consensus on Transaction Commit. MSR-TR-2003-96.
Parallel and Distributed DBMSs
  1. Distributed DBMSs: Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu: Mariposa: A Wide-Area Distributed Database System. VLDB Journal 5(1): 48-63 (1996).
  2. Replication: Jim Gray, Pat Helland, Patrick E. O'Neil, Dennis Shasha: The Dangers of Replication and a Solution. SIGMOD Conference 1996: 173-182.
  3. Parallel DBMSs: David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Comm. ACM 35(6): 85-98 (1992).
Advanced Access Methods
  1. R-tree: Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57.
  2. Bitmap Indices: Patrick E. O'Neil, Dallan Quass: Improved Query Performance with Variant Indexes. SIGMOD Conference 1997: 38-49.
Advanced Data Models
  1. OODBMS: Charles Lamb, Gordon Landis, Jack A. Orenstein, Danel Weinreb: The ObjectStore Database System. CACM 34(10): 50-63 (1991).
  2. Bucky: Michael J. Carey, David J. DeWitt, Jeffrey F. Naughton, Mohammad Asgarian, Paul Brown, Johannes Gehrke, Dhaval Shah: The BUCKY Object-Relational Benchmark (Experience Paper). SIGMOD Conference 1997: 135-146.
  3. Intro to Semi-structured data, XML and XQuery
  4. Data Model History, and XML as a data model
Data Analysis and Decision Support
  1. OLAP: Jim Gray et al.: Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals. Data Mining and Knowledge Discovery 1(1): 29-53 (1997).
  2. Mining: Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499.
Misc (time permitting)
  1. C-Store: Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, Stanley B. Zdonik: C-Store: A Column-oriented DBMS. VLDB 2005: 553-56.
  2. MapReduce: Jeffrey Dean, Sanjay Ghemawat: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1): 107-113 (2008).
  3. BigTable: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber: Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 26(2): (2008).
  4. Parallel DBMS v/s MapReduce: Michael Stonebraker, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Erik Paulson, Andrew Pavlo, Alexander Rasin: MapReduce and parallel DBMSs: friends or foes? Commun. ACM 53(1): 64-71 (2010).
  5. Parallel DBMS v/s MapReduce: Jeffrey Dean, Sanjay Ghemawat: MapReduce: a flexible data processing tool. Commun. ACM 53(1): 72-77 (2010).
  6. Google's Relational Database F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business.
The Art of Reading Papers (read on your own)
  1. Read Efficiently: Michael J. Hanson and updated by D. McNamee: Efficient Reading of Papers in Science and Technology, 1989.    
  2. Write Well: Patrick Valduriez, “Some Hints to Improve Writing of Technical Papers”, Correspondence in Engineering of Information Systems, Hermes, Vol. 2, No. 3, 1994.