CS 764, Fall 2012: Topics in Database Management Systems

Coordinates: Engineering Hall Rm. 3345, 11:00AM-12:15PM, M,W,Fr
Instructor: J. Patel
Office Hours: Monday 10:00-11:00AM or by appointment


This course covers a number of advanced topics in development of database management systems (DBMs) and the application of DBMSs in modern applications. Topics to be discussed include advanced concurrency control and recovery techniques, query processing and optimization strategies for relational database systems, advanced access methods, parallel and distributed database systems, extensible database systems, data analysis on large databases.

The course material will be drawn from a number of papers in the database literature. We will cover about 2-3 papers per week. All students in this class are expected to read the papers before coming to the lecture.

Prequesites: CS 564 or equivalent. If you have concerns about meeting the prerequisties, please contact the instructor.

Text: There is no formal textbook for this course. The reading list is a collection of papers, which is posted on the course web page.
Reference text: The following two sources will be used occassionally in this course. Note you don't need to buy these books.

Course Project

A big component of this course is a research project. For the project, you pick a topic in the area of database systems, and explore it in detail. I will provide a list of suggested project topics, though you are free to select a project outside of this list provided you get prior approval. I require that you meet with me periodically throughout the semester updating me on the progress of your project. I will help with the direction, but unless you take the initiative to actively explore the topic you choose, you are unlikely to accomplish much in the project (which will adversely affect your project grade).

The course project is a group project, and each group must be of size 2; individual projects (i.e. group size of 1) are also permitted. Please start looking for project partners right away. I will facilitate informal group formation by holding a socializing session at the end of the first few lectures, but it is your responsibility to form and manage groups.

The course project will include an interim course project report, a short project presentation at the end of the semester, and a final project report. I organize the final project presentation in a workshop-like format. The workshop is called DAWN.

Grading and Deadlines

Exam - I 30% 11:00AM-12:15PM, October 10, 2012.
Exam - II 30% 11:00AM-12:15PM, November 28, 2012.
Course Project 40% Project selection. Due Sept 28, 5:00PM.
5% for interm course report. Due October 24, 5:00PM.
30% for project report and project code. The code must include a packaged tar ball with instructions on how to run the code, and some demo examples. Final report due by Dec 21, 5:00PM.
5% for short project presentation as part of a workshop called DAWN'12, held on Dec 10, 12 and Dec 14 in CS 1207 from 11:00AM-12:15PM.

Reading List

Here is the reading list. Links to local copies on this page are password protected. Passwords will be given out on the first day of class.  This list is subject to change during the semester. When changes happen, I will let you know in advance.
Query Processing
I will assume you have covered chapters 12-15 of the Cow book, or its equivalent, in your UG course.
  1. Join Algorithms: Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264 (1986)
    Additional reading (not required): Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170 (1993)
    Additional reading (not required): Laura M. Haas, Michael J. Carey, Miron Livny, Amit Shukla: Seeking the Truth About ad hoc Join Costs. VLDB J. 6(3): 241-256 (1997)
  2. Query Optimization-1: Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 
  3. Query Optimization-2: Surajit Chaudhuri: An Overview of Query Optimization in Relational Systems. PODS 1998: 34-43
    Additional reading (not required): Kiyoshi Ono, Guy M. Lohman: Measuring the Complexity of Join Enumeration in Query Optimization. VLDB 1990: 314-325
Advanced Transaction Management
I will assume you have covered chapters 16-18 of the Cow book, or its equivalent, in your UG course.
  1. Granularity of Locks: Jim Gray, Raymond A. Lorie, Gianfranco R. Putzolu, Irving L. Traiger: Granularity of Locks and Degrees of Consistency in a Shared Data Base. IFIP Working Conference on Modelling in Data Base Management Systems 1976.
  2. Optimistic CC: H. T. Kung, John T. Robinson: On Optimistic Methods for Concurrency Control. ACM Trans. Database Syst. 6(2): 213-226 (1981).
  3. Isolation Levels: Hal Berenson, Philip A. Bernstein, Jim Gray, Jim Melton, Elizabeth J. O'Neil, Patrick E. O'Neil: A Critique of ANSI SQL Isolation Levels. SIGMOD Conference 1995: 1-10.
  4. B-tree Locking: Philip L. Lehman, S. Bing Yao: Efficient Locking for Concurrent Operations on B-Trees. ACM Trans. Database Syst. 6(4): 650-670 (1981)
  5. Aries Recovery: C. Mohan, Donald J. Haderle, Bruce G. Lindsay, Hamid Pirahesh, Peter M. Schwarz: ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Trans. Database Syst. 17(1): 94-162 (1992)
  6. 2-Phase Commit: C. Mohan, Bruce G. Lindsay, Ron Obermarck: Transaction Management in the R* Distributed Database Management System. ACM Trans. Database Syst. 11(4): 378-396 (1986)
Buffer Management
  1. Buffer Management: Hong-Tai Chou, David J. DeWitt: An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica 1(3): 311-336 (1986).
    Additional reading (not required): Elizabeth J. O'Neil, Patrick E. O'Neil, Gerhard Weikum: The LRU-K Page Replacement Algorithm For Database Disk Buffering. SIGMOD Conference 1993: 297-306
Parallel and Distributed DBMSs
  1. Distributed DBMSs: Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu: Mariposa: A Wide-Area Distributed Database System. VLDB Journal 5(1): 48-63 (1996).
  2. Replication: Jim Gray, Pat Helland, Patrick E. O'Neil, Dennis Shasha: The Dangers of Replication and a Solution. SIGMOD Conference 1996: 173-182.
  3. Parallel DBMSs: David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Comm. ACM 35(6): 85-98 (1992).
Advanced Access Methods
  1. R-tree: Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57.
  2. Bitmap Indices: Patrick E. O'Neil, Dallan Quass: Improved Query Performance with Variant Indexes. SIGMOD Conference 1997: 38-49.
    Additional reading (not required): C-Store: Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, Stanley B. Zdonik: C-Store: A Column-oriented DBMS. VLDB 2005: 553-56.
Data Models
  1. Data Models: Stonebraker and Hellerstein: What Goes Around Comes Around
Data Analysis and Decision Support
  1. OLAP: Jim Gray et al.: Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals. Data Mining and Knowledge Discovery 1(1): 29-53 (1997).
  2. Mining: Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499.
Misc (read on your own)
  1. MapReduce: Jeffrey Dean, Sanjay Ghemawat: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1): 107-113 (2008).
  2. Parallel UDFs: Friedman, Peter M. Pawlowski, John Cieslewicz: SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions. PVLDB 2(2): 1402-1413 (2009)
  3. BigTable: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber: Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 26(2): (2008).
  4. Eventually Consistent: Werner Vogels: Eventually consistent. Commun. ACM 52(1): 40-44 (2009).
The Art of Reading Papers (read on your own)
  1. Read Efficiently: Michael J. Hanson and updated by D. McNamee: Efficient Reading of Papers in Science and Technology, 1989.    
  2. Write Well: Patrick Valduriez, “Some Hints to Improve Writing of Technical Papers”, Correspondence in Engineering of Information Systems, Hermes, Vol. 2, No. 3, 1994.