CS 764, Fall 2010: Topics in Database Management Systems


Coordinates: CS 1257, 01:00 PM - 02:15 PM, Tu, Th
Instructor: J. Patel
Office Hours: Tuesday noon-1:00PM or by appointment

Description

This course covers a number of advanced topics in development of database management systems (DBMs) and the application of DBMSs in modern applications. Topics to be discussed include advanced concurrency control and recovery techniques, query processing and optimization strategies for relational database systems, advanced access methods, parallel and distributed database systems, extensible database systems, data analysis on large databases.

The course material will be drawn from a number of papers in the database literature. We will cover about 2 papers per week. All students in this class are expected to read the papers before coming to the lecture.

Prequesites: CS 564 or equivalent. If you have concerns about meeting the prerequisties, please contact the instructor.

Text: There is no formal textbook for this course. The reading list is a collection of papers, which is posted on the course web page.
 
Reference text: The following two sources will be used occassionally in this course. Note you don't need to buy these books.

Course Project

A big component of this course is a research project. For the project, you pick a topic in the area of database systems, and explore it in detail. I will provide a list of suggested project topics, though you are free to select a project outside of this list provided you get prior approval. I require that you meet with me periodically throughout the semester updating me on the progress of your project. I will help with the direction, but unless you take the initiative to actively explore the topic you choose, you are unlikely to accomplish much in the project (which will adversely affect your project grade).

The course project is a group project, and each group must be of size 2 or 3. Please start looking for project partners right away. I will facilitate informal group formation by holding a socializing session at the end of the first few lectures, but it is your responsibility to form and manage groups.

The course project will include an interim course project report, a short project presentation at the end of the semester, and a final project report. I organize the final project presentation in a workshop-like format. The workshop is called DAWN.

Grading and Deadlines (Tentative!)

Exam - I 30% 1:00-2:15PM, October 14, 2010.
Exam - II 30% 1:00-2:15PM, November 30, 2010.
Course Project 40% Project selection. Due Sept 28, 1:00PM.
5% for interm course report. Due October 26, 1:00PM.
30% for project report and project code. The code must include a packaged tar ball with instructions on how to run the code, and some demo examples. Final report due by Dec 22, 1:00PM.
5% for short project presentation as part of a workshop called DAWN'10, held on Dec 7 in CS 2310 from noon-2:30P; lunch will be served from 11:30-noon in CS 3331.  

Reading List

Here is the reading list. Links to local copies on this page are password protected. Passwords will be given out on the first day of class.  This list is subject to change during the semester. When changes happen, I will let you know in advance.
Query Processing
I will assume you have covered chapters 12-15 of the Cow book, or its equivalent, in your UG course.
  1. Join Algorithms: Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264 (1986)
  2. Query Optimization-1: Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 
  3. Query Optimization-2: Surajit Chaudhuri: An Overview of Query Optimization in Relational Systems. PODS 1998: 34-43
Buffer Management
  1. Buffer Management: Hong-Tai Chou, David J. DeWitt: An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica 1(3): 311-336 (1986).
Advanced Transaction Management
I will assume you have covered chapters 16-18 of the Cow book, or its equivalent, in your UG course.
  1. Granularity of Locks: Jim Gray, Raymond A. Lorie, Gianfranco R. Putzolu, Irving L. Traiger: Granularity of Locks and Degrees of Consistency in a Shared Data Base. IFIP Working Conference on Modelling in Data Base Management Systems 1976.
  2. Optimistic CC: H. T. Kung, John T. Robinson: On Optimistic Methods for Concurrency Control. ACM Trans. Database Syst. 6(2): 213-226 (1981).
  3. Isolation Levels: Hal Berenson, Philip A. Bernstein, Jim Gray, Jim Melton, Elizabeth J. O'Neil, Patrick E. O'Neil: A Critique of ANSI SQL Isolation Levels. SIGMOD Conference 1995: 1-10.
  4. B-tree Locking: Philip L. Lehman, S. Bing Yao: Efficient Locking for Concurrent Operations on B-Trees. ACM Trans. Database Syst. 6(4): 650-670 (1981)
  5. Aries Recovery: C. Mohan, Donald J. Haderle, Bruce G. Lindsay, Hamid Pirahesh, Peter M. Schwarz: ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Trans. Database Syst. 17(1): 94-162 (1992)
  6. 2-Phase Commit: C. Mohan, Bruce G. Lindsay, Ron Obermarck: Transaction Management in the R* Distributed Database Management System. ACM Trans. Database Syst. 11(4): 378-396 (1986)
Parallel and Distributed DBMSs
  1. Distributed DBMSs: Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu: Mariposa: A Wide-Area Distributed Database System. VLDB Journal 5(1): 48-63 (1996).
  2. Replication: Jim Gray, Pat Helland, Patrick E. O'Neil, Dennis Shasha: The Dangers of Replication and a Solution. SIGMOD Conference 1996: 173-182.
  3. Parallel DBMSs: David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Comm. ACM 35(6): 85-98 (1992).
Advanced Access Methods
  1. R-tree: Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57.
  2. Bitmap Indices: Patrick E. O'Neil, Dallan Quass: Improved Query Performance with Variant Indexes. SIGMOD Conference 1997: 38-49.
Advanced Data Models
  1. OODBMS: Charles Lamb, Gordon Landis, Jack A. Orenstein, Danel Weinreb: The ObjectStore Database System. CACM 34(10): 50-63 (1991).
  2. Bucky: Michael J. Carey, David J. DeWitt, Jeffrey F. Naughton, Mohammad Asgarian, Paul Brown, Johannes Gehrke, Dhaval Shah: The BUCKY Object-Relational Benchmark (Experience Paper). SIGMOD Conference 1997: 135-146.
  3. Intro to Semi-structured data, XML and XQuery
  4. Data Model History, and XML as a data model
Data Analysis and Decision Support
  1. OLAP: Jim Gray et al.: Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals. Data Mining and Knowledge Discovery 1(1): 29-53 (1997).
  2. Mining: Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499.
Misc (time permitting)
  1. C-Store: Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, Stanley B. Zdonik: C-Store: A Column-oriented DBMS. VLDB 2005: 553-56.
  2. MapReduce: Jeffrey Dean, Sanjay Ghemawat: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1): 107-113 (2008).
  3. BigTable: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber: Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 26(2): (2008).
  4. Parallel DBMS v/s MapReduce: Michael Stonebraker, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Erik Paulson, Andrew Pavlo, Alexander Rasin: MapReduce and parallel DBMSs: friends or foes? Commun. ACM 53(1): 64-71 (2010).
  5. Parallel DBMS v/s MapReduce: Jeffrey Dean, Sanjay Ghemawat: MapReduce: a flexible data processing tool. Commun. ACM 53(1): 72-77 (2010).
  6. Eventually Consistent: Werner Vogels: Eventually consistent. Commun. ACM 52(1): 40-44 (2009).
The Art of Reading Papers (read on your own)
  1. Read Efficiently: Michael J. Hanson and updated by D. McNamee: Efficient Reading of Papers in Science and Technology, 1989.    
  2. Write Well: Patrick Valduriez, “Some Hints to Improve Writing of Technical Papers”, Correspondence in Engineering of Information Systems, Hermes, Vol. 2, No. 3, 1994.