Lectures: Mon/Wed 1:00pm - 2:15pm online
Instructor: Xiangyao Yu
Office Hours: Mon 2:30pm - 3:30pm online
This course covers a number of advanced topics in the development of database management systems (DBMS) and the modern applications of databases. The topics discussed include advanced concurrency control and recovery, query processing and optimization, advanced access methods, parallel and distributed data systems, extensible data systems, implications of cloud computing for data platforms, and data analysis on large datasets. The course material will be drawn from a number of papers in the database literature. We will cover one paper per lecture. All students are expected to read the paper before coming to the lecture.
Prerequisites: CS 564 or equivalent. If you have concerns about meeting the prerequisites, please contact the instructor.
Reference Textbook: There is no formal textbook for this course. The reading list is a collection of papers. The following two books will be used as references in this course. Note you don't need to buy the books.
Lecture Format: All lectures are given online at canvas.wisc.edu -> Courses -> FA20 COMPSCI 764 -> BBCollaborate Ultra. Each session opens 15 minutes before the lecture and closes 15 minutes after it. Office hours will be held online using the same software; separate sessions will be created for office hours. Each lecture focuses on a classic research paper. Students will read the paper and submit a review to https://wisc-cs764-f20.hotcrp.com before the lecture starts. Here is a sample review for the paper on join processing.
Course projects: A big component of this course is a research project. For the project, you pick a topic in the area of data management systems and explore it in detail. Here is a list of suggested project topics, but you are encouraged to select a project outside of the list. The course project is a group project, and each group must have 2 to 4 members. Please start looking for project partners right away. The course project will include a project proposal, a short presentation at the end of the semester, and a final project report. The presentation is organized as a workshop called DAWN. Please see the program information for DAWN 2019 to get an idea of what it looks like.
|Query Processing and Buffer Management|
Leonard D. Shapiro, Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 1986.
[optional] Goetz Graefe, Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 1993
[optional] Laura M. Haas, et al., Seeking the Truth About ad hoc Join Costs. VLDB J. 1997.
[optional] Jaeyoung Do, Jignesh M. Patel, Join processing for flash SSDs: remembering past lessons. DaMoN 2009.
|3||Mon 9/14||Buffer Management||
Hong-Tai Chou, David J. DeWitt, An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica 1986.
[optional] Elizabeth J. O'Neil, et al., The LRU-K Page Replacement Algorithm For Database Disk Buffering. SIGMOD 1993.
[optional] Jim Gray, Gianfranco R. Putzolu, The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD 1987.
|4||Wed 9/16||Query Optimization-1||
Patricia G. Selinger, et al., Access Path Selection in a Relational Database Management System. SIGMOD 1979.
[optional] E. F. Codd, A Relational Model of Data for Large Shared Data Banks. Commun. ACM 1970.
|5||Mon 9/21||Query Optimization-2||
Surajit Chaudhuri, An Overview of Query Optimization in Relational Systems. PODS 1998.
[optional] Kiyoshi Ono, Guy M. Lohman, Measuring the Complexity of Join Enumeration in Query Optimization. VLDB 1990.
|Advanced Transaction Management|
|6||Wed 9/23||Granularity of Locks||
Jim Gray, et al., Granularity of Locks and Degrees of Consistency in a Shared Data Base. Modelling in Data Base Management Systems 1976.
|7||Mon 9/28||Optimistic CC||
H. T. Kung, John T. Robinson, On Optimistic Methods for Concurrency Control. ACM Trans. Database Syst. 1981.
[optional] Per-Ake Larson, et al., High-Performance Concurrency Control Mechanisms for Main-Memory Databases. PVLDB 2011.
|8||Wed 9/30||Guest Lecture|
|9||Mon 10/5||B-tree Locking||
Philip L. Lehman, S. Bing Yao, Efficient Locking for Concurrent Operations on B-Trees. ACM Trans. Database Syst. 1981.
[optional] Hal Berenson, et al., A Critique of ANSI SQL Isolation Levels. SIGMOD 1995.
|10||Wed 10/7||ARIES Recovery||
C. Mohan, et al., ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Trans. Database Syst. 1992.
|11||Mon 10/12||2-Phase Commit||
C. Mohan, et al., Transaction Management in the R* Distributed Database Management System. ACM Trans. Database Syst. 1986.
|Cloud Systems, Parallel DBMSs, and Distributed DBMSs|
|12||Wed 10/14||Parallel DBMSs||
David J. DeWitt, Jim Gray, Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 1992.
|13||Mon 10/19||Distributed DBMSs||
Michael Stonebraker, et al., Mariposa: A Wide-Area Distributed Database System. VLDB 1996.
[optional] Jim Gray, et al., The Dangers of Replication and a Solution. SIGMOD 1996.
[optional] Werner Vogels, Eventually consistent. Commun. ACM 2009.
Jeffrey Dean, Sanjay Ghemawat, MapReduce: simplified data processing on large clusters. Commun. ACM 2008.
[optional] Fay Chang, et al., Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 2008.
[optional] Eric Friedman, Peter M. Pawlowski, John Cieslewicz, SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions. PVLDB 2009.
|15||Mon 10/26||Guest Lecture||
|16||Wed 10/28||Office hour||
|17||Mon 11/2||Office hour||
Take-home exam from Nov. 3, 5pm to Nov. 5, 5pm.
Sample exam question
|20||Wed 11/11||Guest Lecture||
|27||Mon 12/7||DAWN Workshop||
|28||Wed 12/9||DAWN Workshop||