Lectures: Mon/Wed 1:00pm - 2:15pm online
Instructor: Xiangyao Yu
Office Hours: Mon 2:30pm - 3:30pm online
This course covers a number of advanced topics in the development of database management systems (DBMS) and the modern applications of databases. The topics discussed include advanced concurrency control and recovery, query processing and optimization, advanced access methods, parallel and distributed data systems, extensible data systems, implications of cloud computing for data platforms, and data analysis on large datasets. The course material will be drawn from a number of papers in the database literature. We will cover one paper per lecture. All students are expected to read the paper before coming to the lecture.
Prerequisites: CS 564 or equivalent. If you have concerns about meeting the prerequisites, please contact the instructor.
Reference Textbook: There is no formal textbook for this course. The reading list is a collection of papers. The following two books will be used as references in this course. Note you don't need to buy the books.
Lecture Format: All lectures are given online at canvas.wisc.edu -> Courses -> FA20 COMPSCI 764 -> BBCollaborate Ultra. A session can be joined 15 min before the lecture and ends 15 min after the lecture. Office hours will be held online using the same software; separate sessions will be created for office hours. Each lecture focuses on a classic research paper. Students will read the paper and submit a review to https://wisc-cs764-f20.hotcrp.com before the lecture starts. Here is a sample review for the paper on join processing.
Course projects: A big component of this course is a research project. For the project, you pick a topic in the area of data management systems and explore it in detail. Here is a list of suggested project topics, but you are encouraged to select a project outside of the list. The course project is a group project, and each group must have 2-4 members. Please start looking for project partners right away. The course project will include a project proposal, a short presentation at the end of the semester, and a final project report. The presentation is organized as a workshop called DAWN. Please see the program information for DAWN 2019 to get an idea of what it looks like.
|Query Processing and Buffer Management|
Leonard D. Shapiro, Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 1986.
[optional] Goetz Graefe, Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 1993
[optional] Laura M. Haas, et al., Seeking the Truth About ad hoc Join Costs. VLDB J. 1997.
[optional] Jaeyoung Do, Jignesh M. Patel, Join processing for flash SSDs: remembering past lessons. DaMoN 2009.
|3||Mon 9/14||Buffer Management||
Hong-Tai Chou, David J. DeWitt, An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica 1986.
[optional] Elizabeth J. O'Neil, et al., The LRU-K Page Replacement Algorithm For Database Disk Buffering. SIGMOD 1993.
[optional] Jim Gray, Gianfranco R. Putzolu, The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD 1987.
|4||Wed 9/16||Query Optimization-1||
Patricia G. Selinger, et al., Access Path Selection in a Relational Database Management System. SIGMOD 1979.
[optional] E. F. Codd, A Relational Model of Data for Large Shared Data Banks. Commun. ACM 1970.
|5||Mon 9/21||Query Optimization-2||
Surajit Chaudhuri, An Overview of Query Optimization in Relational Systems. PODS 1998.
[optional] Kiyoshi Ono, Guy M. Lohman, Measuring the Complexity of Join Enumeration in Query Optimization. VLDB 1990.
|Advanced Transaction Management|
|6||Wed 9/23||Granularity of Locks||
Jim Gray, et al., Granularity of Locks and Degrees of Consistency in a Shared Data Base. Modelling in Data Base Management Systems 1976.
|7||Mon 9/28||Optimistic CC||
H. T. Kung, John T. Robinson, On Optimistic Methods for Concurrency Control. ACM Trans. Database Syst. 1981.
[optional] Per-Ake Larson, et al., High-Performance Concurrency Control Mechanisms for Main-Memory Databases. PVLDB 2011.
|8||Wed 9/30||Guest Lecture||Shasank Chavan from Oracle, Hardware Acceleration with Oracle Database In-Memory|
|9||Mon 10/5||B-tree Locking||
Philip L. Lehman, S. Bing Yao, Efficient Locking for Concurrent Operations on B-Trees. ACM Trans. Database Syst. 1981.
[optional] Hal Berenson, et al., A Critique of ANSI SQL Isolation Levels. SIGMOD 1995.
|10||Wed 10/7||ARIES Recovery||
C. Mohan, et al., ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Trans. Database Syst. 1992.
|11||Mon 10/12||2-Phase Commit||
C. Mohan, et al., Transaction Management in the R* Distributed Database Management System. ACM Trans. Database Syst. 1986.
|Cloud Systems, Parallel DBMSs, and Distributed DBMSs|
|12||Wed 10/14||Parallel DBMSs||
David J. DeWitt, Jim Gray, Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 1992.
|13||Mon 10/19||Distributed DBMSs||
Michael Stonebraker, et al., Mariposa: A Wide-Area Distributed Database System. VLDB 1996.
[optional] Jim Gray, et al., The Dangers of Replication and a Solution. SIGMOD 1996.
[optional] Werner Vogels, Eventually consistent. Commun. ACM 2009.
Jeffrey Dean, Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 2008.
[optional] Fay Chang, et al., Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 2008.
[optional] Eric Friedman, Peter M. Pawlowski, John Cieslewicz, SQL/MapReduce: A Practical Approach to Self-Describing, Polymorphic, and Parallelizable User-Defined Functions. PVLDB 2009.
|15||Mon 10/26||Guest Lecture||
Ippokratis Pandis from AWS, Amazon Redshift and Its Practical Use of Machine Learning
|16||Wed 10/28||Office hour||
|17||Mon 11/2||Exam review||
|18||Wed 11/4||Exam||Take-home exam from Nov. 3, 5pm to Nov. 6, 5pm. sample1, sample2, sample3|
Dageville, Benoit, et al., The Snowflake Elastic Data Warehouse, SIGMOD 2016
|20||Wed 11/11||Guest Lecture||
Jiaqi Yan from Snowflake, Automatic Clustering at Snowflake
Yu, Xiangyao, et al., PushdownDB: Accelerating a DBMS using S3 Computation, ICDE 2020
[optional] Do, Jaeyoung, et al., Query Processing on Smart SSDs: Opportunities and Challenges, SIGMOD 2013
|22||Wed 11/18||Amazon Aurora||
Verbitski, Alexandre, et al., Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases, SIGMOD 2017
[optional] Verbitski, Alexandre, et al., Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes, SIGMOD 2018
|23||Mon 11/23||Lambda functions||
Ingo Müller, Renato Marroquín, and Gustavo Alonso, Lambada: Interactive Data Analytics on Cold Data Using Serverless Cloud Infrastructure, SIGMOD 2020
[optional] Perron, Matthew, et al., Starling: A Scalable Query Engine on Cloud Functions, SIGMOD 2020
|24||Wed 11/25||Modern OCC||
Yihe Huang, et al., Opportunities for Optimism in Contended Main-Memory Multicore Transactions, VLDB 2020
[optional] Stephen Tu, et al., Speedy Transactions in Multicore In-Memory Databases, SOSP 2013
|25||Mon 11/30||GPU databases||
Clemens Lutz, et al., Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects, SIGMOD 2020
[optional] Anil Shanbhag, Samuel Madden, Xiangyao Yu, A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics, SIGMOD 2020
Jiacheng Yang, et al., F1 Lightning: HTAP as a Service, VLDB 2020
[optional] Dongxu Huang, et al., TiDB: A Raft-based HTAP Database, VLDB 2020
|27||Mon 12/7||DAWN Workshop||
|28||Wed 12/9||DAWN Workshop||