CS 764 Topics in Database Management Systems

Lectures: Mon/Wed 1:00pm - 2:15pm
Room: Psychology 103
Instructor: Xiangyao Yu
Office Hours: Mon 2:30pm - 3:30pm

Course description

This course covers a number of advanced topics in the development of database management systems (DBMS) and the modern applications of databases. The topics discussed include query processing and optimization, advanced access methods, advanced concurrency control and recovery, parallel and distributed data systems, implications of cloud computing for data platforms, and data processing with emerging hardware. The course material will be drawn from a number of papers in the database literature. We will cover one paper per lecture. All students are expected to read the paper before coming to the lecture.

Prerequisites: CS 564 or equivalent. If you have concerns about meeting the prerequisites, please contact the instructor.

Reference Textbook: There is no formal textbook for this course. The reading list is a collection of papers. The following two books will be used as references in this course. Note you don't need to buy the books.

Lecture Format: Each lecture focuses on a classic or modern research paper. Students will read the paper and submit a review to https://wisc-cs764-f21.hotcrp.com before the lecture starts. Here is a sample review for the paper on join processing.

Course projects: A major component of this course is a research project, in which you pick a topic in the area of data management systems and explore it in depth. Here are lists of suggested project topics created in 2020 and 2021, but you are encouraged to select a project outside of these lists. The course project is a group project, and each group must have 2-4 members. Please start looking for project partners right away. The course project will include a project proposal, a short presentation at the end of the semester, and a final project report. Here are three sample projects from previous years (sample1, sample2, sample3). The presentations are organized as a workshop. Please see the program information for DAWN 2019 to get an idea of what it looks like. The project has the following deadlines:

Computation resources:

Inclusion Statement: In our class we strive to create an environment where everyone willing to do their part can learn and thrive. You should always feel free to ask a question: asking and pondering questions is how we learn. Being confused is unfailingly an opportunity to advance our knowledge. Please commit to helping create a climate where we treat everyone with dignity and respect. Listening to different viewpoints and approaches enriches our experience, and it is up to us to be sure others feel safe to contribute. Creating an environment where we are all comfortable learning is everyone's job: offer support and seek help from others if you need it, not only in class but also outside class while working with classmates.

Grading
Late submission policy: Reviews must be submitted before the lecture starts in order to be graded. You can skip up to 2 reviews without losing points; otherwise 1% of the total grade (up to 15%) is deducted for each missing review. Please contact the instructor if you cannot submit the project proposal or report before the deadline.


Schedule

Lec# Date Topic Reading Slides
1 Wed 9/8 Introduction None L1
Query Processing and Buffer Management
2 Mon 9/13 Join Leonard Shapiro, Join Processing in Database Systems with Large Main Memories. ACM Transactions on Database Systems, 1986
[optional] Laura Haas, et al., Seeking the Truth About ad hoc Join Costs. JVLDB, 1997
[optional] Jaeyoung Do, Jignesh Patel, Join processing for flash SSDs: remembering past lessons. DaMoN, 2009
L2
3 Wed 9/15 Radix Join Peter Boncz, et al., Database Architecture Optimized for the new Bottleneck: Memory Access. VLDB, 1999
[optional] Spyros Blanas, et al., Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs. SIGMOD, 2011
L3
4 Mon 9/20 Buffer Management Hong-Tai Chou, David DeWitt, An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica, 1986
[optional] Jim Gray, Gianfranco R. Putzolu, The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD, 1987
L4
5 Wed 9/22 Buffer with NVM Xinjing Zhou, et al., Spitfire: A Three-Tier Buffer Manager for Volatile and Non-Volatile Memory. SIGMOD, 2021
[optional] Alexander van Renen, et al., Managing Non-Volatile Memory in Database Systems. SIGMOD, 2018
L5
6 Mon 9/27 Query Optimization Patricia G. Selinger, et al., Access Path Selection in a Relational Database Management System. SIGMOD, 1979
[optional] Surajit Chaudhuri, An Overview of Query Optimization in Relational Systems. PODS, 1998
L6
7 Wed 9/29 Distribution Robert Epstein, et al., Distributed Query Processing in a Relational Data Base System. SIGMOD, 1978
[optional] David DeWitt, Jim Gray, Parallel Database Systems: The Future of High Performance Database Processing. Communications of the ACM, 1992
L7
Advanced Transaction Management
8 Mon 10/4 Granularity of Locks Jim Gray, et al., Granularity of Locks and Degrees of Consistency in a Shared Data Base. Modelling in Data Base Management Systems, 1976
L8
9 Wed 10/6 Isolation Hal Berenson, et al., A Critique of ANSI SQL Isolation Levels. SIGMOD Record, 1995
L9
10 Mon 10/11 Optimistic CC H. T. Kung, John T. Robinson, On Optimistic Methods for Concurrency Control. ACM Transactions on Database Systems, 1981
[optional] Per-Ake Larson, et al., High-Performance Concurrency Control Mechanisms for Main-Memory Databases. VLDB, 2011
L10
11 Wed 10/13 Modern OCC Stephen Tu, et al., Speedy transactions in multicore in-memory databases. SOSP, 2013
[optional] Xiangyao Yu, et al., TicToc: Time Traveling Optimistic Concurrency Control. SIGMOD, 2016
L11
12 Mon 10/18 Guest Lecture from Oracle Title: Oracle Database In-Memory and Accelerated Analytic Performance
Abstract: Oracle Database In-Memory (DBIM) is an industry-first dual-format in-memory database that maintains transactionally consistent data in both row and columnar formats. This unique architecture enables analytic and OLTP workloads to coexist, bringing together the best of both worlds. Algorithms across the database stack have been redesigned to directly process encoded and compressed columnar data at memory bandwidth speeds using SIMD vector processing. In this talk, I will introduce the dual-format architecture of Oracle Database In-Memory, explain how DBIM is integrated with the RDBMS to handle HTAP workloads seamlessly, and describe the novel algorithms we invented to improve analytic workload performance. With all of these features, In-Memory is able to improve Star Schema Benchmark performance by multiple orders of magnitude.
Bio: Weiwei Gong is the Senior Manager of Data and In-Memory Technologies at Oracle. Passionate about hardware software co-design, Weiwei leads a team that builds performance-critical features of Oracle Database In-Memory. Her work has enabled efficient analytic query processing by leveraging emerging hardware technologies. Weiwei earned her M.S. from Renmin University of China, and Ph.D. from UMass Boston, both in Computer Science.
13 Wed 10/20 Blink Tree Philip Lehman, S. Bing Yao, Efficient Locking for Concurrent Operations on B-Trees. ACM Transactions on Database Systems, 1981 L13
14 Mon 10/25 Guest Lecture from Amazon Title: Running Amazon Redshift at scale
Abstract: Amazon Redshift is a high-performance, secure, scalable, and highly available managed data-warehouse service. In this talk, we explore practical aspects of running Amazon Redshift at scale: providing customers elasticity without compromising performance and availability. We will take a deep dive into Storage and Compute Elasticity, the two features that enable customers to scale up and down based on their needs and better manage their costs. We would like to encourage attendees to read the Redshift research papers below prior to the talk next week to get some additional background on the Redshift architecture: Amazon Redshift and the Case for Simpler Data Warehouses by Gupta et al. https://dl.acm.org/doi/pdf/10.1145/2723372.2742795 and The evolution of Amazon Redshift (extended abstract) by Pandis et al. http://vldb.org/pvldb/vol14/p3162-pandis.pdf
Bio: Gokul Soundararajan is a principal engineer at AWS. He received a PhD from the University of Toronto and has been working in the areas of storage, databases, and analytics. He has published several academic papers at the Usenix FAST, VLDB, and SIGMETRICS conferences. He has been at AWS since 2018 and has worked on delivering Elastic Resize, Cross-instance Restore, and Redshift ML for Amazon Redshift. Sriram Subramanian has been a senior engineer at AWS since 2019. He has contributed to the launch of Amazon Redshift Managed Storage and to its ongoing development and maintenance. He graduated from the University of Wisconsin-Madison with a PhD in storage systems, advised by Prof. Andrea Arpaci-Dusseau and Prof. Remzi Arpaci-Dusseau.
15 Wed 10/27 Adaptive Radix Tree Viktor Leis, et al., The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. ICDE, 2013
Yandong Mao, et al., Cache Craftiness for Fast Multicore Key-Value Storage. EuroSys, 2012
L15
16 Mon 11/1 Durability Philip Bernstein, et al., Concurrency Control and Recovery in Database Systems, Chapter 6. Addison-Wesley, 1987 L16
17 Wed 11/3 ARIES C. Mohan, et al., ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems, 1992 L17
18 Mon 11/8 Two-Phase Commit C. Mohan, et al., Transaction Management in the R* Distributed Database Management System. ACM Transactions on Database Systems, 1986
[optional] Philip Bernstein, et al., Concurrency Control and Recovery in Database Systems, Chapter 7. Addison-Wesley, 1987
L18
19 Wed 11/10 Exam review sample1 (F11), sample2 (F20), sample3
20 Mon 11/15 Exam Take-home exam.
21 Wed 11/17 Replication Jim Gray, et al., The Dangers of Replication and a Solution. SIGMOD, 1996 L21
22 Mon 11/22 Deterministic DBMS Yi Lu, et al., Aria: A Fast and Practical Deterministic OLTP Database. VLDB, 2020
[optional] Alexander Thomson, et al., Calvin: Fast Distributed Transactions for Partitioned Database Systems. SIGMOD, 2012
L22
Cloud-Native DBMS
23 Wed 11/24 Project Meetings Each group meets with the instructor to discuss the final project.
24 Mon 11/29 Cloud OLTP Donald Kossmann, et al., An Evaluation of Alternative Architectures for Transaction Processing in the Cloud. SIGMOD, 2010
[optional] Matthias Brantner, et al., Building a Database on S3. SIGMOD, 2008
L24
25 Wed 12/1 Amazon Aurora Alexandre Verbitski, et al., Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. SIGMOD, 2017
[optional] Panagiotis Antonopoulos, et al., Socrates: The New SQL Server in the Cloud. SIGMOD, 2019
L25
26 Mon 12/6 Snowflake Benoit Dageville, et al., The Snowflake Elastic Data Warehouse. SIGMOD, 2016
L26
27 Wed 12/8 Pushdown DBMS Yifei Yang, et al., FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS. VLDB, 2021
[optional] Xiangyao Yu, et al., PushdownDB: Accelerating a DBMS using S3 Computation. ICDE, 2020
L27
28 Mon 12/13 DAWN Workshop
29 Wed 12/15 DAWN Workshop