CS 764 Topics in Database Management Systems

Lectures: Mon/Wed 1:00pm - 2:15pm
Room: COMP SCI 1221
Instructor: Xiangyao Yu
Office Hour: Mon 2:30pm - 3:30pm (CS 4361)
Teaching Assistant: Keren Chen

Course description

This course covers a number of advanced topics in the development of database management systems (DBMS) and the modern applications of databases. The topics discussed include query processing and optimization, advanced access methods, advanced concurrency control and recovery, parallel and distributed data systems, cloud computing for data platforms, and data processing with emerging hardware. The course material will be drawn from a number of papers in the database literature. We will cover one paper per lecture. All students are expected to read the paper before coming to the lecture.

Prerequisites: CS 564 or equivalent. If you have concerns about meeting the prerequisites, please contact the instructor.

Reference Textbook: There is no formal textbook for this course. The reading list is a collection of papers. The following two books will be used as references in this course. Note you don't need to buy the books.

Lecture Format: Each lecture focuses on a classic or modern research paper. Students will read the paper and submit a review to https://wisc-cs764-f22.hotcrp.com before the lecture starts. Here is a sample review for the paper on join processing.

Course projects: A major component of this course is a research project. For the project, you pick a topic in the area of data management systems and explore it in depth. Here are lists of suggested project topics created in 2020, 2021, and 2022, but you are encouraged to select a project outside of the lists. The course project is a group project, and each group must have 2 to 4 members. Please start looking for project partners right away. The course project will include a project proposal, a short presentation at the end of the semester, and a final project report. Here are three sample projects from previous years (sample1, sample2, sample3). The presentations are organized as a workshop. The project has the following deadlines:

Computation resources:

Inclusion Statement: In our class we strive to create an environment where everyone willing to do their part can learn and thrive. You should always feel free to ask a question: asking and pondering questions is how we learn. Being confused is unfailingly an opportunity to advance our knowledge. Please commit to helping create a climate where we treat everyone with dignity and respect. Listening to different viewpoints and approaches enriches our experience, and it is up to us to be sure others feel safe to contribute. Creating an environment where we are all comfortable learning is everyone's job: offer support and seek help from others if you need it, not only in class but also outside class while working with classmates.

Grading
Late submission policy: Reviews must be submitted before the lecture starts in order to be graded. You can skip up to 2 reviews without losing points; beyond that, 1% of the total grade (up to 15%) is deducted for each missing review. Please talk to the instructor if you cannot submit the project proposal or report before the deadline.
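The review deduction above can be sketched as a small calculation. This is only an illustration, not an official grading tool: the function name `review_penalty` and its parameters are hypothetical, and it assumes the two free skips are not charged and that the 15% figure caps the total deduction.

```python
def review_penalty(missed_reviews: int, free_skips: int = 2,
                   per_review: float = 1.0, cap: float = 15.0) -> float:
    """Return the percentage of the total grade deducted for missed reviews.

    Assumptions (not stated explicitly in the policy): the first
    `free_skips` missed reviews cost nothing, each further missed review
    costs `per_review` percent, and the total deduction never exceeds `cap`.
    """
    if missed_reviews < 0:
        raise ValueError("missed_reviews must be non-negative")
    # Only reviews missed beyond the free skips are charged.
    charged = max(0, missed_reviews - free_skips)
    # Apply the per-review deduction, then cap the total.
    return min(cap, charged * per_review)


if __name__ == "__main__":
    for missed in (0, 2, 3, 10, 30):
        print(f"missed={missed}: deduct {review_penalty(missed)}% of total grade")
```

Under these assumptions, missing three reviews costs 1% of the total grade, and the deduction stops growing once it reaches 15%.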


Schedule

Lec# Date Topic Reading Slides
1 Wed 9/7 Introduction None L1 (notes)
Query Processing and Buffer Management
2 Mon 9/12 Join Leonard Shapiro, Join Processing in Database Systems with Large Main Memories. ACM Transactions on Database Systems, 1986
[optional] Laura Haas, et al., Seeking the Truth About ad hoc Join Costs. JVLDB, 1997
[optional] Jaeyoung Do, Jignesh Patel, Join processing for flash SSDs: remembering past lessons. DaMoN, 2009
L2 (notes)
3 Wed 9/14 Radix Join Peter Boncz, et al., Database Architecture Optimized for the new Bottleneck: Memory Access. VLDB, 1999
[optional] Spyros Blanas, et al., Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs. SIGMOD, 2011
L3 (notes)
4 Mon 9/19 Buffer Management Hong-Tai Chou, David DeWitt, An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica, 1986
[optional] Jim Gray, Gianfranco R. Putzolu, The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD, 1987
[optional] Alexander van Renen, et al., Managing Non-Volatile Memory in Database Systems. SIGMOD, 2018
L4 (notes)
5 Wed 9/21 Modern Buffer Management Viktor Leis, et al., LeanStore: In-Memory Data Management Beyond Main Memory. ICDE 2018
[optional] Justin DeBrabant, et al., Anti-Caching: A New Approach to Database Management System Architecture. VLDB, 2013
[optional] Ahmed Eldawy, et al., Trekking Through Siberia: Managing Cold Data in a Memory-Optimized Database. VLDB 2014
L5 (notes)
6 Mon 9/26 Query Optimization Patricia G. Selinger, et al., Access Path Selection in a Relational Database Management System. SIGMOD, 1979
[optional] Surajit Chaudhuri, An Overview of Query Optimization in Relational Systems. PODS, 1998
L6 (notes)
7 Wed 9/28 Column Store Mike Stonebraker, et al. C-store: a column-oriented DBMS, VLDB 2005
[optional] Daniel Abadi, et al., Column-stores vs. row-stores: how different are they really?, SIGMOD 2008
L7 (notes)
8 Mon 10/3 Parallel Database David DeWitt, Jim Gray, Parallel Database Systems: The Future of High Performance Database Processing. Communications of the ACM, 1992
[optional] Robert Epstein, et al., Distributed Query Processing in a Relational Data Base System. SIGMOD, 1978
L8 (notes)
Advanced Transaction Management
9 Wed 10/5 Granularity of Locks Jim Gray, et al., Granularity of Locks and Degrees of Consistency in a Shared Data Base. Modelling in Data Base Management Systems, 1976
L9 (notes)
10 Mon 10/10 Isolation Hal Berenson, et al., A Critique of ANSI SQL Isolation Levels. SIGMOD Record, 1995
L10 (notes)
11 Wed 10/12 Guest lecture from PingCAP Title: The present and future of TiDB
Abstract: Ed will present TiDB's architecture, the philosophy behind its architectural evolution, and how TiDB answers the challenges brought by OLTP/OLAP/HTAP workloads in the era of cloud computing.
Bio: Ed Huang is the co-founder & CTO of PingCAP, a distributed system expert, executive member of CCF Database Committee, Open Source Development Committee and Big Data Committee. He is an active open source enthusiast and open source software author, whose representative work includes Codis, a distributed Redis caching solution, and TiDB, a distributed relational database. He is one of the "Top 10 Outstanding Contributors to Open Source in China in 2020" and one of the "OSCAR Open Source Vanguards". His first-author paper, TiDB: A Raft-based HTAP Database, is the first paper in the industry on the Raft-based implementation of a real-time HTAP distributed database.
Round-table discussion: 2:30-3:30pm, room CS 4310
12 Mon 10/17 Optimistic CC H. T. Kung, John T. Robinson, On Optimistic Methods for Concurrency Control. ACM Transactions on Database Systems, 1981
[optional] Per-Ake Larson, et al., High-Performance Concurrency Control Mechanisms for Main-Memory Databases. VLDB, 2011
L12
13 Wed 10/19 Modern OCC Stephen Tu, et al., Speedy transactions in multicore in-memory databases. SOSP, 2013
[optional] Xiangyao Yu, et al., TicToc: Time Traveling Optimistic Concurrency Control. SIGMOD, 2016
L13 (notes)
14 Mon 10/24 Guest Lecture from Oracle Title: Dream the Stream: High Velocity Event Processing with a Converged Database
Abstract: Event stream processing is a rapidly growing category of workloads including IoT, Timeseries, Clickstream, Quality Control, Security, Auditing, Metrics, and Monitoring, etc. Analysts estimate the market to grow to $4B USD by 2027! One industry trend has been to use purpose-built stream processing engines for these workloads. This approach, however, sacrifices most of the advantages of an industrial-strength database platform. In this talk, we'll discuss the key aspects of streaming workloads and the requirements of effective stream processing engines, and then show how the many capabilities of the Oracle Database, such as Native JSON support, RAC, Parallel Query, ILM (Information Lifecycle Management) Policies, In-Memory Columnar processing and Advanced Analytics, come together to provide an ideal streaming architecture on a converged database.
Bio: Shasank Chavan is the Vice President of the Data and In-Memory Technologies group at Oracle. He leads a team of brilliant engineers in the Database organization who develop customer-facing, performance-critical features for an In-Memory Columnar Store which, as Larry Ellison proclaimed, "processes data at ungodly speeds". His team is currently building Oracle's next-generation, highly distributed, data storage engine that powers the cloud. Shasank earned his BS/MS in Computer Science at the University of California, San Diego. He has accumulated 30+ patents over a span of 23 years working on systems software technology.
L14 (video)
15 Wed 10/26 Blink Tree Philip Lehman, S. Bing Yao, Efficient Locking for Concurrent Operations on B-Trees. ACM Transactions on Database Systems, 1981
[optional] Viktor Leis, et al. Optimistic Lock Coupling: A Scalable and Efficient General-Purpose Synchronization Method. IEEE Data Eng. Bull. 2019
L15 (notes)
16 Mon 10/31 Adaptive Radix Tree Viktor Leis, et al., The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. ICDE, 2013
[optional] Yandong Mao, et al., Cache Craftiness for Fast Multicore Key-Value Storage. EuroSys, 2012
L16 (notes)
17 Wed 11/2 ARIES C. Mohan, et al. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems, 1992
[optional] Philip Bernstein, et al., Concurrency Control and Recovery in Database Systems, Chapter 6. Addison-Wesley, 1987
L17 (notes)
18 Mon 11/7 Exam review sample1 (F11), sample2 (F20), sample3 (F21)
19 Wed 11/9 Exam
20 Mon 11/14 Two-Phase Commit C. Mohan, et al., Transaction Management in the R* Distributed Database Management System. ACM Transactions on Database Systems, 1986
L20 (notes)
Cloud-Native DBMS
21 Wed 11/16 Cornus Zhihan Guo, et al., Cornus: Atomic Commit for a Cloud DBMS with Storage Disaggregation. arXiv 2102.10185, 2022
[optional] Jim Gray, Leslie Lamport, Consensus on Transaction Commit. ACM Transactions on Database Systems, 2006
L21 (notes)
22 Mon 11/21 Deterministic DBMS Yi Lu, et al., Aria: A Fast and Practical Deterministic OLTP Database. VLDB, 2020
[optional] Alexander Thomson, et al., Calvin: Fast Distributed Transactions for Partitioned Database Systems. SIGMOD, 2012
L22
23 Wed 11/23 Project Meetings Each group meets with the instructor to discuss the final project.
24 Mon 11/28 Amazon Aurora Alexandre Verbitski, et al., Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. SIGMOD, 2017
[optional] Panagiotis Antonopoulos, et al., Socrates: The New SQL Server in the Cloud. SIGMOD, 2019
L24 (notes)
25 Wed 11/30 Snowflake Benoit Dageville, et al., The Snowflake Elastic Data Warehouse. SIGMOD, 2016
[optional] Midhul Vuppalapati, et al., Building An Elastic Query Engine on Disaggregated Storage. NSDI, 2020
L25 (notes)
26 Mon 12/5 Pushdown DBMS Yifei Yang, et al., FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS. VLDB, 2021
[optional] Xiangyao Yu, et al., PushdownDB: Accelerating a DBMS using S3 Computation. ICDE, 2020
L26 (notes)
27 Wed 12/7 GPU Database Anil Shanbhag, et al., A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics. SIGMOD, 2020
[optional] Anil Shanbhag, et al. Tile-based Lightweight Integer Compression in GPU. SIGMOD, 2022
[optional] Bobbi Yogatama, et al. Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS. VLDB 2022
L27 (notes)
28 Mon 12/12 DAWN Workshop
29 Wed 12/14 DAWN Workshop