Lectures: Mon/Wed 1:00pm - 2:15pm
Room: COMP SCI 1221
Instructor: Xiangyao Yu
Office Hour: Mon 2:30pm - 3:30pm (CS 4361)
Teaching Assistant: Keren Chen
This course covers a number of advanced topics in the development of database management systems (DBMS) and the modern applications of databases. The topics discussed include query processing and optimization, advanced access methods, advanced concurrency control and recovery, parallel and distributed data systems, cloud computing for data platforms, and data processing with emerging hardware. The course material will be drawn from a number of papers in the database literature. We will cover one paper per lecture. All students are expected to read the paper before coming to the lecture.
Prerequisites: CS 564 or equivalent. If you have concerns about meeting the prerequisites, please contact the instructor.
Reference Textbook: There is no formal textbook for this course. The reading list is a collection of papers. The following two books will be used as references in this course. Note that you don't need to buy the books.
Lecture Format: Each lecture focuses on a classic or modern research paper. Students will read the paper and submit a review to https://wisc-cs764-f22.hotcrp.com before the lecture starts. Here is a sample review for the paper on join processing.
Course projects: A major component of this course is a research project. For the project, you pick a topic in the area of data management systems and explore it in depth. Here are lists of suggested project topics from 2020, 2021, and 2022, but you are encouraged to select a project outside these lists. The course project is a group project, and each group must have 2-4 members. Please start looking for project partners right away. The course project includes a project proposal, a short presentation at the end of the semester, and a final project report. Here are three sample projects from previous years (sample1, sample2, sample3). The presentations are organized as a workshop. The project has the following deadlines:
Computation resources:
Inclusion Statement: In our class we strive to create an environment where everyone willing to do their part can learn and thrive. You should always feel free to ask a question: asking and pondering questions is how we learn. Being confused is unfailingly an opportunity to advance our knowledge. Please commit to helping create a climate where we treat everyone with dignity and respect. Listening to different viewpoints and approaches enriches our experience, and it is up to us to make sure others feel safe to contribute. Creating an environment where we are all comfortable learning is everyone's job: offer support and seek help from others if you need it, not only in class but also outside class while working with classmates.
Lec# | Date | Topic | Reading | Slides |
---|---|---|---|---|
1 | Wed 9/7 | Introduction | None | L1 (notes) |
Query Processing and Buffer Management | ||||
2 | Mon 9/12 | Join | Leonard Shapiro, Join Processing in Database Systems with Large Main Memories. ACM Transactions on Database Systems, 1986 [optional] Laura Haas, et al., Seeking the Truth About ad hoc Join Costs. VLDB Journal, 1997 [optional] Jaeyoung Do, Jignesh Patel, Join processing for flash SSDs: remembering past lessons. DaMoN, 2009 | L2 (notes) |
3 | Wed 9/14 | Radix Join | Peter Boncz, et al., Database Architecture Optimized for the new Bottleneck: Memory Access. VLDB, 1999 [optional] Spyros Blanas, et al., Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs. SIGMOD, 2011 | L3 (notes) |
4 | Mon 9/19 | Buffer Management | Hong-Tai Chou, David DeWitt, An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica, 1986 [optional] Jim Gray, Gianfranco R. Putzolu, The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD, 1987 [optional] Alexander van Renen, et al., Managing Non-Volatile Memory in Database Systems. SIGMOD, 2018 | L4 (notes) |
5 | Wed 9/21 | Modern Buffer Management | Viktor Leis, et al., LeanStore: In-Memory Data Management Beyond Main Memory. ICDE, 2018 [optional] Justin DeBrabant, et al., Anti-Caching: A New Approach to Database Management System Architecture. VLDB, 2013 [optional] Ahmed Eldawy, et al., Trekking Through Siberia: Managing Cold Data in a Memory-Optimized Database. VLDB, 2014 | L5 (notes) |
6 | Mon 9/26 | Query Optimization | Patricia G. Selinger, et al., Access Path Selection in a Relational Database Management System. SIGMOD, 1979 [optional] Surajit Chaudhuri, An Overview of Query Optimization in Relational Systems. PODS, 1998 | L6 (notes) |
7 | Wed 9/28 | Column Store | Mike Stonebraker, et al., C-Store: A Column-oriented DBMS. VLDB, 2005 [optional] Daniel Abadi, et al., Column-Stores vs. Row-Stores: How Different Are They Really? SIGMOD, 2008 | L7 (notes) |
8 | Mon 10/3 | Parallel Database | David DeWitt, Jim Gray, Parallel Database Systems: The Future of High Performance Database Processing. Communications of the ACM, 1992 [optional] Robert Epstein, et al., Distributed Query Processing in a Relational Data Base System. SIGMOD, 1978 | L8 (notes) |
Advanced Transaction Management | ||||
9 | Wed 10/5 | Granularity of Locks | Jim Gray, et al., Granularity of Locks and Degrees of Consistency in a Shared Data Base. Modelling in Data Base Management Systems, 1976 | L9 (notes) |
10 | Mon 10/10 | Isolation | Hal Berenson, et al., A Critique of ANSI SQL Isolation Levels. SIGMOD Record, 1995 | L10 (notes) |
11 | Wed 10/12 | Guest lecture from PingCAP | Title: The present and future of TiDB Abstract: Ed will present TiDB's architecture and the philosophy behind its evolution, and explain how TiDB addresses the challenges brought by OLTP/OLAP/HTAP workloads in the era of cloud computing. Bio: Ed Huang is the co-founder & CTO of PingCAP, a distributed systems expert, and an executive member of the CCF Database Committee, Open Source Development Committee, and Big Data Committee. He is an active open source enthusiast and open source software author, whose representative work includes Codis, a distributed Redis caching solution, and TiDB, a distributed relational database. He is one of the "Top 10 Outstanding Contributors to Open Source in China in 2020" and one of the "OSCAR Open Source Vanguards". His first-author paper, TiDB: A Raft-based HTAP Database, is the first paper in the industry on the Raft-based implementation of a real-time HTAP distributed database. Round-table discussion: 2:30-3:30pm, room CS 4310 | |
12 | Mon 10/17 | Optimistic CC | H. T. Kung, John T. Robinson, On Optimistic Methods for Concurrency Control. ACM Transactions on Database Systems, 1981 [optional] Per-Ake Larson, et al., High-Performance Concurrency Control Mechanisms for Main-Memory Databases. VLDB, 2011 | L12 |
13 | Wed 10/19 | Modern OCC | Stephen Tu, et al., Speedy transactions in multicore in-memory databases. SOSP, 2013 [optional] Xiangyao Yu, et al., TicToc: Time Traveling Optimistic Concurrency Control. SIGMOD, 2016 | L13 (notes) |
14 | Mon 10/24 | Guest Lecture from Oracle | Title: Dream the Stream: High Velocity Event Processing with a Converged Database Abstract: Event stream processing is a rapidly growing category of workloads, including IoT, Time Series, Clickstream, Quality Control, Security, Auditing, Metrics, and Monitoring. Analysts estimate that the market will grow to $4B by 2027! One industry trend has been to use purpose-built stream processing engines for these workloads. This approach, however, sacrifices most of the advantages of an industrial-strength database platform. In this talk, we'll discuss the key aspects of streaming workloads and the requirements of effective stream processing engines, and then show how the many capabilities of the Oracle Database, such as Native JSON support, RAC, Parallel Query, ILM (Information Lifecycle Management) Policies, In-Memory Columnar processing and Advanced Analytics, come together to provide an ideal streaming architecture on a converged database. Bio: Shasank Chavan is the Vice President of the Data and In-Memory Technologies group at Oracle. He leads a team of brilliant engineers in the Database organization who develop customer-facing, performance-critical features for an In-Memory Columnar Store which, as Larry Ellison proclaimed, "processes data at ungodly speeds". His team is currently building Oracle's next-generation, highly distributed data storage engine that powers the cloud. Shasank earned his BS/MS in Computer Science at the University of California, San Diego. He has accumulated 30+ patents over a span of 23 years working on systems software technology. | L14 (video) |
15 | Wed 10/26 | Blink Tree | Philip Lehman, S. Bing Yao, Efficient Locking for Concurrent Operations on B-Trees. ACM Transactions on Database Systems, 1981 [optional] Viktor Leis, et al., Optimistic Lock Coupling: A Scalable and Efficient General-Purpose Synchronization Method. IEEE Data Engineering Bulletin, 2019 | L15 (notes) |
16 | Mon 10/31 | Adaptive Radix Tree | Viktor Leis, et al., The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. ICDE, 2013 [optional] Yandong Mao, et al., Cache Craftiness for Fast Multicore Key-Value Storage. EuroSys, 2012 | L16 (notes) |
17 | Wed 11/2 | ARIES | C. Mohan, et al., ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems, 1992 [optional] Philip Bernstein, et al., Concurrency Control and Recovery in Database Systems, Chapter 6. Addison-Wesley, 1987 | L17 (notes) |
18 | Mon 11/7 | Exam review | sample1 (F11), sample2 (F20), sample3 (F21) | |
19 | Wed 11/9 | Exam | ||
20 | Mon 11/14 | Two-Phase Commit | C. Mohan, et al., Transaction Management in the R* Distributed Database Management System. ACM Transactions on Database Systems, 1986 | L20 (notes) |
Cloud-Native DBMS | ||||
21 | Wed 11/16 | Cornus | Zhihan Guo, et al., Cornus: Atomic Commit for a Cloud DBMS with Storage Disaggregation. arXiv 2102.10185, 2022 [optional] Jim Gray, Leslie Lamport, Consensus on Transaction Commit. ACM Transactions on Database Systems, 2006 | L21 (notes) |
22 | Mon 11/21 | Deterministic DBMS | Yi Lu, et al., Aria: A Fast and Practical Deterministic OLTP Database. VLDB, 2020 [optional] Alexander Thomson, et al., Calvin: Fast Distributed Transactions for Partitioned Database Systems. SIGMOD, 2012 | L22 |
23 | Wed 11/23 | Project Meetings | Each group meets with the instructor to discuss the final project. | |
24 | Mon 11/28 | Amazon Aurora | Alexandre Verbitski, et al., Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. SIGMOD, 2017 [optional] Panagiotis Antonopoulos, et al., Socrates: The New SQL Server in the Cloud. SIGMOD, 2019 | L24 (notes) |
25 | Wed 11/30 | Snowflake | Benoit Dageville, et al., The Snowflake Elastic Data Warehouse. SIGMOD, 2016 [optional] Midhul Vuppalapati, et al., Building An Elastic Query Engine on Disaggregated Storage. NSDI, 2020 | L25 (notes) |
26 | Mon 12/5 | Pushdown DBMS | Yifei Yang, et al., FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS. VLDB, 2021 [optional] Xiangyao Yu, et al., PushdownDB: Accelerating a DBMS using S3 Computation. ICDE, 2020 | L26 (notes) |
27 | Wed 12/7 | GPU Database | Anil Shanbhag, et al., A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics. SIGMOD, 2020 [optional] Anil Shanbhag, et al., Tile-based Lightweight Integer Compression in GPU. SIGMOD, 2022 [optional] Bobbi Yogatama, et al., Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS. VLDB, 2022 | L27 (notes) |
28 | Mon 12/12 | DAWN Workshop | | |
29 | Wed 12/14 | DAWN Workshop | | |