CS 839 Cloud-Native Database Systems

Lectures: Mon/Wed 2:30pm - 3:45pm
Room: ENGR HALL 2255
Instructor: Xiangyao Yu
Office Hours: Mon 4:00pm - 5:00pm (CS 4361)

Course description

Modern applications are moving to the cloud for global accessibility, elasticity, high availability, and low cost. Databases are one of the foundational technologies for cloud applications. Compared to traditional on-premises databases, cloud-native databases have unique architectures (e.g., storage disaggregation), embrace heterogeneous hardware technologies (e.g., GPU, CXL, SmartNIC), and face new application scenarios (e.g., serverless, autoscaling). This seminar course covers recent developments in cloud-native databases from both industrial deployment and academic research. Each lecture features presentations from the instructor and students, followed by group discussions. The course has a final group project.

Prerequisites: CS 564 or equivalent. If you have concerns about meeting the prerequisites, please contact the instructor. There is no formal textbook for this course.

Lecture Format: Each lecture focuses on multiple papers under the same topic. Students will read at least one paper from the pool and submit a review to https://wisc-cs839-f23.hotcrp.com before the lecture starts (if you are presenting in a lecture, you do not need to submit a review for that lecture). The lecture includes a mixture of presentations from both the lecturer and the students and concludes with a group discussion. Please sign up for paper presentation slots following this link.

Course projects: A big component of this course is a research project. For the project, you pick a topic in the area of data management systems, and explore it in depth. Here are lists of project ideas created for CS764 in previous years 2020, 2021, and 2022; many of these ideas are related to cloud databases. More project ideas will be posted later in the semester. You are also encouraged to select a project outside of the lists. The course project is a group project, and each group must be of size 2-4. Please start looking for project partners right away. The course project will include a project proposal, a short presentation at the end of the semester, and a final project report. Here are three sample projects from previous CS764 (sample1, sample2, sample3); the expectation for CS839 projects will be similar to CS764 projects. The presentations will be organized as a workshop. The project has the following deadlines:

Computation resources:

Inclusion Statement: In our class we strive to create an environment where everyone willing to do their part can learn and thrive. You should always feel free to ask a question: asking and pondering questions is how we learn. Being confused is unfailingly an opportunity to advance our knowledge. Please, commit to helping create a climate where we treat everyone with dignity and respect. Listening to different viewpoints and approaches enriches our experience, and it is up to us to be sure others feel safe to contribute. Creating an environment where we are all comfortable learning is everyone's job: offer support and seek help from others if you need it, not only in class but also outside class while working with classmates.

Grading
Late submission policy: Reviews must be submitted before the lecture starts in order to be graded. You can skip up to 2 reviews without losing points; otherwise 1% of the total grade is deducted for each missing review. Please talk to the instructor if you cannot submit the project proposal or report before the deadline.


Schedule (tentative)

Lec# Date Topic Reading Slides
1 Wed 9/6 Introduction None L1
Storage Disaggregation
2 Mon 9/11 Aurora Alexandre Verbitski, et al., Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. SIGMOD, 2017 L2
3 Wed 9/13 Snowflake Benoit Dageville, et al., The Snowflake Elastic Data Warehouse. SIGMOD, 2016 L3
4 Mon 9/18 Analytical Processing-1 Yifei Yang, et al., FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS. VLDB, 2021
Xiangyao Yu, et al., PushdownDB: Accelerating a DBMS using S3 Computation. ICDE, 2020
Cai, Mengchu, et al. Integrated Querying of SQL database data and S3 data in Amazon Redshift. IEEE Data Eng. Bull. 2018
L4 (L4-1, L4-2)
5 Wed 9/20 Analytical Processing-2 Vuppalapati, Midhul, et al. Building an elastic query engine on disaggregated storage. NSDI, 2020
Melnik, Sergey, et al. Dremel: interactive analysis of web-scale datasets. VLDB, 2010
Melnik, Sergey, et al. Dremel: A decade of interactive SQL analysis at web scale. VLDB, 2020
Armbrust, Michael, et al. Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics. CIDR, 2021
L5 (L5-1, L5-2, L5-3)
6-8 Mon 9/25 to Mon 10/2 Transactional Processing Panagiotis Antonopoulos, et al., Socrates: The New SQL Server in the Cloud. SIGMOD, 2019
Zhou, Jingyu, et al. FoundationDB: A distributed unbundled transactional key value store. SIGMOD, 2021
Corbett, James C., et al. Spanner: Google's globally distributed database. OSDI, 2012
Guo, Zhihan, et al. Cornus: atomic commit for a cloud DBMS with storage disaggregation. VLDB, 2022
Peng, Daniel, and Frank Dabek. Large-scale incremental processing using distributed transactions and notifications. OSDI, 2010
Taft, Rebecca, et al. CockroachDB: The resilient geo-distributed SQL database. SIGMOD, 2020
Yang, Zhenkun, et al. OceanBase: a 707 million tpmC distributed relational database system. VLDB, 2022
Cao, Wei, et al. PolarDB-X: An Elastic Distributed Relational Database for Cloud-Native Applications. ICDE, 2022
Lomet, David, et al. Unbundling transaction services in the cloud. CIDR, 2009
Serverless
10 Mon 10/9 Project Meetings Meeting with the instructor to discuss the final project.
11-14 Wed 10/11 to Mon 10/23 Serverless and Function as a Service Gaffney, Kevin P., et al. SQLite: past, present, and future. VLDB, 2022
Raasveldt, Mark, and Hannes Muhleisen. DuckDB: an embeddable analytical database. SIGMOD, 2019
Perron, Matthew, et al. Starling: A scalable query engine on cloud functions. SIGMOD, 2020
Muller, Ingo, Renato Marroquín, and Gustavo Alonso. Lambada: Interactive data analytics on cold data using serverless cloud infrastructure. SIGMOD, 2020
Sreekanti, Vikram, et al. Cloudburst: Stateful functions-as-a-service. VLDB, 2020
Hellerstein, Joseph M., et al. Serverless computing: One step forward, two steps back. CIDR, 2019
Jonas, Eric, et al. Cloud programming simplified: A Berkeley view on serverless computing. Technical Report No. UCB/EECS-2019-3, 2019
Johann Schleier-Smith. Understanding and Exploring Serverless Cloud Computing (Sections 2.1-2.6). Technical Report No. UCB/EECS-2022-273, 2022
Cao, Wei, et al. PolarDB Serverless: A cloud native database for disaggregated data centers. SIGMOD, 2021
Arun Ulagaratchagan. Introducing Microsoft Fabric: Data analytics for the era of AI. Blog post, 2023
15 Wed 10/25 DBOS Skiadopoulos, Athinagoras, et al. DBOS: a DBMS-oriented Operating System. VLDB, 2022
Kraft, Peter, et al. Apiary: A DBMS-Backed Transactional Function-as-a-Service Framework. arXiv preprint arXiv:2208.13068, 2022
16 Mon 10/30 Auto-scaling Zhu, Yiwen, et al. Towards Building Autonomous Data Services on Azure. SIGMOD, 2023
Wu, Chenggang, Vikram Sreekanti, and Joseph M. Hellerstein. Autoscaling tiered cloud storage in Anna. VLDB, 2019
17 Wed 11/1 Multi-cloud Chasins, Sarah, et al. The sky above the clouds. arXiv preprint arXiv:2205.07147, 2022
What are public, private, and hybrid clouds?. Microsoft, 2023
Flexible, resilient, secure IT for your hybrid cloud. IBM, 2023
Public Cloud vs Private Cloud vs Hybrid Cloud. MongoDB
18 Mon 11/6 Auto-tuning Van Aken, Dana, et al. Automatic database management system tuning through large-scale machine learning. SIGMOD, 2017
Pavlo, Andrew, et al. Self-Driving Database Management Systems. CIDR, 2017
19 Wed 11/8 Project Meetings Meeting with the instructor to discuss the final project.
20 Mon 11/13 HTAP Prout, Adam, et al. Cloud-Native Transactions and Analytics in SingleStore. SIGMOD, 2022
Yang, Jiacheng, et al. F1 Lightning: HTAP as a Service. VLDB, 2020
Chen, Jianjun, et al. ByteHTAP: ByteDance's HTAP system with high data freshness and strong data consistency. VLDB, 2022
HTAP: HYBRID TRANSACTIONAL AND ANALYTICAL PROCESSING. Snowflake, 2023
New Hardware
21 Wed 11/15 GPU database Anil Shanbhag, et al., A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics. SIGMOD, 2020
Anil Shanbhag, et al. Tile-based Lightweight Integer Compression in GPU. SIGMOD, 2022
Bobbi Yogatama, et al. Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS. VLDB, 2022
22 Mon 11/20 Memory Disaggregation Li, Huaicheng, et al. Pond: CXL-based memory pooling systems for cloud platforms. ASPLOS, 2023
Zhang, Qizhen, et al. Redy: remote dynamic memory cache. VLDB, 2021
23 Wed 11/22 RDMA TBD
24 Mon 11/27 SmartNIC TBD
25-27 Wed 11/29 to Wed 12/6 TBD
28 Mon 12/11 Project Presentation
29 Wed 12/13 Project Presentation