CS 839 Cloud-Native Database Systems

Lectures: Mon/Wed 2:30pm - 3:45pm
Room: ENGR HALL 2255
Instructor: Xiangyao Yu
Office Hour: Mon 4:00pm - 5:00pm (CS 4361)

Course description

Modern applications are moving to the cloud for global accessibility, elasticity, high availability, and low cost. Databases are one of the foundational technologies for cloud applications. Compared to traditional on-premises databases, cloud-native databases have unique architectures (e.g., storage disaggregation), embrace heterogeneous hardware technologies (e.g., GPU, CXL, SmartNIC), and face new application scenarios (e.g., serverless, autoscaling). This seminar course covers recent developments in cloud-native databases from both industrial deployment and academic research. Each lecture features presentations from the instructor and students, followed by a group discussion. The course has a final group project.

Prerequisites: CS 564 or equivalent. If you have concerns about meeting the prerequisites, please contact the instructor. There is no formal textbook for this course.

Lecture Format: Each lecture focuses on multiple papers under the same topic. Students will read at least one paper from the pool and submit a review to https://wisc-cs839-f23.hotcrp.com before the lecture starts (if you are presenting in a lecture, you do not need to submit a review for that lecture). The lecture includes a mixture of presentations from both the lecturer and the students and concludes with a group discussion. Please sign up for paper presentation slots using this link.

Course projects: A big component of this course is a research project. For the project, you pick a topic in the area of data management systems, and explore it in depth. Here are lists of project ideas created for CS764 in previous years 2020, 2021, and 2022; many of these ideas are related to cloud databases. Here is a new list of ideas created for this course. You are also encouraged to select a project outside of the lists. The course project is a group project, and each group must be of size 2-4. Please start looking for project partners right away. The course project will include a project proposal, a short presentation at the end of the semester, and a final project report. Here are three sample projects from previous CS764 (sample1, sample2, sample3); the expectation for CS839 projects will be similar to CS764 projects. The presentations will be organized as a workshop. The project has the following deadlines:

Computation resources:

Inclusion Statement: In our class we strive to create an environment where everyone willing to do their part can learn and thrive. You should always feel free to ask a question: asking and pondering questions is how we learn. Being confused is unfailingly an opportunity to advance our knowledge. Please commit to helping create a climate where we treat everyone with dignity and respect. Listening to different viewpoints and approaches enriches our experience, and it is up to us to be sure others feel safe to contribute. Creating an environment where we are all comfortable learning is everyone's job: offer support and seek help from others if you need it, not only in class but also outside class while working with classmates.

Grading
Late submission policy: Reviews must be submitted before the lecture starts in order to be graded. You can skip up to 2 reviews without losing points; otherwise 1% of the total grade is deducted for each missing review. Please discuss with the instructor if you cannot submit the project proposal or report before the deadline.


Schedule (tentative)

Lec# Date Topic Reading Slides
1 Wed 9/6 Introduction None L1
Storage Disaggregation
2 Mon 9/11 Aurora Alexandre Verbitski, et al., Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. SIGMOD, 2017 L2
3 Wed 9/13 Snowflake Benoit Dageville, et al., The Snowflake Elastic Data Warehouse. SIGMOD, 2016 L3
4 Mon 9/18 Analytical Processing-1 Yifei Yang, et al., FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS. VLDB, 2021
Xiangyao Yu, et al., PushdownDB: Accelerating a DBMS using S3 Computation. ICDE, 2020
Cai, Mengchu, et al. Integrated Querying of SQL database data and S3 data in Amazon Redshift. IEEE Data Eng. Bull. 2018
L4 (L4-1, L4-2)
5 Wed 9/20 Analytical Processing-2 Vuppalapati, Midhul, et al. Building an elastic query engine on disaggregated storage. NSDI, 2020
Melnik, Sergey, et al. Dremel: interactive analysis of web-scale datasets. VLDB, 2010
Melnik, Sergey, et al. Dremel: A decade of interactive SQL analysis at web scale. VLDB, 2020
Armbrust, Michael, et al. Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics. CIDR, 2021
L5 (L5-1, L5-2, L5-3)
6 Mon 9/25 Guest Lecture Title: S3: an overview of the internal architecture
Abstract: S3 is a highly scalable and highly durable object store. In this talk we will present the high-level architecture of S3 and dive into how the storage system achieves these goals while keeping costs low. We will talk about some of the design philosophies behind S3 and give a primer on Reed-Solomon erasure codes.
Bio: Jaso has been a software developer for many years. He earned his BS in Computer Science at UW-Madison and his master's at Johns Hopkins. He is currently in his second year as a PhD student at UW-Madison. Before returning to academia, he spent 17 years at Amazon.com working on various systems, including S3, DynamoDB, Timestream, and AWS-IOT, to name a few.
7 Wed 9/27 Transaction Processing-1 Panagiotis Antonopoulos, et al., Socrates: The New SQL Server in the Cloud. SIGMOD, 2019
Corbett, James C., et al. Spanner: Google's globally distributed database. OSDI, 2012
Lomet, David, et al. Unbundling transaction services in the cloud. CIDR, 2009
L7 (L7-1, L7-2)
8 Mon 10/2 Transaction Processing-2 Zhou, Jingyu, et al. Foundationdb: A distributed unbundled transactional key value store. SIGMOD, 2021
Guo, Zhihan, et al. Cornus: atomic commit for a cloud DBMS with storage disaggregation. VLDB, 2022
Peng, Daniel, and Frank Dabek. Large-scale incremental processing using distributed transactions and notifications. OSDI, 2010
L8 (L8-1, L8-2)
9 Wed 10/4 Transaction Processing-3 Taft, Rebecca, et al. Cockroachdb: The resilient geo-distributed sql database. SIGMOD, 2020
Yang, Zhenkun, et al. OceanBase: a 707 million tpmC distributed relational database system. VLDB, 2022
Cao, Wei, et al. PolarDB-X: An Elastic Distributed Relational Database for Cloud-Native Applications. ICDE, 2022
L9 (L9-1, L9-2, L9-3)
Serverless
10 Mon 10/9 Project Meetings Meeting with the instructor to discuss the course project.
11 Wed 10/11 Database Affiliates Workshop Attend the Wisconsin Database Affiliates Workshop on 10/12 (optional) and 10/13 (required).
12 Mon 10/16 Serverless-1 Gaffney, Kevin P., et al. Sqlite: past, present, and future. VLDB, 2022
Raasveldt, Mark, and Hannes Muhleisen. Duckdb: an embeddable analytical database. SIGMOD, 2019
L12 (L12-1, L12-2)
13 Wed 10/18 Serverless-2 Perron, Matthew, et al. Starling: A scalable query engine on cloud functions. SIGMOD, 2020
Muller, Ingo, Renato Marroquín, and Gustavo Alonso. Lambada: Interactive data analytics on cold data using serverless cloud infrastructure. SIGMOD, 2020
L13 (L13-1, L13-2)
14 Mon 10/23 Serverless-3 Sreekanti, Vikram, et al. Cloudburst: Stateful functions-as-a-service. VLDB, 2020
Hellerstein, Joseph M., et al. Serverless computing: One step forward, two steps back. CIDR, 2019
Arun Ulagaratchagan. Introducing Microsoft Fabric: Data analytics for the era of AI. Blog post, 2023
L14 (L14-1, L14-2)
15 Wed 10/25 Serverless-4 Johann Schleier-Smith. Understanding and Exploring Serverless Cloud Computing (Sections 2.1-2.6). Technical Report No. UCB/EECS-2022-273, 2022
Cao, Wei, et al. Polardb serverless: A cloud native database for disaggregated data centers. SIGMOD, 2021
Jonas, Eric, et al. Cloud programming simplified: A berkeley view on serverless computing. Technical Report No. UCB/EECS-2019-3, 2019
L15 (L15-1, L15-2, L15-3)
16 Mon 10/30 DBOS Skiadopoulos, Athinagoras, et al. DBOS: a DBMS-oriented Operating System. VLDB, 2022
Kraft, Peter, et al. Apiary: A DBMS-Backed Transactional Function-as-a-Service Framework. arXiv preprint arXiv:2208.13068, 2022
Cafarella, Michael, et al. DBOS: A proposal for a data-centric operating system. arXiv preprint arXiv:2007.11112, 2020
Li, Qian, et al. R3: Record-Replay-Retroaction for Database-Backed Applications. VLDB, 2023
L16 (L16-1, L16-2, L16-3)
17 Wed 11/1 Auto-scaling Zhu, Yiwen, et al. Towards Building Autonomous Data Services on Azure. SIGMOD, 2023
Wu, Chenggang, Vikram Sreekanti, and Joseph M. Hellerstein. Autoscaling tiered cloud storage in Anna. VLDB, 2019
Poppe, Olga, et al. Moneyball: proactive auto-scaling in Microsoft Azure SQL database serverless. VLDB, 2022
Das, Sudipto, et al. Albatross: Lightweight elasticity in shared storage databases for the cloud using live data migration. VLDB, 2011
L17 (L17-1, L17-2, L17-3)
18 Mon 11/6 Guest Lecture Title: Build an open source, high performance, cloud native time series database
Abstract: With the growth of IoT and industrial Internet, time series databases have become more and more popular. Based on the characteristics of time series data, the TDengine team proposed a unique data model of "one table for one data collection point". Benchmark results show that this model dramatically boosts database performance in terms of data ingestion rate, query latency and data compression ratio. In addition, through another innovative concept called the "Super Table", TDengine makes aggregating millions of tables very efficient.
Through its native distributed design, storage and computing separation, and RAFT-based data replication, TDengine provides very good scalability, elasticity, and resilience. It can support over one billion connected devices and 100 nodes without any performance deterioration. And with its good observability and cloud deployment tools, TDengine is a true cloud native time series database.
TDengine was open sourced in 2019, and the cloud native edition was open sourced in 2022. At present, it has gained over 21,000 stars on GitHub and over 400,000 installations from over 50 countries. It has been widely used in smart manufacturing, clean energy, oil/gas, mining, connected vehicles and more industries.
Bio: Jeff Tao is the founder and CEO of TDengine. He has a background as a technologist and serial entrepreneur, having previously conducted research and development on mobile Internet at Motorola and 3Com and established two successful tech startups. Foreseeing the explosive growth of time-series data generated by machines and sensors now taking place, he founded TDengine in May 2017 to develop an open source, high performance, cloud native time series database purpose-built for modern Industry 4.0 and Industrial IoT businesses.
19 Wed 11/8 Multi-cloud Chasins, Sarah, et al. The sky above the clouds. arXiv preprint arXiv:2205.07147, 2022
Durner, Dominik, Viktor Leis, and Thomas Neumann. Exploiting Cloud Object Storage for High-Performance Analytics. VLDB, 2023
Jain, Paras, et al. Skyplane: Optimizing Transfer Cost and Throughput Using Cloud-Aware Overlays. NSDI, 2023
Yang, Zongheng, et al. SkyPilot: An Intercloud Broker for Sky Computing. NSDI, 2023
What are public, private, and hybrid clouds? Microsoft, 2023
Flexible, resilient, secure IT for your hybrid cloud. IBM, 2023
Public Cloud vs Private Cloud vs Hybrid Cloud. MongoDB
L19 (L19-1, L19-2, L19-3)
20 Mon 11/13 Project Meetings Meeting with the instructor to discuss the course project.
21 Wed 11/15 Auto-tuning Van Aken, Dana, et al. Automatic database management system tuning through large-scale machine learning. SIGMOD, 2017
Pavlo, Andrew, et al. Self-Driving Database Management Systems. CIDR, 2017
Kanellis, Konstantinos, et al. LlamaTune: Sample-Efficient DBMS Configuration Tuning. VLDB, 2022
L21 (L21-1, L21-2, L21-3)
22 Mon 11/20 HTAP Prout, Adam, et al. Cloud-Native Transactions and Analytics in SingleStore. SIGMOD, 2022
Yang, Jiacheng, et al. F1 Lightning: HTAP as a Service. VLDB, 2020
Chen, Jianjun, et al. ByteHTAP: ByteDance's HTAP system with high data freshness and strong data consistency. VLDB, 2022
Huang, Dongxu, et al. TiDB: a Raft-based HTAP database. VLDB, 2020
HTAP: HYBRID TRANSACTIONAL AND ANALYTICAL PROCESSING. Snowflake, 2023
L22 (L22-1, L22-2, L22-3, L22-4)
New Hardware
23 Wed 11/22 GPU database Anil Shanbhag, et al., A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics. SIGMOD, 2020
Anil Shanbhag, et al. Tile-based Lightweight Integer Compression in GPU. SIGMOD, 2022
Bobbi Yogatama, et al. Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS. VLDB, 2022
Cao, Jiashen, et al. Revisiting Query Performance in GPU Database Systems. arXiv, 2023
L23 (L23-1, L23-2, L23-3, L23-4)
24 Mon 11/27 Memory Disaggregation Li, Huaicheng, et al. Pond: CXL-based memory pooling systems for cloud platforms. ASPLOS, 2023
Zhang, Qizhen, et al. Redy: remote dynamic memory cache. VLDB, 2021
Wang, Ruihong, et al. The case for distributed shared-memory databases with RDMA-enabled memory disaggregation. VLDB, 2023
Zhang, Qizhen, et al. Compucache: Remote computable caching using spot vms. CIDR, 2022
Lim, Kevin, et al. Disaggregated memory for expansion and sharing in blade servers. ISCA, 2009
L24 (L24-1, L24-2, L24-3)
25 Wed 11/29 RDMA Binnig, Carsten, et al. The end of slow networks: It's time for a redesign. VLDB, 2016
Zamanian, Erfan, et al. The end of a myth: Distributed transactions can scale. VLDB, 2017
Barthels, Claude, et al. Rack-scale in-memory join processing using RDMA. SIGMOD, 2015
Rodiger, Wolf, et al. High-speed query processing over high-speed networks. VLDB, 2015
L25 (L25-1, L25-2, L25-3)
26 Mon 12/4 SmartNIC Lin, Jiaxin, et al. Towards Accelerating Data Intensive Application's Shuffle Process Using SmartNICs. SIGMETRICS, 2023
Liu, Ming, et al. Offloading distributed applications onto smartnics using ipipe. SIGCOMM, 2019
Schuh, Henry N., et al. Xenic: SmartNIC-accelerated distributed transactions. SOSP, 2021
L26 (L26-1, L26-2, L26-3)
27 Wed 12/6 No Lecture
28 Mon 12/11 Project Presentation
29 Wed 12/13 Project Presentation