CS 839 Cloud-Native Database Systems

Lectures: Mon/Wed 2:30pm - 3:45pm
Room: ENGR HALL 2255
Instructor: Xiangyao Yu
Office Hours: Mon 4:00pm - 5:00pm (CS 4361)

Modern applications are moving to the cloud for global accessibility, elasticity, high availability, and low cost. Databases are one of the foundational technologies for cloud applications. Compared to traditional on-premises databases, cloud-native databases have unique architectures (e.g., storage disaggregation), embrace heterogeneous hardware technologies (e.g., GPU, CXL, SmartNIC), and face new application scenarios (e.g., serverless, autoscaling). This seminar course covers recent developments in cloud-native databases from both industrial deployment and academic research. Each lecture features presentations from the instructor and students, followed by group discussion. The course has a final group project.

Prerequisites: CS 564 or equivalent. If you have concerns about meeting the prerequisites, please contact the instructor. There is no formal textbook for this course.

Lecture Format: Each lecture focuses on multiple papers under the same topic. Students will read at least one paper from the pool and submit a review to https://wisc-cs839-f23.hotcrp.com before the lecture starts (if you are presenting in a lecture, you do not need to submit a review for that lecture). The lecture includes a mixture of presentations from both the lecturer and the students and concludes with a group discussion. Please sign up for paper presentation slots by following this link.

Course projects: A big component of this course is a research project. For the project, you pick a topic in the area of data management systems, and explore it in depth. Here are lists of project ideas created for CS764 in previous years 2020, 2021, and 2022; many of these ideas are related to cloud databases. Here is a new list of ideas created for this course. You are also encouraged to select a project outside of the lists. The course project is a group project, and each group must be of size 2-4. Please start looking for project partners right away. The course project will include a project proposal, a short presentation at the end of the semester, and a final project report. Here are three sample projects from previous CS764 (sample1, sample2, sample3); the expectation for CS839 projects will be similar to CS764 projects. The presentations will be organized as a workshop. The project has the following deadlines:

Proposal due: Oct. 20 (originally Oct. 16)
Presentation: Dec. 11 & 13
Paper submission: Dec. 18

Computation resources:

CloudLab: https://www.cloudlab.us/signup.php?pid=NextGenDB (project name: NextGenDB)
Chameleon: https://www.chameleoncloud.org (project name: ngdb)

Inclusion Statement: In our class we strive to create an environment where everyone willing to do their part can learn and thrive. You should always feel free to ask a question: asking and pondering questions is how we learn. Being confused is unfailingly an opportunity to advance our knowledge. Please, commit to helping create a climate where we treat everyone with dignity and respect. Listening to different viewpoints and approaches enriches our experience, and it is up to us to be sure others feel safe to contribute. Creating an environment where we are all comfortable learning is everyone's job: offer support and seek help from others if you need it, not only in class but also outside class while working with classmates.

Grading:

Paper review: 25%
Class participation: 25%
Project proposal: 10%
Project presentation: 10%
Project final report: 30%

Late submission policy: Reviews must be submitted before the lecture starts in order to be graded. You can skip up to 2 reviews without losing points; otherwise 1% of the total grade is deducted for each missing review. Please talk to the instructor if you cannot submit the project proposal or report before the deadline.

Lec#	Date	Topic	Reading	Slides
1	Wed 9/6	Introduction	None	L1
		Storage Disaggregation
2	Mon 9/11	Aurora	Alexandre Verbitski, et al., Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. SIGMOD, 2017	L2
3	Wed 9/13	Snowflake	Benoit Dageville, et al., The Snowflake Elastic Data Warehouse. SIGMOD, 2016	L3
4	Mon 9/18	Analytical Processing-1	Yifei Yang, et al., FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS. VLDB, 2021 Xiangyao Yu, et al., PushdownDB: Accelerating a DBMS using S3 Computation. ICDE, 2020 Cai, Mengchu, et al. Integrated Querying of SQL database data and S3 data in Amazon Redshift. IEEE Data Eng. Bull. 2018	L4 (L4-1, L4-2)
5	Wed 9/20	Analytical Processing-2	Vuppalapati, Midhul, et al. Building an elastic query engine on disaggregated storage. NSDI, 2020 Melnik, Sergey, et al. Dremel: interactive analysis of web-scale datasets. VLDB, 2010 Melnik, Sergey, et al. Dremel: A decade of interactive SQL analysis at web scale. VLDB, 2020 Armbrust, Michael, et al. Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics. CIDR, 2021	L5 (L5-1, L5-2, L5-3)
6	Mon 9/25	Guest Lecture	Title: S3: an overview of the internal architecture Abstract: S3 is a highly scalable and highly durable object store. In this talk we will present the high-level architecture of S3 and dive into how the storage system achieves these goals while keeping costs low. We will talk about some of the design philosophies behind S3 and give a primer on Reed-Solomon erasure codes. Bio: Jaso has been a software developer for many years. He earned his BS in Computer Science at UW-Madison and his master's at Johns Hopkins. He is currently in his second year as a PhD student at UW-Madison. Before returning to academia, he spent 17 years at Amazon.com working on various systems such as S3, DynamoDB, Timestream, and AWS-IOT.
7	Wed 9/27	Transaction Processing-1	Panagiotis Antonopoulos, et al., Socrates: The New SQL Server in the Cloud. SIGMOD, 2019 Corbett, James C., et al. Spanner: Google's globally distributed database. OSDI, 2012 Lomet, David, et al. Unbundling transaction services in the cloud. CIDR 2009	L7 (L7-1, L7-2)
8	Mon 10/2	Transaction Processing-2	Zhou, Jingyu, et al. Foundationdb: A distributed unbundled transactional key value store. SIGMOD, 2021 Guo, Zhihan, et al. Cornus: atomic commit for a cloud DBMS with storage disaggregation. VLDB 2022 Peng, Daniel, and Frank Dabek. Large-scale incremental processing using distributed transactions and notifications. OSDI, 2010	L8 (L8-1, L8-2)
9	Wed 10/4	Transaction Processing-3	Taft, Rebecca, et al. Cockroachdb: The resilient geo-distributed sql database. SIGMOD, 2020 Yang, Zhenkun, et al. OceanBase: a 707 million tpmC distributed relational database system. VLDB, 2022 Cao, Wei, et al. PolarDB-X: An Elastic Distributed Relational Database for Cloud-Native Applications. ICDE, 2022	L9 (L9-1, L9-2, L9-3)
		Serverless
10	Mon 10/9	Project Meetings	Meeting with the instructor to discuss the course project.
11	Wed 10/11	Database Affiliates Workshop	Attend the Wisconsin Database Affiliates Workshop on 10/12 (optional) and 10/13 (required).
12	Mon 10/16	Serverless-1	Gaffney, Kevin P., et al. Sqlite: past, present, and future. VLDB, 2022 Raasveldt, Mark, and Hannes Muhleisen. Duckdb: an embeddable analytical database. SIGMOD, 2019	L12 (L12-1, L12-2)
13	Wed 10/18	Serverless-2	Perron, Matthew, et al. Starling: A scalable query engine on cloud functions. SIGMOD, 2020 Muller, Ingo, Renato Marroquín, and Gustavo Alonso. Lambada: Interactive data analytics on cold data using serverless cloud infrastructure. SIGMOD, 2020	L13 (L13-1, L13-2)
14	Mon 10/23	Serverless-3	Sreekanti, Vikram, et al. Cloudburst: Stateful functions-as-a-service. VLDB, 2020 Hellerstein, Joseph M., et al. Serverless computing: One step forward, two steps back. CIDR, 2019 Arun Ulagaratchagan. Introducing Microsoft Fabric: Data analytics for the era of AI. Blog post, 2023	L14 (L14-1, L14-2)
15	Wed 10/25	Serverless-4	Johann Schleier-Smith. Understanding and Exploring Serverless Cloud Computing (Sections 2.1-2.6). Technical Report No. UCB/EECS-2022-273, 2022 Cao, Wei, et al. Polardb serverless: A cloud native database for disaggregated data centers. SIGMOD, 2021 Jonas, Eric, et al. Cloud programming simplified: A berkeley view on serverless computing. Technical Report No. UCB/EECS-2019-3, 2019	L15 (L15-1, L15-2, L15-3)
16	Mon 10/30	DBOS	Skiadopoulos, Athinagoras, et al. DBOS: a DBMS-oriented Operating System. VLDB, 2022 Kraft, Peter, et al. Apiary: A DBMS-Backed Transactional Function-as-a-Service Framework. arXiv preprint arXiv:2208.13068, 2022 Cafarella, Michael, et al. DBOS: A proposal for a data-centric operating system. arXiv preprint arXiv:2007.11112, 2020 Li, Qian, et al. R3: Record-Replay-Retroaction for Database-Backed Applications. VLDB, 2023	L16 (L16-1, L16-2, L16-3)
17	Wed 11/1	Auto-scaling	Zhu, Yiwen, et al. Towards Building Autonomous Data Services on Azure. SIGMOD, 2023 Wu, Chenggang, Vikram Sreekanti, and Joseph M. Hellerstein. Autoscaling tiered cloud storage in Anna. VLDB, 2019 Poppe, Olga, et al. Moneyball: proactive auto-scaling in Microsoft Azure SQL database serverless. VLDB, 2022 Das, Sudipto, et al. Albatross: Lightweight elasticity in shared storage databases for the cloud using live data migration. VLDB, 2011	L17 (L17-1, L17-2, L17-3)
18	Mon 11/6	Guest Lecture	Title: Build an open source, high performance, cloud native time series database Abstract: With the growth of IoT and industrial Internet, time series databases have become more and more popular. Based on the characteristics of time series data, the TDengine team proposed a unique data model of "one table for one data collection point". Benchmark results show that this model dramatically boosts database performance in terms of data ingestion rate, query latency and data compression ratio. In addition, through another innovative concept called the "Super Table", TDengine makes aggregating millions of tables very efficient. Through its native distributed design, storage and computing separation, and RAFT-based data replication, TDengine provides very good scalability, elasticity, and resilience. It can support over one billion connected devices and 100 nodes without any performance deterioration. And with its good observability and cloud deployment tools, TDengine is a true cloud native time series database. TDengine was open sourced in 2019, and the cloud native edition was open sourced in 2022. At present, it has gained over 21,000 stars on GitHub and over 400,000 installations from over 50 countries. It has been widely used in smart manufacturing, clean energy, oil/gas, mining, connected vehicles and more industries. Bio: Jeff Tao is the founder and CEO of TDengine. He has a background as a technologist and serial entrepreneur, having previously conducted research and development on mobile Internet at Motorola and 3Com and established two successful tech startups. Foreseeing the explosive growth of time-series data generated by machines and sensors now taking place, he founded TDengine in May 2017 to develop an open source, high performance, cloud native time series database purpose-built for modern Industry 4.0 and Industrial IoT businesses.
19	Wed 11/8	Multi-cloud	Chasins, Sarah, et al. The sky above the clouds. arXiv preprint arXiv:2205.07147, 2022 Durner, Dominik, Viktor Leis, and Thomas Neumann. Exploiting Cloud Object Storage for High-Performance Analytics. VLDB, 2023 Jain, Paras, et al. Skyplane: Optimizing Transfer Cost and Throughput Using Cloud-Aware Overlays. NSDI, 2023 Yang, Zongheng, et al. SkyPilot: An Intercloud Broker for Sky Computing. NSDI, 2023 What are public, private, and hybrid clouds?. Microsoft, 2023 Flexible, resilient, secure IT for your hybrid cloud. IBM, 2023 Public Cloud vs Private Cloud vs Hybrid Cloud. MongoDB	L19 (L19-1, L19-2, L19-3)
20	Mon 11/13	Project Meetings	Meeting with the instructor to discuss the course project.
21	Wed 11/15	Auto-tuning	Van Aken, Dana, et al. Automatic database management system tuning through large-scale machine learning. SIGMOD, 2017 Pavlo, Andrew, et al. Self-Driving Database Management Systems. CIDR, 2017 Kanellis, Konstantinos, et al. LlamaTune: Sample-Efficient DBMS Configuration Tuning. VLDB, 2022	L21 (L21-1, L21-2, L21-3)
22	Mon 11/20	HTAP	Prout, Adam, et al. Cloud-Native Transactions and Analytics in SingleStore. SIGMOD, 2022 Yang, Jiacheng, et al. F1 Lightning: HTAP as a Service. VLDB, 2020 Chen, Jianjun, et al. ByteHTAP: bytedance's HTAP system with high data freshness and strong data consistency. VLDB 2022 Huang, Dongxu, et al. TiDB: a Raft-based HTAP database. VLDB, 2020 HTAP: HYBRID TRANSACTIONAL AND ANALYTICAL PROCESSING. Snowflake, 2023	L22 (L22-1, L22-2, L22-3, L22-4)
		New Hardware
23	Wed 11/22	GPU database	Anil Shanbhag, et al., A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics. SIGMOD, 2020 Anil Shanbhag, et al. Tile-based Lightweight Integer Compression in GPU. SIGMOD, 2022 Bobbi Yogatama, et al. Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS. VLDB 2022 Cao, Jiashen, et al. Revisiting Query Performance in GPU Database Systems. arXiv 2023	L23 (L23-1, L23-2, L23-3, L23-4)
24	Mon 11/27	Memory Disaggregation	Li, Huaicheng, et al. Pond: CXL-based memory pooling systems for cloud platforms. ASPLOS, 2023 Zhang, Qizhen, et al. Redy: remote dynamic memory cache. VLDB, 2021 Wang, Ruihong, et al. The case for distributed shared-memory databases with RDMA-enabled memory disaggregation. VLDB, 2023 Zhang, Qizhen, et al. Compucache: Remote computable caching using spot vms. CIDR, 2022 Lim, Kevin, et al. Disaggregated memory for expansion and sharing in blade servers. ISCA, 2009	L24 (L24-1, L24-2, L24-3)
25	Wed 11/29	RDMA	Binnig, Carsten, et al. The end of slow networks: It's time for a redesign. VLDB, 2016 Zamanian, Erfan, et al. The end of a myth: Distributed transactions can scale. VLDB, 2017 Barthels, Claude, et al. Rack-scale in-memory join processing using RDMA. SIGMOD, 2015 Rodiger, Wolf, et al. High-speed query processing over high-speed networks. VLDB, 2015	L25 (L25-1, L25-2, L25-3)
26	Mon 12/4	SmartNIC	Lin, Jiaxin, et al. Towards Accelerating Data Intensive Application's Shuffle Process Using SmartNICs. SIGMETRICS, 2023 Liu, Ming, et al. Offloading distributed applications onto smartnics using ipipe. SIGCOMM, 2019 Schuh, Henry N., et al. Xenic: SmartNIC-accelerated distributed transactions. SOSP, 2021	L26 (L26-1, L26-2, L26-3)
27	Wed 12/6	No Lecture
28	Mon 12/11	Project Presentation
29	Wed 12/13	Project Presentation