Modern applications are moving to the cloud for global accessibility, elasticity, high availability, and low cost. Databases are one of the foundational technologies for cloud applications. Compared to traditional on-premises databases, cloud-native databases have unique architectures (e.g., storage disaggregation), embrace heterogeneous hardware technologies (e.g., GPU, CXL, SmartNIC), and face new application scenarios (e.g., serverless, autoscaling). This seminar course covers recent developments in cloud-native databases from both industrial deployment and academic research. Each lecture features presentations from the instructor and students, followed by group discussions. The course has a final group project.
Reviews must be submitted before the lecture starts in order to be graded. You can skip up to 2 reviews without losing points; otherwise 1% of the total grade is deducted for each missing review. Please talk to the instructor if you cannot submit the project proposal or report before the deadline.
| Lec# | Date | Topic | Reading | Slides |
| --- | --- | --- | --- | --- |
| 1 | Wed 9/6 | Introduction | None | L1 |
| | | Storage Disaggregation | | |
| 2 | Mon 9/11 | Aurora | Alexandre Verbitski, et al., Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. SIGMOD, 2017 | L2 |
| 3 | Wed 9/13 | Snowflake | Benoit Dageville, et al., The Snowflake Elastic Data Warehouse. SIGMOD, 2016 | L3 |
| 4 | Mon 9/18 | Analytical Processing-1 | Yifei Yang, et al., FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS. VLDB, 2021; Xiangyao Yu, et al., PushdownDB: Accelerating a DBMS using S3 Computation. ICDE, 2020; Cai, Mengchu, et al. Integrated Querying of SQL database data and S3 data in Amazon Redshift. IEEE Data Eng. Bull., 2018 | L4 (L4-1, L4-2) |
| 5 | Wed 9/20 | Analytical Processing-2 | Vuppalapati, Midhul, et al. Building an elastic query engine on disaggregated storage. NSDI, 2020; Melnik, Sergey, et al. Dremel: interactive analysis of web-scale datasets. VLDB, 2010; Melnik, Sergey, et al. Dremel: A decade of interactive SQL analysis at web scale. VLDB, 2020; Armbrust, Michael, et al. Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics. CIDR, 2021 | L5 (L5-1, L5-2, L5-3) |
| 6-8 | Mon 9/25 to Mon 10/2 | Transactional Processing | Panagiotis Antonopoulos, et al., Socrates: The New SQL Server in the Cloud. SIGMOD, 2019; Zhou, Jingyu, et al. FoundationDB: A distributed unbundled transactional key value store. SIGMOD, 2021; Corbett, James C., et al. Spanner: Google's globally distributed database. OSDI, 2012; Guo, Zhihan, et al. Cornus: atomic commit for a cloud DBMS with storage disaggregation. VLDB, 2022; Peng, Daniel, and Frank Dabek. Large-scale incremental processing using distributed transactions and notifications. OSDI, 2010; Taft, Rebecca, et al. CockroachDB: The resilient geo-distributed SQL database. SIGMOD, 2020; Yang, Zhenkun, et al. OceanBase: a 707 million tpmC distributed relational database system. VLDB, 2022; Cao, Wei, et al. PolarDB-X: An Elastic Distributed Relational Database for Cloud-Native Applications. ICDE, 2022; Lomet, David, et al. Unbundling transaction services in the cloud. CIDR, 2009 | |
| | | Serverless | | |
| 10 | Mon 10/9 | Project Meetings | Meeting with the instructor to discuss the final project. | |
| 11-14 | Wed 10/11 to Mon 10/23 | Serverless and Function as a Service | Gaffney, Kevin P., et al. SQLite: past, present, and future. VLDB, 2022; Raasveldt, Mark, and Hannes Muhleisen. DuckDB: an embeddable analytical database. SIGMOD, 2019; Perron, Matthew, et al. Starling: A scalable query engine on cloud functions. SIGMOD, 2020; Muller, Ingo, Renato Marroquín, and Gustavo Alonso. Lambada: Interactive data analytics on cold data using serverless cloud infrastructure. SIGMOD, 2020; Sreekanti, Vikram, et al. Cloudburst: Stateful functions-as-a-service. VLDB, 2020; Hellerstein, Joseph M., et al. Serverless computing: One step forward, two steps back. CIDR, 2019; Jonas, Eric, et al. Cloud programming simplified: A Berkeley view on serverless computing. Technical Report No. UCB/EECS-2019-3, 2019; Johann Schleier-Smith. Understanding and Exploring Serverless Cloud Computing (Sections 2.1-2.6). Technical Report No. UCB/EECS-2022-273, 2022; Cao, Wei, et al. PolarDB Serverless: A cloud native database for disaggregated data centers. SIGMOD, 2021; Arun Ulagaratchagan. Introducing Microsoft Fabric: Data analytics for the era of AI. Blog post, 2023 | |
| 15 | Wed 10/25 | DBOS | Skiadopoulos, Athinagoras, et al. DBOS: a DBMS-oriented Operating System. VLDB, 2022; Kraft, Peter, et al. Apiary: A DBMS-Backed Transactional Function-as-a-Service Framework. arXiv preprint arXiv:2208.13068, 2022 | |
| 16 | Mon 10/30 | Auto-scaling | Zhu, Yiwen, et al. Towards Building Autonomous Data Services on Azure. SIGMOD, 2023; Wu, Chenggang, Vikram Sreekanti, and Joseph M. Hellerstein. Autoscaling tiered cloud storage in Anna. VLDB, 2019 | |
| 17 | Wed 11/1 | Multi-cloud | Chasins, Sarah, et al. The sky above the clouds. arXiv preprint arXiv:2205.07147, 2022; What are public, private, and hybrid clouds? Microsoft, 2023; Flexible, resilient, secure IT for your hybrid cloud. IBM, 2023; Public Cloud vs Private Cloud vs Hybrid Cloud. MongoDB | |
| 18 | Mon 11/6 | Auto-tuning | Van Aken, Dana, et al. Automatic database management system tuning through large-scale machine learning. SIGMOD, 2017; Pavlo, Andrew, et al. Self-Driving Database Management Systems. CIDR, 2017 | |
| 19 | Wed 11/8 | Project Meetings | Meeting with the instructor to discuss the final project. | |
| 20 | Mon 11/13 | HTAP | Prout, Adam, et al. Cloud-Native Transactions and Analytics in SingleStore. SIGMOD, 2022; Yang, Jiacheng, et al. F1 Lightning: HTAP as a Service. VLDB, 2020; Chen, Jianjun, et al. ByteHTAP: ByteDance's HTAP system with high data freshness and strong data consistency. VLDB, 2022; HTAP: Hybrid Transactional and Analytical Processing. Snowflake, 2023 | |
| | | New Hardware | | |
| 21 | Wed 11/15 | GPU database | Anil Shanbhag, et al., A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics. SIGMOD, 2020; Anil Shanbhag, et al. Tile-based Lightweight Integer Compression in GPU. SIGMOD, 2022; Bobbi Yogatama, et al. Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS. VLDB, 2022 | |
| 22 | Mon 11/20 | Memory Disaggregation | Li, Huaicheng, et al. Pond: CXL-based memory pooling systems for cloud platforms. ASPLOS, 2023; Zhang, Qizhen, et al. Redy: remote dynamic memory cache. VLDB, 2021 | |
| 23 | Wed 11/22 | RDMA | TBD | |
| 24 | Mon 11/27 | SmartNIC | TBD | |
| 25-27 | Wed 11/29 to Wed 12/6 | TBD | | |
| 28 | Mon 12/11 | Project Presentation | | |
| 29 | Wed 12/13 | Project Presentation | | |