CS839 Design the Next-Generation Database

Lectures: Tue/Thu 1:00pm - 2:15pm @ 301 Educational Sciences
Instructor: Xiangyao Yu
Office Hours: Tue 2:30pm - 3:30pm @ CS 4385
Class Mailing List: compsci839-5-s20@lists.wisc.edu

Course description

Database systems are undergoing major changes driven by new hardware, system architectures, and applications. It is an exciting time to revisit the conventional wisdom of database design in light of new system-level challenges and opportunities. In this seminar class, we will cover recent developments in databases through a mix of lectures and paper discussions. We will cover both hardware and software aspects. Topics include massive parallelism, emerging memory and storage, new network technology, cloud architecture, and emerging applications.

Lecture format: Each lecture focuses on one state-of-the-art research paper. Students read the paper and submit a review to https://wisc-cs839-ngdb20.hotcrp.com before the lecture starts. You may skip up to 3 reviews in total. In each lecture, the instructor presents the paper, after which students discuss it in groups and share their conclusions with the entire class. Each group submits a discussion summary (in any format) to HotCRP before 11:59pm the day after the lecture.

Course projects: The course features an open-ended, research-oriented project. Students work in groups of 2-4. You are encouraged to pick a topic related to your existing research; a list of potential projects will also be provided. The project includes a proposal, a final report, and an in-class presentation. You are encouraged to continue working on a successful project afterwards and submit a research paper to a top-tier database conference (e.g., SIGMOD or VLDB). This is a great opportunity to explore PhD research topics in both database systems and computer architecture.

Project deadlines:

Computation resources:

Instructions for online lectures: Starting March 12, all lectures move online to canvas.wisc.edu -> Courses -> SP20 COMPSCI 839 005 -> BBCollaborate Ultra. Each session opens 15 min before the lecture and closes 15 min after it. Office hours will be held online using the same software, in separate sessions created for that purpose.

Final presentation format: Course project presentations take place on April 28 and 30. Each team has a 10 min slot (8 min presentation + 2 min Q&A). Please sign up using the link sent to the class mailing list.


Prerequisites

The class is mostly self-contained. Background knowledge in databases (CS 564) and computer architecture (CS/ECE 552) is desired. Advanced knowledge in databases (CS 764) and computer architecture (CS 752/757) is not required.


Grading


Schedule

Lecture Date Topic Reading Slides
1 Tue 1/21 Introduction None L1
2 Thu 1/23 Transaction basics [optional] What's Really New with NewSQL? L2
3 Tue 1/28 Analytics basics [optional] C-Store: A Column-oriented DBMS L3
Massive parallelism
4 Thu 1/30 Multicore 1 Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores
[optional] Concurrency Control Performance Modeling: Alternatives and Implications
[optional] OLTP Through the Looking Glass, and What We Found There
L4
5 Tue 2/4 Multicore 2 Speedy Transactions in Multicore In-Memory Databases
[optional] TicToc: Time Traveling Optimistic Concurrency Control
[optional] Hekaton: SQL Server's Memory-Optimized OLTP Engine
L5
6 Thu 2/6 Deterministic database Calvin: Fast Distributed Transactions for Partitioned Database Systems
[optional] Rethinking serializable multiversion concurrency control (Extended Version)
[optional] An Evaluation of Distributed Concurrency Control
L6
7 Tue 2/11 GPU database A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics
[optional] An Overview of MapD (Massively Parallel Database)
L7
8 Thu 2/13 Accelerator Q100: The Architecture and Design of a Database Processing Unit
[optional] A Many-core Architecture for In-Memory Data Processing
L8
9 Tue 2/18 Guest lecture by Dr. Goetz Graefe
[recommended] New algorithms for join and grouping operations
[optional] Implementing Sorting in Database Systems
[optional] Modern B-tree techniques
Emerging memory/storage
10 Thu 2/20 NVM 1 Managing Non-Volatile Memory in Database Systems
[optional] Basic Performance Measurements of the Intel Optane DC Persistent Memory Module
L10
11 Tue 2/25 NVM 2 Write-Behind Logging
[optional] Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems
L11
12 Thu 2/27 HBM Joins in a Heterogeneous Memory Hierarchy: Exploiting High-Bandwidth Memory
[optional] Fundamental Latency Trade-offs in Architecting DRAM Caches
L12
13 Tue 3/3 Smart SSD Query Processing on Smart SSDs: Opportunities and Challenges
[optional] Enabling Cost-effective Data Processing with Smart SSD
L13
14 Thu 3/5 PIM Database Processing-in-Memory: An Experimental Study
[optional] Near-Data Processing: Insights from A MICRO-46 Workshop
L14
New network technology
15 Tue 3/10 RDMA for DB The End of Slow Networks: It's Time for a Redesign
[optional] The End of a Myth: Distributed Transactions Can Scale
L15
16 Thu 3/12 High availability Rethinking Database High Availability with RDMA Networks
[optional] Query Fresh: Log Shipping on Steroids
L16
17 Tue 3/24 Smart NIC Offloading Distributed Applications onto SmartNICs using iPipe
[optional] The Case for Network-Accelerated Query Processing
L17
18 Thu 3/26 Guest lecture by Dr. Michael Marty
[optional] Snap: a Microkernel Approach to Host Networking
19 Tue 3/31 RDMA for OLAP Distributed Join Algorithms on Thousands of Cores
[optional] Rack-Scale In-Memory Join Processing using RDMA
[optional] High-Speed Query Processing over High-Speed Networks
L19
Cloud architecture
20 Thu 4/2 OLTP in cloud Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
[optional] Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes
L20
21 Tue 4/7 Cloud data warehousing Choosing A Cloud DBMS: Architectures and Tradeoffs
[optional] Amazon Redshift and the Case for Simpler Data Warehouses
L21
22 Thu 4/9 Snowflake The Snowflake Elastic Data Warehouse
[optional] Eon Mode: Bringing the Vertica Columnar Database to the Cloud
L22
23 Tue 4/14 Serverless Starling: A Scalable Query Engine on Cloud Function Services
[optional] Cloud Programming Simplified: A Berkeley View on Serverless Computing
L23
Emerging applications
24 Thu 4/16 HTAP HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots
[optional] Hybrid Transactional/Analytical Processing: A Survey
L24
25 Tue 4/21 Time series Gorilla: A Fast, Scalable, In-Memory Time Series Database
[optional] Time Series Management Systems: A Survey
L25
26 Thu 4/23 Guest lecture by Dr. Shasank Chavan
Course project
27 Tue 4/28 Project presentation
28 Thu 4/30 Project presentation