I am an Assistant Professor in the Computer Sciences Department at the University of Wisconsin-Madison. Before joining UW-Madison, I was a Postdoctoral Associate in the database group at CSAIL, MIT, working with Prof. Michael Stonebraker and Prof. Samuel Madden. I completed my Ph.D. in Computer Science at MIT in 2017, working with Prof. Srinivas Devadas. I earned my Bachelor of Science (B.S.) in 2012 from the Institute of Microelectronics at Tsinghua University, Beijing, China.
I work on database systems and currently focus on (1) cloud-native databases, (2) new hardware for databases, and (3) core DB techniques for both transaction processing and analytics.
I am actively looking for Postdocs and Graduate/Undergraduate students interested in database systems. Please email me your CV if you are interested in working with me.
My research focuses on three areas: (I) Cloud-native databases, (II) New hardware for databases, and (III) Core DB techniques. Below are some sample projects.
Research Area I: Cloud-Native Databases
Databases are moving to the cloud, driven by desirable properties such as elasticity, high availability, and cost competitiveness. Modern cloud-native databases adopt a storage-disaggregation architecture, in which computation and storage are decoupled. This architecture brings new challenges (e.g., the network bandwidth bottleneck) and new opportunities in DBMS design.
Cloud-native data warehouse:
-
Cloud-DW [VLDB'19]: Evaluation of several popular cloud-native data warehouse systems that have different architectures.
-
PushdownDB [code][ICDE'20]: A cloud-native OLAP system that leverages AWS S3 Select to push down selection, projection, and aggregation to speed up query processing.
-
FlexPushdownDB [code][VLDB'21]: A cloud-native OLAP DBMS that combines caching and pushdown at a fine granularity in a storage-disaggregation architecture.
-
FlexPushdownDB journal [code][VLDBJ'24]: Extends FlexPushdownDB (FPDB) to support advanced pushdown operators (e.g., Bloom filter, selection bitmap, and shuffle) and adaptive pushdown, which pushes tasks back to compute servers when storage-layer computation is limited.
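To illustrate why pushdown helps when compute and storage are decoupled, here is a minimal sketch that simulates a storage server in plain Python. It is not code from PushdownDB or FlexPushdownDB and does not call S3 Select; `StorageNode`, `scan`, and `select_sum` are hypothetical names, and the number of rows shipped stands in for network traffic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class StorageNode:
    """Hypothetical storage server that counts rows sent over the 'network'."""
    rows: List[Row]
    rows_shipped: int = 0

    def scan(self) -> List[Row]:
        # No pushdown: every row crosses the network to the compute layer.
        self.rows_shipped += len(self.rows)
        return list(self.rows)

    def select_sum(self, column: str, predicate: Callable[[Row], bool]) -> int:
        # Pushdown: selection, projection, and aggregation all run at the
        # storage layer, so only a single aggregate value is shipped.
        total = sum(row[column] for row in self.rows if predicate(row))
        self.rows_shipped += 1
        return total


def make_table(num_rows: int) -> List[Row]:
    # Synthetic table used only for this illustration.
    return [{"id": i, "price": i % 100, "qty": i % 7} for i in range(num_rows)]


def is_expensive(row: Row) -> bool:
    return row["price"] >= 90


def query_without_pushdown(node: StorageNode) -> int:
    # Compute layer pulls the whole table, then filters and aggregates locally.
    return sum(row["qty"] for row in node.scan() if is_expensive(row))


def query_with_pushdown(node: StorageNode) -> int:
    # Compute layer asks storage to evaluate the query fragment.
    return node.select_sum("qty", is_expensive)
```

Running both query functions against separate `StorageNode` instances over the same table returns the same sum, while `rows_shipped` shows how much less data the pushdown variant moves. A real system also has to account for the limited compute available at the storage layer, which is what motivates combining pushdown with caching and adaptive pushdown.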
Cloud-native transaction processing:
-
Litmus [code][SIGMOD'22]: A DBMS that provides verifiable proofs of atomicity and serializability for transactions through the co-design of database and cryptographic tools.
-
Cornus [code][VLDB'22]: An optimized two-phase commit protocol in a cloud-native database. Cornus reduces 2PC latency and eliminates blocking by leveraging the unique architectural features of storage disaggregation.
-
Epoxy [code][VLDB'23]: Epoxy provides ACID transactions across heterogeneous data stores (e.g., MongoDB, ElasticSearch, GCS, MySQL) to simplify cloud application development.
-
R^3 [code][VLDB'23]: R3 is a Record-Replay-Retroaction tool that simplifies debugging database-backed applications. R3 can replay an application in the same order as the original execution; it also enables retroaction, allowing the replay to run modified code instead of the original code.
Research Area II: New Hardware for Databases
GPU database:
GPUs are a promising solution for data analytics, driven by the rapid growth of GPU computation power, GPU memory capacity and bandwidth, and PCIe bandwidth. We investigate techniques that can fully unleash the power of GPUs in online analytical processing (OLAP) databases.
-
Crystal [code][SIGMOD'20]: A library that can run full SQL queries on the GPU and saturate GPU memory bandwidth.
-
GPU-compression [code][SIGMOD'22]: A highly optimized GPU compression scheme that achieves a high compression ratio and fast decompression speed.
-
Mordred [code][VLDB'22]: A heterogeneous CPU-GPU query execution engine that optimizes data placement (i.e., semantic-aware caching) and query execution (i.e., segment-level query execution).
-
GPU-UDAF [DaMoN@SIGMOD'23]: This work optimizes user-defined aggregate functions (UDAFs) in cuDF through a block-wide execution model and just-in-time (JIT) compilation, achieving a 3600x speedup. The work has been fully integrated and released in NVIDIA RAPIDS cuDF version 23.02.
Advanced network technologies:
The network is a bottleneck in distributed databases. Emerging network technologies, including RDMA, SmartNICs, and programmable switches, support different levels of computation within the network and are promising for accelerating distributed databases.
-
Active-memory [VLDB'19]: Active-memory replication is a new high-availability scheme that leverages RDMA to directly update replicas' memory and eliminate the computation overhead of log replay.
-
SmartShuffle [SIGMETRICS'23]: Accelerates data analytics by offloading various computation tasks to the SmartNIC. SmartShuffle outperforms Spark RDMA by up to 40% on TPC-H.
Research Area III: Core DB Techniques
Scalable transaction processing on multicore CPUs:
Computer architectures are moving towards manycore machines with dozens or even hundreds of cores on a single chip. We develop new techniques for modern database management systems (DBMSs) to make transaction processing scalable for this level of massive parallelism.
-
DBx1000 [code][VLDB'14]: Scalability analysis of seven classic concurrency control protocols on a simulated 1000-core CPU.
-
TicToc [code][SIGMOD'16]: A scalable timestamp-based concurrency control protocol that resolves the timestamp allocation bottleneck through data-driven timestamp management.
-
Taurus [code][VLDB'20]: A lightweight parallel logging scheme that avoids the central logging bottleneck by writing to multiple log streams.
-
Bamboo [code][SIGMOD'21]: An optimized two-phase locking (2PL) protocol that mitigates hotspot overhead by releasing locks early during transaction execution.
-
Plor [code][SIGMOD'22]: A technique called pessimistic locking and optimistic reading (Plor) that reduces tail latency for high-contention transactional workloads while maintaining high throughput.
-
Blink-Hash [code][VLDB'23]: A new index design that enhances a tree-based index with hash leaf nodes to mitigate the contention of monotonic insertions, a pattern common in time-series workloads.
-
DeToX [code][OSDI'23]: A caching algorithm that leverages transactional dependencies to make eviction and prefetching decisions more intelligently. DeToX increases transactional hit rate by 1.3x compared to single-object caching policies.
-
Polaris [code][SIGMOD'23]: Polaris enables priority among transactions for the state-of-the-art OCC protocol Silo, and achieves 17x lower tail latency for high-contention workloads.
-
Two-Tree [code][CIDR'23]: Two-Tree splits a single index structure (e.g., B-tree) into a top in-memory tree for hot records and a bottom tree for cold pages, and achieves 1.7x higher throughput than the conventional One-Tree design.
-
Three-Tree [code][SIGMOD'24]: Exploration of OLTP buffer management strategies with two-tier main memory; no existing design can win in all measured dimensions.
Scalable distributed transaction processing:
Online transaction processing (OLTP) DBMSs are increasingly deployed on distributed machines. Compared to centralized systems, distributed DBMSs face new challenges, including extra network latency and the requirements of high availability and distributed commitment.
-
Sundial [code][VLDB'18]: A distributed concurrency control protocol that is algorithmically similar to TicToc; Sundial integrates cache coherence and concurrency control into a unified protocol.
-
STAR [code][VLDB'19]: A distributed DBMS where data replicas use asymmetric architectures (e.g., non-partitioned and partition-based). A transaction is executed in the replica that delivers better performance.
-
Aria [code][VLDB'20]: A deterministic distributed DBMS that no longer requires knowing transactions' read/write sets before execution. Aria also achieves higher throughput than previous deterministic DBMSs.
-
Coco [code][VLDB'21]: A distributed OLTP DBMS that mitigates the synchronization overhead of distributed commitment and data replication by committing transactions in epochs.
-
Lotus [code][VLDB'22]: Optimizes multi-partition transactions in a distributed and partitioned database.
Hybrid transactional/analytical processing (HTAP):
HTAP systems have gained popularity as they combine OLAP and OLTP processing to reduce administrative and synchronization costs between dedicated systems. This brings new challenges in data freshness and performance isolation between transactional and analytical processing.
-
HATtrick [code][SIGMOD'22]: A benchmark for HTAP systems that uses two new performance metrics: throughput frontier and freshness score. Three representative systems are evaluated.
Predicate Transfer:
Predicate transfer is a method that optimizes multi-join queries by pre-filtering tables to reduce the join input size. Predicate transfer is inspired by the seminal theoretical results by Yannakakis but leverages Bloom filters to become more practical.
-
Predicate Transfer [code][CIDR'24]: The original predicate transfer paper.
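The pre-filtering idea can be sketched in a few lines of Python. This is a minimal illustration of a single transfer step between two tables, not the algorithm or code from the paper, which propagates filters across the whole join graph. The `BloomFilter` class, the helper names, and the bit-array and hash-count parameters are all assumptions made for this example.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, object]


class BloomFilter:
    """Small illustrative Bloom filter; sizes are arbitrary assumptions."""

    def __init__(self, num_bits: int = 4096, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: object) -> Iterator[int]:
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: object) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: object) -> bool:
        # May return false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_filter(rows: List[Row], key: str) -> BloomFilter:
    # Summarize the join keys that survive a table's local predicate.
    bloom = BloomFilter()
    for row in rows:
        bloom.add(row[key])
    return bloom


def prefilter(rows: List[Row], key: str, bloom: BloomFilter) -> List[Row]:
    # Transfer the predicate: drop rows whose key cannot match.
    return [row for row in rows if bloom.might_contain(row[key])]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    # Plain hash join: build on the left input, probe with the right.
    index: Dict[object, List[Row]] = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    return [(l, r) for r in right for l in index.get(r[key], [])]
```

For example, after filtering a customers table by region, `build_filter` on its `cust_id` column followed by `prefilter` on an orders table shrinks the probe side before `hash_join` runs. Because a Bloom filter has no false negatives, the join result is unchanged; false positives only mean that a few non-matching rows survive the pre-filter and are discarded by the join itself.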