I am an Assistant Professor in the Computer Sciences Department at the University of Wisconsin-Madison.
Before joining UW-Madison, I was a Postdoctoral Associate in the database group at MIT, working with Prof. Michael Stonebraker and Prof. Samuel Madden. I completed my Ph.D. in Computer Science at MIT in 2017, working with Prof. Srinivas Devadas. I earned my Bachelor of Science (B.S.) in 2012 from the Institute of Microelectronics at Tsinghua University, Beijing, China.
I work on database systems and currently focus on (1) transactions and HTAP, (2) new hardware for databases, and (3) cloud-native databases.
I am actively looking for Postdocs and Graduate/Undergraduate students interested in database systems. Please email me your CV if you are interested in working with me.
My research focuses on three areas: (I) transactions and HTAP, (II) new hardware for databases, and (III) cloud-native databases. Below are some sample projects.
Research Area I: Transactions and HTAP
Scalable transaction processing on multicore CPUs
Computer architectures are moving towards manycore machines with dozens or even hundreds of cores on a single chip. We develop new techniques for modern database management systems (DBMSs) to make transaction processing scalable for this level of massive parallelism.
DBx1000 [code][VLDB'14]: Scalability analysis of seven classic concurrency control protocols on a simulated 1000-core CPU.
TicToc [code][SIGMOD'16]: A scalable timestamp-based concurrency control protocol that resolves the timestamp allocation bottleneck through data-driven timestamp management.
Taurus [code][VLDB'20]: A lightweight parallel logging scheme that avoids the central logging bottleneck by writing to multiple log streams.
Bamboo [code][SIGMOD'21]: An optimized two-phase locking (2PL) protocol that mitigates hotspot overhead by releasing locks early during transaction execution.
Plor [code][SIGMOD'22]: A technique called pessimistic locking and optimistic reading (Plor) that reduces tail latency for high-contention transactional workloads while maintaining high throughput.
Scalable distributed transaction processing
Online transaction processing (OLTP) DBMSs are increasingly deployed on distributed machines. Compared to centralized systems, distributed DBMSs face new challenges, including extra network latency, high-availability requirements, and distributed commitment.
Sundial [code][VLDB'18]: A distributed concurrency control protocol that is algorithmically similar to TicToc; Sundial integrates cache coherence and concurrency control into a unified protocol.
STAR [code][VLDB'19]: A distributed DBMS where data replicas use asymmetric architectures (e.g., non-partitioned and partition-based). A transaction is executed in the replica that delivers better performance.
Aria [code][VLDB'20]: A deterministic distributed DBMS that no longer requires knowing transactions' read/write sets before execution. Aria also achieves higher throughput than previous deterministic DBMSs.
Coco [code][VLDB'21]: A distributed OLTP DBMS that mitigates the synchronization overhead of distributed commitment and data replication by committing transactions in epochs.
Hybrid transactional/analytical processing (HTAP)
HTAP systems have gained popularity because they combine OLTP and OLAP processing, reducing the administrative and synchronization costs of running dedicated systems. This combination brings new challenges in data freshness and in performance isolation between transactional and analytical processing.
HATtrick [code][SIGMOD'22]: A benchmark for HTAP systems that uses two new performance metrics: throughput frontier and freshness score. Three representative systems are evaluated.
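To give a feel for data-driven timestamp management, here is a minimal single-threaded sketch in the spirit of TicToc (this is an illustration, not the published algorithm; all names and the dict-based "database" are hypothetical). Each tuple carries a write timestamp (wts) and a read timestamp (rts), and a transaction derives its commit timestamp from the tuples it actually accessed, rather than from a centralized timestamp allocator:

```python
from dataclasses import dataclass, field

@dataclass
class Tuple:
    value: object
    wts: int = 0   # timestamp of the write that created this version
    rts: int = 0   # latest timestamp at which this version was read

@dataclass
class Txn:
    reads: dict = field(default_factory=dict)   # key -> (value, wts, rts) as observed
    writes: dict = field(default_factory=dict)  # key -> new value

def read(db, txn, key):
    t = db[key]
    txn.reads[key] = (t.value, t.wts, t.rts)
    return t.value

def write(txn, key, value):
    txn.writes[key] = value

def commit(db, txn):
    # Derive the commit timestamp from the data items themselves:
    # it must be >= wts of every read and > rts of every written tuple.
    commit_ts = 0
    for _, (_, wts, _) in txn.reads.items():
        commit_ts = max(commit_ts, wts)
    for key in txn.writes:
        commit_ts = max(commit_ts, db[key].rts + 1)
    # Validate reads: each version read must still be valid at commit_ts.
    for key, (_, wts, rts) in txn.reads.items():
        if commit_ts > rts:
            t = db[key]
            if t.wts != wts:                   # overwritten since we read it
                return False                   # abort
            t.rts = max(t.rts, commit_ts)      # extend the validity interval
    # Install writes at commit_ts.
    for key, value in txn.writes.items():
        db[key] = Tuple(value, wts=commit_ts, rts=commit_ts)
    return True
```

Because the commit timestamp is computed locally from the accessed tuples, no thread ever waits on a shared timestamp counter; in the real protocol the validation step additionally takes per-tuple latches to make the extension atomic.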
Research Area II: New Hardware for Databases
GPU databases
GPUs are a promising platform for data analytics, driven by the rapid growth of GPU compute power, GPU memory capacity and bandwidth, and PCIe bandwidth. We investigate techniques that can fully unleash the power of GPUs in online analytical processing (OLAP) databases.
Crystal [code][SIGMOD'20]: A library that can run full SQL queries on GPUs and saturate GPU memory bandwidth.
GPU-compression [code][SIGMOD'22]: A highly optimized GPU compression scheme that achieves a high compression ratio and fast decompression speed.
Advanced network technologies
The network is a bottleneck in distributed databases. Emerging network technologies, including RDMA, SmartNICs, and programmable switches, support different levels of in-network computation and are promising for accelerating distributed databases.
Active-memory [VLDB'19]: Active-memory replication is a new high-availability scheme that leverages RDMA to directly update replicas' memory, eliminating the computation overhead of log replay.
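The contrast with conventional log shipping can be illustrated with a toy simulation (all names are hypothetical; a plain dict assignment stands in for a one-sided RDMA WRITE). The primary installs the after-image of each update directly into every replica's memory, so the replica CPUs never run log-replay logic:

```python
class Replica:
    """Passive replica: its memory is written remotely; it never replays logs."""
    def __init__(self):
        self.memory = {}  # record id -> value

class Primary:
    def __init__(self, replicas):
        self.memory = {}
        self.replicas = replicas

    def update(self, key, value):
        # 1. Apply the update locally.
        self.memory[key] = value
        # 2. "One-sided write": place the after-image directly into each
        #    replica's memory. With RDMA this would be an RDMA WRITE verb
        #    that bypasses the replica's CPU entirely; here an ordinary
        #    dict assignment stands in for it.
        for r in self.replicas:
            r.memory[key] = value
```

In a log-shipping design, step 2 would instead send a log record that each replica's CPU parses and applies; eliminating that replay work on the replicas is the point of the scheme.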
Research Area III: Cloud-Native Databases
Cloud-native data warehouse
Databases are moving to the cloud, driven by desirable properties such as elasticity, high availability, and cost competitiveness. Modern cloud-native databases adopt a storage-disaggregation architecture, where computation and storage are decoupled. This architecture brings new challenges (e.g., the network bandwidth bottleneck) and new opportunities in DBMS design.
Cloud-DW [VLDB'19]: Evaluation of several popular cloud-native data warehouse systems that have different architectures.
PushdownDB [code][ICDE'20]: A cloud-native OLAP system that leverages AWS S3 Select to push down selection, projection, and aggregation to speed up query processing.
FlexPushdownDB [code][VLDB'21]: A cloud-native OLAP DBMS that combines caching and pushdown at a fine-granularity in a storage disaggregation architecture.
Litmus [code][SIGMOD'22]: A DBMS that provides verifiable proofs of atomicity and serializability for transactions, through the co-design of database and cryptographic tools.
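The caching/pushdown combination can be illustrated with a toy per-segment scan planner in the spirit of FlexPushdownDB (names are hypothetical, and real systems make this decision with a cost model rather than a simple cache test). Segments resident in the local cache are filtered at the compute node, while uncached segments have the predicate pushed down to the storage tier so that only matching rows cross the network:

```python
def run_scan(segments, cache, storage, predicate):
    """Scan a table segment by segment, choosing caching or pushdown per segment.

    cache/storage map segment id -> list of rows; predicate is a row filter.
    Returns the matching rows and the number of rows sent over the network.
    """
    results, rows_transferred = [], 0
    for seg in segments:
        if seg in cache:
            # Cached segment: filter locally, zero network traffic.
            results.extend(row for row in cache[seg] if predicate(row))
        else:
            # Uncached segment: the storage tier evaluates the predicate,
            # so only matching rows are transferred.
            matched = [row for row in storage[seg] if predicate(row)]
            rows_transferred += len(matched)
            results.extend(matched)
    return results, rows_transferred
```

For example, with two four-row segments of which one is cached, a 50%-selective predicate transfers only two rows instead of four, and the cached segment contributes no traffic at all.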