Xiangyao Yu

Xiangyao Yu

Assistant Professor
Database Group
Computer Sciences Department, University of Wisconsin-Madison
yxy AT cs.wisc.edu
Room 4385, 1210 W. Dayton St, Madison WI 53706
[C.V.] [Google Scholar]



I am an assistant professor in the Computer Sciences Department at University of Wisconsin-Madison.

Before joining UW-Madison, I was a visiting researcher at Google Madison in 2019. Before that, I was a Postdoctoral Associate in the database group at CSAIL, MIT working with Prof. Michael Stonebraker and Prof. Samuel Madden. I completed my Ph.D. in Computer Science at MIT working with Prof. Srinivas Devadas. I earned my Bachelor of Science (B.S.) from Institute of Microelectronics at Tsinghua University, Beijing, China.

My research interests include Database Systems and Computer Architecture, with a focus on transaction processing, software-hardware codesign, and in-cloud databases.

I am actively looking for PhD, master, and undergraduate students interested in databases or computer architecture. Please email me your CV if you are interested in working with me.




My research activities focus in three areas: (I) transaction processing, (II) software-hardware codesign, and (III) in-cloud databases. Sample projects appear below.

Research Area I: Transaction Processing

1000-Core Database

Computer architectures are moving towards many-core machines with dozens or even hundreds of cores on a single chip, which the current database management systems (DBMSs) are not designed for. We performed an evaluation of concurrency control for on-line transaction processing (OLTP) workloads on many-core chips. Our Analysis (VLDB'14) shows that all algorithms fail to scale to this level of parallelism. For each algorithm, we identified artificial and fundamental bottlenecks. We conclude that rather than pursuing incremental solutions, many-core chips may require a completely redesigned DBMS architecture that is built from ground up and is tightly coupled with the hardware. Our DBMS is open source on github (DBx1000).
[Related Publication: VLDB'14]


Scalable Concurrency Control

TicToc is an optimistic concurrency control (OCC) protocol that resolves the timestamp allocation bottleneck in traditional protocols. TicToc uses "data-driven timestamp management" that is novel and provably correct. Instead of statically assigning timestamps to transactions, TicToc assigns read and write timestamps to each data item to dynamically compute a transaction's timestamp. TicToc removes the need for centralized timestamp allocation and commits transactions that would be aborted by conventional protocols. TicToc scales on multicore processors and outperforms state-of-the-art concurrency control protocols. Sundial is a distributed version of TicToc that combines concurrency control and caching in a unified protocol.
[Related Publication: SIGMOD'16, VLDB'18]


Research Area II: Software-Hardware Codesign

IMP: Indirect Memory Prefetcher

Important applications like machine learning, graph analytics, and sparse linear algebra are dominated by irregular memory accesses which have little temporal or spatial locality and are difficult to prefetcher using traditional techniques. A majority of these irregular accesses come from indirect patterns of the form A[B[i]]. We propose an efficient hardware indirect memory prefetcher (IMP) to hide memory latency of this access pattern. We also propose a partial cacheline accessing mechanism to reduce the network and DRAM bandwidth pressure from the lack of spatial locality. Evaluated on seven applications, IMP showed 56% speedup on average (up to 2.3x) compared to a baseline streaming prefetchers on a 64 core system.
[Related Publication: MICRO'15]




Tardis Cache Coherence Protocol

Tardis is a timestamp-based cache coherence protocol that scales to 1000 cores while maintaining simplicity and good performance. Tardis is different from conventional coherence protocols that typically use the invalidation mechanism for a write to propagate to shared cached copies. Instead, each read copy acquires a lease such that a write operation happens only after the lease expires. Tardis uses logical instead of physical leases such that a write is not blocked waiting for a lease. Instead, a write happens immediately by "jumping ahead" in logical time by changing the timestamps. We have proven the correctness of Tardis and extended it to relaxed consistency models.
[Related Publication: PACT'15, arXiv'15, PACT'16]




Research Area III: In-Cloud Databases


Near Cloud Storage Computing

Modern cloud platforms disaggregate computation and storage into separate services. In this project, we explored the idea of using limited computation inside the simple storage service (S3) offered by AWS to accelerate data analytics. We use the existing S3 Select feature to accelerate not only simple database operators like select and project, but also complex operators like join, group-by, and top-K. We propose optimization techniques for each individual operator and demonstrated more than 6x performance improvement over a set of representative queries.











Peer-Reviewed Publications

Patents

Thesis

Here is a list of my teaching experiences:


PC member for

Reviewer for