Condensed abstract: Many indexing techniques have been proposed to aid even the simplest binary joins. Traditional indexes on both join input tables by the join key offer efficient maintenance and reasonable merge-join performance. A traditional materialized join view indexed by the join key offers the best query performance but incurs high maintenance overhead and requires large space. This work shows that the merged index is a hybrid of the two in its index design and enjoys the best of both worlds. Our comparison spans 189 experiments over a wide range of configurations using both b-trees and LSM-forests. They show that the merged index sustains update rates 100% as high as traditional single-table indexes and query rates 100%–250% as high as the materialized join view.
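The hybrid design summarized above can be sketched in a few lines: rows of both join inputs are interleaved in one ordered structure keyed by (join key, table id), so a single in-order scan yields merge-join output while each row is still inserted individually, as in a single-table index. This is an illustrative Python sketch under assumed names and layout; the actual work uses b-trees and LSM-forests in C++.

```python
import bisect

class MergedIndex:
    """Toy merged index: one sorted store holds rows of BOTH join inputs,
    keyed by (join_key, table_id, row_id). table_id 0 = R, 1 = S.
    A sorted list stands in for a b-tree or LSM-forest."""

    def __init__(self):
        self.entries = []  # kept sorted by key

    def insert(self, join_key, table_id, row_id, payload):
        # Point insertion, as cheap as maintaining a single-table index.
        bisect.insort(self.entries, ((join_key, table_id, row_id), payload))

    def join_scan(self):
        # One in-order scan produces merge-join results: R rows and their
        # matching S rows are physically adjacent under the same join key.
        out, cur_key, r_rows = [], None, []
        for (k, tid, _), payload in self.entries:
            if k != cur_key:
                cur_key, r_rows = k, []
            if tid == 0:
                r_rows.append(payload)
            else:
                out.extend((r, payload) for r in r_rows)
        return out

idx = MergedIndex()
idx.insert(10, 0, 1, "R:alice")
idx.insert(10, 1, 7, "S:order-7")
idx.insert(10, 1, 9, "S:order-9")
idx.insert(20, 0, 2, "R:bob")  # no matching S row, so not in the output
print(idx.join_scan())
# -> [('R:alice', 'S:order-7'), ('R:alice', 'S:order-9')]
```

The sketch shows why the hybrid avoids the join view's maintenance cost: an update touches one entry rather than a precomputed join result.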
Storing and Indexing Multiple Tables by an Interesting Order
For efficient complex joins, grouping, and updates in relational databases
Wenhui Lyu and Goetz Graefe, Feb 2025–Present
Repo of experiments
C++17, Linux, Google Cloud, LLDB/GDB
Query Execution, Indexing and Physical Database Design, KV Stores (LeanStore and RocksDB)
Education
2022–2027/2028 (Expected): Computer Science, University of Wisconsin–Madison
2025–Present: Ph.D.; 2022–2024: M.S. Computer Science, GPA: 3.95/4.0
Advisors: Dr. Goetz Graefe, Prof. AnHai Doan; Field: Database Systems
Coursework: CS764 Topics in DBMS (Top of Class), CS736 Advanced OS, CS744 Big Data Systems
B.A., Philosophy, Politics, and Economics, Peking University
Thesis: Kant's Theory of Space; Advisor: Prof. Zengding Wu; GPA: 3.81/4.0 (Top 10%)
Selected Awards: National Scholarship (Top 0.2% nationwide), Dean's List (2019, 2020, 2021)
Course Projects in ML Systems and DB Systems
An SPJ Query Processor Prototype with Efficient External Merge-Sort
Leading contributor, group of 2; Mar–Apr 2024
Implemented an external merge-sort algorithm with graceful spilling that sorts 120 GB of data in around 40 minutes using a buffer pool of only 100 MB. Repo: C++, macOS
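The technique behind this project can be sketched compactly: sort memory-sized runs, spill each to disk, then merge all runs with a min-heap. This is a minimal illustrative Python sketch with hypothetical names (the project itself is in C++ and handles far larger data with graceful spilling).

```python
import heapq
import os
import tempfile

def external_sort(values, run_size):
    """Toy external merge-sort: sort `values` while holding at most
    `run_size` items in memory at once; temp files stand in for disk runs."""
    run_files = []
    # Phase 1: produce sorted runs and spill each to a temporary file.
    for start in range(0, len(values), run_size):
        run = sorted(values[start:start + run_size])
        f = tempfile.NamedTemporaryFile("w+", delete=False)
        f.write("\n".join(map(str, run)))
        f.close()
        run_files.append(f.name)
    # Phase 2: k-way merge of all sorted runs with a min-heap.
    streams = [map(int, open(name)) for name in run_files]
    merged = list(heapq.merge(*streams))
    for name in run_files:
        os.remove(name)
    return merged

print(external_sort([5, 3, 8, 1, 9, 2, 7], run_size=3))
# -> [1, 2, 3, 5, 7, 8, 9]
```

With a 100 MB buffer and 120 GB of input, the same two-phase structure applies; the real implementation additionally bounds fan-in and spills gracefully rather than materializing whole runs in memory.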
Towards System Reproducibility in Multi-GPU LLM Inference
Equal contributor, group of 2; Sep–Dec 2024
Developed and open-sourced a profiling toolchain to analyze multi-GPU LLM inference at layer-level granularity, reducing the search space for variability by 95%. Repo: PyTorch, Linux
Async Scheduling for Media Processing in Neural Network Inference
Leading contributor, group of 4; Mar–May 2024
Minimized context-switch overhead and improved media processing time by over 30%.
Skills: Python multi-processing, NVIDIA Triton GPU scheduler, OS design. Repo