Storing and Indexing Multiple Tables by an Interesting Order
Wenhui Lyu and Goetz Graefe, Feb 2025–Mar 2026
For efficient complex joins, grouping, and updates in relational databases
Submitted to VLDB 2026
Condensed abstract:
Resolved the fundamental read/write trade-off in relational databases by generalizing "merged indexes" for multi-table joins and grouping operations. By embedding "interesting orderings" into the physical database design, this architecture partially pre-computes queries while retaining the update efficiency of traditional single-table indexes. It matches or exceeds the query speed of pre-computed materialized views by up to 2x, while eliminating their severe storage and update overhead.
Indexing Join Inputs for Fast Queries and Maintenance
May–Oct, 2024
Repo of experiments
C++ 17, Linux, Google Cloud, LLDB/GDB
Query Execution, Indexing and Physical Database Design, KV Store (LeanStore and RocksDB)
Education
2022–2027/2028 (Expected)
Computer Science, University of Wisconsin–Madison
2025–Present, PhD
2022–2024, M.S. Computer Science, GPA: 3.95/4.0
Advisors: Dr. Goetz Graefe, Prof. AnHai Doan
Field: Database Systems
Coursework: CS764 Topics in DBMS (Top of Class), CS736 Advanced OS, CS744 Big Data Systems
Activities: Graduate Advising Board, Graduate Orientation Speaker
2018–2022
B.A., Philosophy, Politics, and Economics, Peking University
Thesis: Kant's Theory of Space
Advisor: Prof. Zengding Wu
GPA: 3.81/4.0 (Top 10%)
Selected Awards: National Scholarship (Top 0.2% nationwide), Dean's List (2019, 2020, 2021)
Course Projects in ML Systems and DB Systems
An SPJ Query Processor Prototype with Efficient External Merge-Sort
Leading contributor, group of 2
Mar–Apr 2024
Implemented an external merge-sort algorithm with graceful spilling that sorts 120 GB of data in about 40 minutes using a buffer pool of only 100 MB.
Repo
C++, macOS
Towards System Reproducibility in Multi-GPU LLM Inference
Equal contributor, group of 2
Sep–Dec 2024
Developed and open-sourced a profiling toolchain to analyze multi-GPU LLM inference at layer-level granularity, reducing the search space for variability by 95%.
Repo
PyTorch, Linux
Async Scheduling for Media Processing in Neural Network Inference
Leading contributor, group of 4
Mar–May 2024
Minimized context-switch overhead, reducing media processing time by over 30%.
Skills: Python multi-processing, NVIDIA Triton GPU scheduler, OS Design
Repo
Other Tools and Expertise
Java, TypeScript, SQL, Docker, Software Design (OOP/Composition)