Wenhui "Alicia" Lyu

Research in Database Indexing

Indexing Join Inputs for Fast Queries and Maintenance

May–Oct, 2024
Wenhui Lyu and Goetz Graefe Accepted to BTW 2025 (Bamberg, Germany)
Condensed abstract: Many indexing techniques have been proposed to aid even the simplest binary joins. Traditional indexes on both join input tables by the join key offer efficient maintenance and reasonable merge-join performance. A traditional materialized join view indexed by the join key offers the best query performance but incurs high maintenance overhead and requires substantial space.
This work shows that a merged index is a hybrid of the two in its design and enjoys the best of both worlds. Our comparison spans 189 experiments in a wide range of configurations using both b-trees and LSM-forests. The results show that the merged index sustains update rates 100% as high as the traditional single-table indexes and query rates 100%–250% as high as the materialized join view.

Storing and Indexing Multiple Tables by an Interesting Order

For efficient complex joins, grouping, and updates in relational databases
Wenhui Lyu and Goetz Graefe Feb 2025–Present
Repo of experiments C++17, Linux, Google Cloud, LLDB/GDB   Query Execution, Indexing and Physical Database Design, KV Store (LeanStore and RocksDB)

Education

2022–2027/2028 (Expected)

Computer Science, University of Wisconsin–Madison

2025–Present, PhD 2022–2024, M.S. Computer Science, GPA: 3.95/4.0

Advisors: Dr. Goetz Graefe, Prof. AnHai Doan Field: Database Systems
Coursework: CS764 Topics in DBMS (Top of Class), CS736 Advanced OS, CS744 Big Data Systems
Activities: Graduate Advising Board, Graduate Orientation Speaker

2018–2022

B.A., Philosophy, Politics, and Economics, Peking University

Thesis: Kant's Theory of Space Advisor: Prof. Zengding Wu GPA: 3.81/4.0 (Top 10%)
Selected Awards: National Scholarship (Top 0.2% nationwide), Dean's List (2019, 2020, 2021)

Course Projects in ML Systems and DB Systems

An SPJ Query Processor Prototype with Efficient External Merge-Sort

Leading contributor, group of 2 Mar–Apr 2024
Implemented an external merge-sort algorithm with graceful spilling that sorts 120 GB of data in around 40 minutes using a buffer pool of only 100 MB. Repo
C++, macOS

Towards System Reproducibility in Multi-GPU LLM Inference

Equal contributor, group of 2 Sep–Dec 2024
Developed and open-sourced a profiling toolchain to analyze multi-GPU LLM inference at layer-level granularity, reducing the search space for variability by 95%. Repo
PyTorch, Linux

Async Scheduling for Media Processing in Neural Network Inference

Leading contributor, group of 4 Mar–May 2024
Minimized context-switch overhead and reduced media processing time by over 30%.
Skills: Python multiprocessing, NVIDIA Triton GPU scheduler, OS Design Repo

Other Tools and Expertise

Java, TypeScript, SQL, Docker, Software Design (OOP/Composition)