Zhenmei Shi

Senior Research Scientist



Contact: zhmeishi [at] gmail [dot] com

Google Scholar | Github | LinkedIn | CV | OpenReview


About Me

I am a Senior Research Scientist at MongoDB (Voyage AI), working with Tengyu Ma. My current research focuses on Retrieval-Augmented Generation (RAG).

I received my Ph.D. in Computer Sciences from the University of Wisconsin-Madison in 2024, advised by Yingyu Liang. I obtained my B.S. in Computer Science and Pure Mathematics (Advanced) from the Hong Kong University of Science and Technology in 2019.

My Ph.D. thesis focuses on understanding the learning and adaptation of foundation models, including Large Language Models, Vision Language Models, Diffusion Models, and Shallow Networks.

I was a Research Intern at Google Cloud AI in Sunnyvale, working with Sercan Arik, and an AI Research Scientist Intern at Salesforce in Palo Alto, working with Shafiq Joty. I also worked with Zhao Song in Seattle.

Publications

* denotes equal contribution or alphabetical order.
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
Yingyu Liang*, Jiangxuan Long*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*

ICLR 2025
[ OpenReview ] [ arXiv ]
When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time?
Chenyang Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*

AISTATS 2025
[ OpenReview ] [ arXiv ]
Fourier Circuits in Neural Networks and Transformers: A Case Study of Modular Arithmetic with Multiple Inputs
Chenyang Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Tianyi Zhou*

AISTATS 2025
[ OpenReview ] [ Workshop ] [ arXiv ] [ Workshop Poster ]
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*

AISTATS 2025
[ OpenReview ] [ arXiv ]
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen*, Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*

AISTATS 2025
[ OpenReview ] [ arXiv ]
Differential Privacy Mechanisms in Neural Tangent Kernel Regression
Jiuxiang Gu*, Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*

WACV 2025
[ arXiv ]
The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity
Yifang Chen*, Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*

CPAL 2025 Oral
[ OpenReview ] [ arXiv ]
Fast John Ellipsoid Computation with Differential Privacy Optimization
Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Junwei Yu*

CPAL 2025 Oral
[ OpenReview ] [ arXiv ]
Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond
Yekun Ke*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Chiwun Yang*

CPAL 2025
[ OpenReview ] [ arXiv ]
HSR-Enhanced Sparse Attention Acceleration
Bo Chen*, Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*

CPAL 2025
[ OpenReview ] [ arXiv ]
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty

arXiv, 2024
[ arXiv ] [ Code ]
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Yixuan Li, Neel Joshi

NeurIPS 2024
[ OpenReview ] [ arXiv ] [ Code ] [ Dataset ] [ Poster ]
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time
Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*

NeurIPS 2024 Workshop
[ OpenReview ] [ arXiv ] [ Poster ]
Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers
Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*

NeurIPS 2024 Workshop
[ OpenReview ] [ arXiv ] [ Poster ]
A Tighter Complexity Analysis of SparseGPT
Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*

NeurIPS 2024 Workshop
[ OpenReview ] [ arXiv ] [ Poster ]
Differential Privacy of Cross-Attention with Provable Guarantee
Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*

NeurIPS 2024 Workshop
[ OpenReview ] [ arXiv ] [ Poster ]
Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability
Zhuoyan Xu*, Zhenmei Shi*, Yingyu Liang

COLM 2024
[ OpenReview ] [ arXiv ]
[ Workshop ] [ Code ] [ Slides ] [ Poster ]
Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang

ICML 2024
[ OpenReview ] [ arXiv ] [ Poster ]
[ Workshop ] [ Workshop Poster ]
Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning
Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang

ICLR 2024
[ OpenReview ] [ arXiv ] [ Code ] [ Slides ] [ Poster ] [ Video ]
[ Workshop ] [ Workshop Poster ] [ Workshop Slides ]
Domain Generalization via Nuclear Norm Regularization
Zhenmei Shi*, Yifei Ming*, Ying Fan*, Frederic Sala, Yingyu Liang

CPAL 2024 Oral
[ OpenReview ] [ arXiv ] [ Poster ] [ Code ] [ Slides ]
[ Workshop ] [ Workshop Poster ]
Provable Guarantees for Neural Networks via Gradient Feature Learning
Zhenmei Shi*, Junyi Wei*, Yingyu Liang

NeurIPS 2023
[ OpenReview ] [ arXiv ] [ Video ] [ Slides ] [ Poster ]
A Graph-Theoretic Framework for Understanding Open-World Semi-Supervised Learning
Yiyou Sun, Zhenmei Shi, Yixuan Li

NeurIPS 2023   Spotlight
[ OpenReview ] [ arXiv ] [ Video ] [ Code ] [ Slides ]
When and How Does Known Class Help Discover Unknown Ones? Provable Understandings Through Spectral Analysis
Yiyou Sun, Zhenmei Shi, Yingyu Liang, Yixuan Li

ICML 2023
[ OpenReview ] [ arXiv ] [ Video ] [ Code ]
The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning
Zhenmei Shi*, Jiefeng Chen*, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha

ICLR 2023   Spotlight (Acceptance Rate: 7.95%)
[ OpenReview ] [ arXiv ] [ Poster ] [ Code ] [ Slides ] [ Video ]
[ Workshop ] [ Workshop Poster ]
A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features
Zhenmei Shi*, Junyi Wei*, Yingyu Liang

ICLR 2022
[ OpenReview ] [ arXiv ] [ Poster ] [ Code ] [ Slides ] [ Video ]
Attentive Walk-Aggregating Graph Neural Networks
Mehmet F. Demirel, Shengchao Liu, Siddhant Garg, Zhenmei Shi, Yingyu Liang

TMLR 2022
[ OpenReview ] [ arXiv ] [ Code ]
Deep Online Fused Video Stabilization
Zhenmei Shi, Fuhao Shi, Wei-Sheng Lai, Chia-Kai Liang, Yingyu Liang

WACV 2022
[ Paper ] [ arXiv ] [ Poster ] [ Project ] [ Code ] [ Dataset ]
Structured Feature Learning for End-to-End Continuous Sign Language Recognition
Zhaoyang Yang*, Zhenmei Shi*, Xiaoyong Shen, Yu-Wing Tai

arXiv, 2019
[ arXiv ] [ News ]

Research & Work Experience

Senior Research Scientist
MongoDB
2025 - Now | Tengyu Ma
Research Scientist
Voyage AI
2025 | Tengyu Ma
Research Assistant
University of Wisconsin-Madison
2019 - 2024 | Yingyu Liang
Research Intern
Google Cloud AI in Sunnyvale, CA
Fall 2024 | Sercan Arik
AI Research Scientist Intern
Salesforce in Palo Alto, CA
Summer 2024 | Shafiq Joty
Software Engineering Intern
Google in Mountain View, CA
Summer 2021 | Myra Nam
Summer 2020 | Fuhao Shi
Research Intern
Megvii (Face++) in Beijing
Summer 2019 | Xiangyu Zhang
Research Intern
Tencent YouTu in Shenzhen
Winter 2019 | Zhaoyang Yang and Yu-Wing Tai
Winter 2018 | Xin Tao and Yu-Wing Tai
Research Assistant
Hong Kong University of Science and Technology
2018 - 2019 | Chi Keung Tang
2017 - 2018 | Raymond Wong
Summer 2016 | Ji-Shan Hu
Research Intern
Oak Ridge National Laboratory in the USA
Summer 2017 | Cheng Liu and Kwai L. Wong

Academic Services

Conference Reviewer at ICLR 2022-2025, NeurIPS 2022-2025, ICML 2022 and 2024-2025, COLM 2025, AISTATS 2025, CVPR 2021-2022 and 2025, ICCV 2021-2025, ECCV 2020-2022, WACV 2022 and 2025
Journal Reviewer at JVCI, IEEE Transactions on Information Theory

Teaching

Teaching Assistant of CS220 (Data Programming I) at UW-Madison (Spring 2020)
Teaching Assistant of CS301 (Intro to Data Programming) at UW-Madison (Fall 2019)
Last updated: Feb 24, 2025