Zhenmei Shi
Member of Technical Staff at xAI
X (Twitter) | Google Scholar | Github | LinkedIn | CV | OpenReview
I am a Member of Technical Staff at xAI, working on pretraining.
Previously, I was a Senior Research Scientist at MongoDB + Voyage AI, working with Tengyu Ma. I received my Ph.D. in Computer Sciences from the University of Wisconsin-Madison in 2024, advised by Yingyu Liang. I obtained my B.S. in Computer Science and Pure Mathematics Advanced from the Hong Kong University of Science and Technology in 2019.
Kernel Regression in Structured Non-IID Settings: Theory and Implications for Denoising Score Learning
Dechen Zhang, Zhenmei Shi, Yi Zhang, Yingyu Liang, Difan Zou. NeurIPS 2025. [ OpenReview ] [ arXiv ]

Circuit Complexity Bounds for RoPE-based Transformer Architecture
Bo Chen*, Xiaoyu Li*, Yingyu Liang*, Jiangxuan Long*, Zhenmei Shi*, Zhao Song*, Jiahao Zhang*. EMNLP 2025. [ OpenReview ] [ arXiv ]

Toward Infinite-Long Prefix in Transformer
Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Chiwun Yang*. EMNLP 2025. [ OpenReview ] [ arXiv ] [ Code ]

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective
Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Mingda Wan*, Yufa Zhou*. ICCV 2025. [ OpenReview ] [ arXiv ]

Dissecting Submission Limit in Desk-Rejections: A Mathematical Analysis of Fairness in AI Conference Policies
Yuefan Cao*, Xiaoyu Li*, Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Jiahao Zhang*. ICML 2025. [ OpenReview ] [ arXiv ]

Fundamental Limits of Visual Autoregressive Transformers: Universal Approximation Abilities
Yifang Chen*, Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*. ICML 2025. [ OpenReview ]

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
Yingyu Liang*, Jiangxuan Long*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*. ICLR 2025. [ OpenReview ] [ arXiv ]

When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time?
Chenyang Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*. AISTATS 2025. [ OpenReview ] [ arXiv ]

Fourier Circuits in Neural Networks and Transformers: A Case Study of Modular Arithmetic with Multiple Inputs
Chenyang Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Tianyi Zhou*. AISTATS 2025. [ OpenReview ] [ Workshop ] [ arXiv ] [ Workshop Poster ]

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*. AISTATS 2025. [ OpenReview ] [ arXiv ]

Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen*, Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*. AISTATS 2025. [ OpenReview ] [ arXiv ]

The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity
Yifang Chen*, Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*. CPAL 2025 Oral. [ OpenReview ] [ arXiv ]

Fast John Ellipsoid Computation with Differential Privacy Optimization
Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Junwei Yu*. CPAL 2025 Oral. [ OpenReview ] [ arXiv ]

Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond
Yekun Ke*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Chiwun Yang*. CPAL 2025. [ OpenReview ] [ arXiv ]

HSR-Enhanced Sparse Attention Acceleration
Bo Chen*, Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*. CPAL 2025. [ OpenReview ] [ arXiv ]

Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty. arXiv, 2024. [ arXiv ] [ Code ]

Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Yixuan Li, Neel Joshi. NeurIPS 2024. [ OpenReview ] [ arXiv ] [ Code ] [ Dataset ] [ Poster ]

Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability
Zhuoyan Xu*, Zhenmei Shi*, Yingyu Liang. COLM 2024. [ OpenReview ] [ arXiv ] [ Workshop ] [ Code ] [ Slides ] [ Poster ]

Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang. ICML 2024. [ OpenReview ] [ arXiv ] [ Poster ] [ Workshop ] [ Workshop Poster ]

Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning
Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang. ICLR 2024. [ OpenReview ] [ arXiv ] [ Code ] [ Slides ] [ Poster ] [ Video ] [ Workshop ] [ Workshop Poster ] [ Workshop Slides ]

Domain Generalization via Nuclear Norm Regularization
Zhenmei Shi*, Yifei Ming*, Ying Fan*, Frederic Sala, Yingyu Liang. CPAL 2024 Oral. [ OpenReview ] [ arXiv ] [ Poster ] [ Code ] [ Slides ] [ Workshop ] [ Workshop Poster ]

Provable Guarantees for Neural Networks via Gradient Feature Learning
Zhenmei Shi*, Junyi Wei*, Yingyu Liang. NeurIPS 2023. [ OpenReview ] [ arXiv ] [ Video ] [ Slides ] [ Poster ]

A Graph-Theoretic Framework for Understanding Open-World Semi-Supervised Learning
Yiyou Sun, Zhenmei Shi, Yixuan Li. NeurIPS 2023 Spotlight. [ OpenReview ] [ arXiv ] [ Video ] [ Code ] [ Slides ]

When and How Does Known Class Help Discover Unknown Ones? Provable Understandings Through Spectral Analysis
Yiyou Sun, Zhenmei Shi, Yingyu Liang, Yixuan Li. ICML 2023. [ OpenReview ] [ arXiv ] [ Video ] [ Code ]

The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning
Zhenmei Shi*, Jiefeng Chen*, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha. ICLR 2023 Spotlight (acceptance rate 7.95%). [ OpenReview ] [ arXiv ] [ Poster ] [ Code ] [ Slides ] [ Video ] [ Workshop ] [ Workshop Poster ]

A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features
Zhenmei Shi*, Junyi Wei*, Yingyu Liang. ICLR 2022. [ OpenReview ] [ arXiv ] [ Poster ] [ Code ] [ Slides ] [ Video ]

Attentive Walk-Aggregating Graph Neural Networks
Mehmet F. Demirel, Shengchao Liu, Siddhant Garg, Zhenmei Shi, Yingyu Liang. TMLR 2022. [ OpenReview ] [ arXiv ] [ Code ]

Deep Online Fused Video Stabilization
Zhenmei Shi, Fuhao Shi, Wei-Sheng Lai, Chia-Kai Liang, Yingyu Liang. WACV 2022. [ Paper ] [ arXiv ] [ Poster ] [ Project ] [ Code ] [ Dataset ]

Structured Feature Learning for End-to-End Continuous Sign Language Recognition
Zhaoyang Yang*, Zhenmei Shi*, Xiaoyong Shen, Yu-Wing Tai. arXiv, 2019. [ arXiv ] [ News ]
|
Member of Technical Staff
xAI, 2025 - Now

Senior Research Scientist
MongoDB, 2025 | Tengyu Ma

Research Scientist
Voyage AI, 2025 | Tengyu Ma

Research Intern
Google Cloud AI in Sunnyvale, CA, Fall 2024 | Sercan Arik

AI Research Scientist Intern
Salesforce in Palo Alto, CA, Summer 2024 | Shafiq Joty

Software Engineering Intern
Google in Mountain View, CA
Summer 2021 | Myra Nam
Summer 2020 | Fuhao Shi

Research Intern
Megvii (Face++) in Beijing, Summer 2019 | Xiangyu Zhang

Research Intern
Tencent YouTu in Shenzhen
Winter 2019 | Zhaoyang Yang and Yu-Wing Tai
Winter 2018 | Xin Tao and Yu-Wing Tai

Research Intern
Oak Ridge National Laboratory in the USA, Summer 2017