Zhenmei Shi

Member of Technical Staff at xAI



Contact: zhmeishi [at] gmail [dot] com

X (Twitter) | Google Scholar | GitHub | LinkedIn | CV | OpenReview


About Me

I am a Member of Technical Staff at xAI, working on Pretraining.

Previously, I was a Senior Research Scientist at MongoDB + Voyage AI, working with Tengyu Ma. I received my Ph.D. in Computer Sciences from the University of Wisconsin-Madison in 2024, advised by Yingyu Liang, and my B.S. in Computer Science and Mathematics (Pure Mathematics Advanced Track) from the Hong Kong University of Science and Technology in 2019.

Publications

* denotes equal contribution or alphabetical author order.
Kernel Regression in Structured Non-IID Settings: Theory and Implications for Denoising Score Learning
Dechen Zhang, Zhenmei Shi, Yi Zhang, Yingyu Liang, Difan Zou

NeurIPS 2025
[ OpenReview ] [ arXiv ]
Circuit Complexity Bounds for RoPE-based Transformer Architecture
Bo Chen*, Xiaoyu Li*, Yingyu Liang*, Jiangxuan Long*, Zhenmei Shi*, Zhao Song*, Jiahao Zhang*

EMNLP 2025
[ OpenReview ] [ arXiv ]
Toward Infinite-Long Prefix in Transformer
Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Chiwun Yang*

EMNLP 2025
[ OpenReview ] [ arXiv ] [ Code ]
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective
Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Mingda Wan*, Yufa Zhou*

ICCV 2025
[ OpenReview ] [ arXiv ]
Dissecting Submission Limit in Desk-Rejections: A Mathematical Analysis of Fairness in AI Conference Policies
Yuefan Cao*, Xiaoyu Li*, Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Jiahao Zhang*

ICML 2025
[ OpenReview ] [ arXiv ]
Fundamental Limits of Visual Autoregressive Transformers: Universal Approximation Abilities
Yifang Chen*, Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*

ICML 2025
[ OpenReview ]
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
Yingyu Liang*, Jiangxuan Long*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*

ICLR 2025
[ OpenReview ] [ arXiv ]
When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time?
Chenyang Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*

AISTATS 2025
[ OpenReview ] [ arXiv ]
Fourier Circuits in Neural Networks and Transformers: A Case Study of Modular Arithmetic with Multiple Inputs
Chenyang Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Tianyi Zhou*

AISTATS 2025
[ OpenReview ] [ Workshop ] [ arXiv ] [ Workshop Poster ]
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*

AISTATS 2025
[ OpenReview ] [ arXiv ]
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen*, Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*

AISTATS 2025
[ OpenReview ] [ arXiv ]
The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity
Yifang Chen*, Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*

CPAL 2025 Oral
[ OpenReview ] [ arXiv ]
Fast John Ellipsoid Computation with Differential Privacy Optimization
Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Junwei Yu*

CPAL 2025 Oral
[ OpenReview ] [ arXiv ]
Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond
Yekun Ke*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Chiwun Yang*

CPAL 2025
[ OpenReview ] [ arXiv ]
HSR-Enhanced Sparse Attention Acceleration
Bo Chen*, Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*

CPAL 2025
[ OpenReview ] [ arXiv ]
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty

arXiv, 2024
[ arXiv ] [ Code ]
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Yixuan Li, Neel Joshi

NeurIPS 2024
[ OpenReview ] [ arXiv ] [ Code ] [ Dataset ] [ Poster ]
Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability
Zhuoyan Xu*, Zhenmei Shi*, Yingyu Liang

COLM 2024
[ OpenReview ] [ arXiv ]
[ Workshop ] [ Code ] [ Slides ] [ Poster ]
Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang

ICML 2024
[ OpenReview ] [ arXiv ] [ Poster ]
[ Workshop ] [ Workshop Poster ]
Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning
Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang

ICLR 2024
[ OpenReview ] [ arXiv ] [ Code ] [ Slides ] [ Poster ] [ Video ]
[ Workshop ] [ Workshop Poster ] [ Workshop Slides ]
Domain Generalization via Nuclear Norm Regularization
Zhenmei Shi*, Yifei Ming*, Ying Fan*, Frederic Sala, Yingyu Liang

CPAL 2024 Oral
[ OpenReview ] [ arXiv ] [ Poster ] [ Code ] [ Slides ]
[ Workshop ] [ Workshop Poster ]
Provable Guarantees for Neural Networks via Gradient Feature Learning
Zhenmei Shi*, Junyi Wei*, Yingyu Liang

NeurIPS 2023
[ OpenReview ] [ arXiv ] [ Video ] [ Slides ] [ Poster ]
A Graph-Theoretic Framework for Understanding Open-World Semi-Supervised Learning
Yiyou Sun, Zhenmei Shi, Yixuan Li

NeurIPS 2023   Spotlight
[ OpenReview ] [ arXiv ] [ Video ] [ Code ] [ Slides ]
When and How Does Known Class Help Discover Unknown Ones? Provable Understandings Through Spectral Analysis
Yiyou Sun, Zhenmei Shi, Yingyu Liang, Yixuan Li

ICML 2023
[ OpenReview ] [ arXiv ] [ Video ] [ Code ]
The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning
Zhenmei Shi*, Jiefeng Chen*, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha

ICLR 2023   Spotlight (Acceptance Rate: 7.95%)
[ OpenReview ] [ arXiv ] [ Poster ] [ Code ] [ Slides ] [ Video ]
[ Workshop ] [ Workshop Poster ]
A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features
Zhenmei Shi*, Junyi Wei*, Yingyu Liang

ICLR 2022
[ OpenReview ] [ arXiv ] [ Poster ] [ Code ] [ Slides ] [ Video ]
Attentive Walk-Aggregating Graph Neural Networks
Mehmet F. Demirel, Shengchao Liu, Siddhant Garg, Zhenmei Shi, Yingyu Liang

TMLR 2022
[ OpenReview ] [ arXiv ] [ Code ]
Deep Online Fused Video Stabilization
Zhenmei Shi, Fuhao Shi, Wei-Sheng Lai, Chia-Kai Liang, Yingyu Liang

WACV 2022
[ Paper ] [ arXiv ] [ Poster ] [ Project ] [ Code ] [ Dataset ]
Structured Feature Learning for End-to-End Continuous Sign Language Recognition
Zhaoyang Yang*, Zhenmei Shi*, Xiaoyong Shen, Yu-Wing Tai

arXiv, 2019
[ arXiv ] [ News ]

Research & Work Experience

Member of Technical Staff
xAI
2025 - Present
Senior Research Scientist
MongoDB
2025 | Tengyu Ma
Research Scientist
Voyage AI
2025 | Tengyu Ma
Research Intern
Google Cloud AI in Sunnyvale, CA
Fall 2024 | Sercan Arik
AI Research Scientist Intern
Salesforce in Palo Alto, CA
Summer 2024 | Shafiq Joty
Software Engineering Intern
Google in Mountain View, CA
Summer 2021 | Myra Nam
Summer 2020 | Fuhao Shi
Research Intern
Megvii (Face++) in Beijing
Summer 2019 | Xiangyu Zhang
Research Intern
Tencent YouTu in Shenzhen
Winter 2019 | Zhaoyang Yang and Yu-Wing Tai
Winter 2018 | Xin Tao and Yu-Wing Tai
Research Intern
Oak Ridge National Laboratory in Oak Ridge, TN
Summer 2017

Academic Services

Conference Reviewer for ICLR 2022-2026, NeurIPS 2022-2025, ICML 2022 and 2024-2025, COLM 2025, AISTATS 2025-2026, AAAI 2026, IJCAI 2025, CVPR 2021-2022 and 2025, ICCV 2021-2025, ECCV 2020-2022, WACV 2022 and 2025-2026
Journal Reviewer for the Journal of Visual Communication and Image Representation (JVCI) and IEEE Transactions on Information Theory
Last updated: Oct 26, 2025