Mu Cai

Hi, I am a fourth-year Ph.D. student in the Computer Sciences Department at the University of Wisconsin-Madison, advised by Prof. Yong Jae Lee.

My research interests lie at the intersection of deep learning and computer vision. I am especially interested in multimodal generative models, visual prompting, and video and 3D understanding.

Email  /  CV  /  GitHub  /  Google Scholar  /  LinkedIn  /  Twitter (X)  /  Blog

Recent talk on compositional vision-language models in the input space. [YouTube link]



Research




NEW! LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang*, Mu Cai*, Bingxin Xu, Yong Jae Lee^, Yan Yan^
arXiv, 2024
(*equal contribution, ^equal advising)
[arXiv] [code] [Project Page]

NEW! CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples
Jianrui Zhang*, Mu Cai*, Tengyang Xie, Yong Jae Lee
arXiv, 2024
(*equal contribution)
[arXiv] [code] [Project Page]

NEW! ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[arXiv] [code] [Demo] [Project Page]

NEW! Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Mu Cai*, Zeyi Huang*, Yuheng Li, Haohan Wang, and Yong Jae Lee
arXiv, 2023
(*equal contribution)
[arXiv] [code]

Investigating the Catastrophic Forgetting in Multimodal Large Language Models
Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma
Conference on Parsimony and Learning (CPAL), Proceedings Track, 2023
[arXiv]

A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance
Zeyi Huang, Andy Zhou, Zijian Ling, Mu Cai, Haohan Wang, and Yong Jae Lee
Proceedings of the International Conference on Computer Vision (ICCV), 2023
[arXiv]

Out-of-distribution Detection via Frequency-regularized Generative Models
Mu Cai and Yixuan Li
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023 (Spotlight)
[arXiv] [code]

Masked Discrimination for Self-Supervised Learning on Point Clouds
Haotian Liu, Mu Cai, and Yong Jae Lee
Proceedings of the European Conference on Computer Vision (ECCV), 2022
[arXiv] [code] [talk]

VOS: Learning What You Don’t Know by Virtual Outlier Synthesis
Xuefeng Du, Zhaoning Wang, Mu Cai, and Yixuan Li
Proceedings of the International Conference on Learning Representations (ICLR), 2022
[arXiv] [code]

Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving
Mu Cai, Hong Zhang, Huijuan Huang, Qichuan Geng, Yixuan Li, and Gao Huang
Proceedings of the International Conference on Computer Vision (ICCV), 2021
[arXiv] [code]

A Game-Theoretic Strategy-Aware Interaction Algorithm with Validation on Real Traffic Data
Liting Sun*, Mu Cai*, Wei Zhan, and Masayoshi Tomizuka
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020
(*equal contribution)
[PDF]