Hi, I am a final-year Ph.D. candidate in the Computer Sciences Department at the University of Wisconsin-Madison, advised by Prof. Yong Jae Lee.
My recent research focuses on multimodal generative models. I am especially interested in visual prompting, video and 3D understanding, and analyzing the limitations of CLIP.
Email / CV / GitHub / Google Scholar / LinkedIn / Twitter (X) / Blog
Recent talk on criticizing and creating vision-language models. [YouTube English, Chinese]
I expect to graduate around May 2025 and am looking for a Research Scientist position focused on multimodal models. Do not hesitate to shoot me an email if you are interested!