Hi, I am a fifth-year Ph.D. candidate in the Computer Sciences Department at the University of Wisconsin-Madison, advised by Prof. Yong Jae Lee.
My recent research focuses on the applications and fundamental limitations of multimodal generative models. I am especially interested in visual prompting, video and 3D understanding, and analyzing the limitations of CLIP.
Email / CV / GitHub / Google Scholar / LinkedIn / Twitter (X) / Blog
Recent talk on criticizing and creating vision-language models. [YouTube: English, Chinese]