Yiqiao Zhong


Department of Statistics,
University of Wisconsin–Madison
Medical Science Center 1122
1300 University Ave, Madison, WI 53706
E-mail: yiqiao.zhong [@] wisc [DOT] edu

Bio

I am a tenure-track assistant professor in the Department of Statistics at the University of Wisconsin–Madison. I started my appointment in Fall 2022. My research is primarily motivated by advances in data science. I enjoy working on modern statistics and machine learning problems, especially deep learning theory and high-dimensional statistics.


My recent research interests center on the scientific foundations of generative AI, especially Large Language Models (LLMs). I am interested in developing better evaluation, interpretability, and theory for this emerging technology. I am also interested in improving model adaptation and alignment.


Previously, I was a postdoc at Stanford University, as part of the Collaboration on the Theoretical Foundations of Deep Learning, where I was advised by Prof. Andrea Montanari and Prof. David Donoho. Prior to that, I obtained my Ph.D. in 2019 from Princeton University, where I was advised by Prof. Jianqing Fan. I received my B.S. in mathematics from Peking University in 2014.

Research agenda

Large Language Models. Generative AI has advanced rapidly in industry over the past two years, yet a critical gap is widening between its practical use and its scientific foundations. The ever-accelerating progress in this field shows great promise, yet LLMs often exhibit unexpected behaviors. The aim of my research is to provide rigorous measurements and deeper understanding that enhance model evaluation, interpretability, and safety.

  • In a recent paper published in the Proceedings of the National Academy of Sciences, we studied how LLMs generalize to distributions they have not seen during training (known as out-of-distribution, or OOD, generalization). A crucial finding is a sharp transition in the training dynamics at which two self-attention layers suddenly align, forming a compositional structure that enables OOD generalization. This suggests that LLMs rely on intermediate latent subspaces, which we call bridge subspaces, to represent compositions; a toy numerical illustration of subspace alignment appears right after this list. See my blog post here.

  • In another paper, we examined various pretrained Transformer models and explored the hidden geometry inside these black-box models. The geometric structure we found seems to contain many stories waiting to be told; a minimal sketch of how one might start exploring such embeddings follows the illustration below. See my blog post here.
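To give a flavor of what "alignment" between subspaces means in the first bullet above, here is a toy numerical sketch that measures the principal angles between two column spaces. It is purely illustrative and uses synthetic matrices; the actual diagnostics in the paper may differ.

```python
# Toy illustration of subspace "alignment" via principal angles.
# Synthetic data only; not the paper's actual analysis.
import numpy as np

def principal_angles(A, B):
    """Principal angles (in radians) between the column spaces of A and B."""
    QA, _ = np.linalg.qr(A)   # orthonormal basis for span(A)
    QB, _ = np.linalg.qr(B)   # orthonormal basis for span(B)
    # Singular values of QA^T QB are the cosines of the principal angles.
    cosines = np.linalg.svd(QA.T @ QB, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

rng = np.random.default_rng(0)
d, k = 64, 4
U = rng.standard_normal((d, k))                        # a k-dim subspace in R^d
V_aligned = U + 0.05 * rng.standard_normal((d, k))     # nearly aligned copy
V_random = rng.standard_normal((d, k))                 # an unrelated subspace

print(principal_angles(U, V_aligned))   # angles near 0  => strong alignment
print(principal_angles(U, V_random))    # larger angles  => little alignment
```

Small principal angles indicate that the two subspaces nearly coincide, which is the kind of quantity one could track across training to detect a sharp alignment transition.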

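For the second bullet, here is a minimal sketch of how one might start peeking at the hidden geometry of a pretrained Transformer: embed a few sentences, collect token-level hidden states from one layer, and project them onto their top two principal components. It assumes the Hugging Face transformers and scikit-learn packages and uses GPT-2 with an arbitrary middle layer as a stand-in; this is an illustration, not the analysis from the paper.

```python
# Illustrative sketch only; the model (GPT-2) and layer index are arbitrary choices.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.decomposition import PCA

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

sentences = [
    "Statistics is the grammar of science.",
    "Deep learning models are surprisingly good at generalization.",
    "Madison is a city in Wisconsin.",
]

hidden = []
with torch.no_grad():
    for s in sentences:
        inputs = tokenizer(s, return_tensors="pt")
        outputs = model(**inputs, output_hidden_states=True)
        # outputs.hidden_states is a tuple of (num_layers + 1) tensors,
        # each of shape (1, seq_len, hidden_dim); take a middle layer.
        layer = outputs.hidden_states[6][0]        # (seq_len, hidden_dim)
        hidden.append(layer.numpy())

X = np.concatenate(hidden, axis=0)                 # stack all token embeddings
X = X - X.mean(axis=0)                             # center (PCA also does this)
coords = PCA(n_components=2).fit_transform(X)      # top-2 principal components
print(coords[:5])                                  # first few token positions
```

Scatter-plotting the resulting two-dimensional coordinates (colored by sentence, position, or layer) is a simple first look at the kind of geometric structure discussed above.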

Statistical foundations of deep learning. A fundamental question in modern machine learning (e.g., deep learning) concerns the generalization properties of complex, over-parametrized models. The impressive empirical performance of deep networks has driven active research in the past few years. A useful introduction to deep learning can be found in a course I co-instructed in 2019 (course link).

An excellent survey paper covers recent progress on the statistical foundations of deep learning. I presented a brief introduction to several of its key ideas in a lecture for CS762 in October 2022; you can find my slides here.


Related research topics. Other topics I am interested in include:

  • Self-supervised learning, especially contrastive learning (e.g., A recent paper)

  • Data visualization


Older projects. My earlier projects include:

  • Spectral methods, PCA and factor models

  • Statistical networks, matrix completion and synchronization problems

  • Nonconvex optimization and SDP relaxation

  • Eigenvector perturbation analysis, entrywise/ℓ∞ bounds


My Google Scholar profile.

Interested in working with me?

I am looking for motivated students (statistics, applied math, CS, etc.) to work on any aspect of statistics, machine learning, or applied problems. I'd be happy to chat if you want to learn about my research, start working on a research project, or are looking for a summer internship. The best way to reach me is by email.