ORIE 7790 Selected Topics in Applied Statistics (Spring 2020):
High Dimensional Probability and Statistics
1. Basic Info
2. Course Overview
3. Text and Lecture
4. Homework and Projects
5. Websites and Communications
6. Grading
7. Lecture Notes
8. Academic Conduct
1. Basic Info
Lectures:
Tuesday and Thursday 10:10-11:25am, Hollister Hall 320
Instructor:
Yudong Chen (yudong.chen at cornell dot edu, Rhodes 223)
Office hours: TTh 11:30-12:00 (after the lecture), or by appointment
Prerequisites:
There is no formal prerequisite. Students should have a PhD level of mathematical maturity, including a background in basic linear algebra, probability, and algorithms. Prior exposure to machine learning, statistical inference, stochastic processes, and convex/continuous optimization is helpful, but not required.
2. Course Overview
This is a fast-paced course on probabilistic and statistical tools for high-dimensional data analysis. In particular, we will develop techniques for analyzing the performance of an algorithm, as well as for understanding the fundamental limits of a problem. The focus will be on high-dimensional problems that possess hidden low-dimensional structures, and on non-asymptotic analysis that characterizes the interaction between sample complexity, problem dimension, and other structural parameters.
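To give the flavor of such results, here is one standard example from the sparse regression literature (illustrative only; not part of the original syllabus text): when estimating an s-sparse vector in dimension d from n noisy linear measurements with noise level sigma, the minimax estimation error scales as

    % A prototypical non-asymptotic bound, coupling the sample size n,
    % the ambient dimension d, and the sparsity s (standard in the
    % sparse-regression literature, e.g., Wainwright 2019).
    \[
      \inf_{\hat\theta} \sup_{\|\theta^*\|_0 \le s}
      \mathbb{E}\,\bigl\|\hat\theta - \theta^*\bigr\|_2^2
      \;\asymp\; \frac{\sigma^2\, s \log(d/s)}{n} .
    \]

So accurate estimation is possible even when d is much larger than n, provided s log(d/s) is small relative to n; this is exactly the high-dimensional-but-structured regime described above.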
Tentative list of topics:
- Tail bounds and concentration of measure (a numerical illustration follows this list)
- Random vectors in high dimensions
- Quadratic forms, symmetrization and contraction
- Empirical processes: maxima, uniform laws and metric entropy
- Random matrices
- Sparse regression in high dimensions
- Covariance estimation and principal component analysis in high dimensions
- Low-rank matrix estimation and factorization
- Reproducing kernel Hilbert spaces and non-parametric regression
- Minimax lower bounds: Fano, Le Cam, and Assouad methods
- Exponential families and information geometry
- Online convex optimization and bandits
- Statistical methods based on non-convex optimization
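As a preview of the first two topics, the following minimal sketch (illustrative only; it assumes NumPy and is not part of the course materials) checks numerically that the Euclidean norm of a standard Gaussian vector in dimension d concentrates around sqrt(d), with fluctuations that stay of constant order as d grows:

    import numpy as np

    rng = np.random.default_rng(0)

    # Concentration of the Euclidean norm: for X ~ N(0, I_d), the norm
    # ||X||_2 has mean close to sqrt(d) but a standard deviation of
    # constant order (about 1/sqrt(2)), independent of the dimension d.
    for d in [10, 100, 1000, 10000]:
        samples = rng.standard_normal((2000, d))  # 2000 draws of X in R^d
        norms = np.linalg.norm(samples, axis=1)
        print(f"d={d:6d}  mean={norms.mean():9.3f}  "
              f"sqrt(d)={np.sqrt(d):9.3f}  std={norms.std():.3f}")

Running this shows the mean of the norm tracking sqrt(d) while the standard deviation stays near 1/sqrt(2) across dimensions, which is the concentration phenomenon the first lectures make precise.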
3. Text and Lecture
There is no required text. Notes will be posted, but I may rely on a scribe during some lectures. A template for scribing can be downloaded here.
We will sometimes draw from the following books and notes:
- High-Dimensional Statistics: A Non-Asymptotic Viewpoint, Martin J. Wainwright, Cambridge University Press, 2019.
- High-Dimensional Probability: An Introduction with Applications in Data Science, Roman Vershynin, Cambridge University Press, 2018.
- Lecture notes for Statistics 311/Electrical Engineering 377: Information Theory and Statistics, John Duchi, 2019.
- Probability in High Dimension, Ramon van Handel, 2016.
- An Introduction to Matrix Concentration Inequalities, Joel Tropp, Foundations and Trends in Machine Learning, 2015.
- Graphical Models, Exponential Families, and Variational Inference, Martin Wainwright and Michael Jordan, Foundations and Trends in Machine Learning, 2008.
- Introduction to the Non-Asymptotic Analysis of Random Matrices, Roman Vershynin, in Compressed Sensing: Theory and Applications, 2010.
- Concentration Inequalities: A Nonasymptotic Theory of Independence, Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, Oxford University Press, 2013.
- High-Dimensional Data Analysis with Sparse Models: Theory, Algorithms, and Applications, John Wright, Yi Ma, and Allen Yang, 2018.
- Statistical Machine Learning for High-Dimensional Data, Jianqing Fan, Runze Li, Cun-Hui Zhang, and Hui Zou, 2018.
4. Homework and Projects
There will be approximately 3 homework assignments. You are encouraged to discuss and work together on the homework. However, you must write up your homework alone, and you must acknowledge those with whom you discussed. You must also cite any resources that helped you obtain your solution.
There will also be a final project, to be completed individually or in groups of two. The project can be any of the following:
- Literature review: A critical summary of one or several papers related to the topics studied.
- Original research: It can be either theoretical or experimental (ideally a mix of the two).
We particularly welcome projects that may be extended for submission to a peer-reviewed journal or conference (e.g., MOR/AoS/T-IT/COLT/ICML/NeurIPS/ICLR). Project topics must be approved by the instructor.
Project instructions can be found here.
5. Websites and Communications
- Canvas: We use Canvas for communication and posting course materials.
- Piazza: We will have a class Piazza forum where students can discuss the course content. Sign up for this course on Piazza using this link.
6. Grading
Your final grade will be based on the following:
- 35%: Homework
- 60%: Final project
- 5%: Scribing, participation in class/Piazza, and filling out the course evaluation.
7. Lecture Notes
- Basic Tail Bounds
- Random Vectors in High Dimension
- Concentration for Lipschitz Functions
- Random Matrices I: Comparison Inequalities
- Random Matrices II: Epsilon-Net
- Random Matrices III: Matrix Bernstein
- Random Processes and Metric Entropy
- Random Processes and Chaining
- Statistical Learning Theory
- Nonparametric Regression
- Sparse Regression
- Minimax Lower Bounds I: Local Fano's Method (see the example after this list)
- Minimax Lower Bounds II: Application of Local Fano's Method
- Minimax Lower Bounds III: Global Fano's Method
- Online Learning
- Uniform Laws and Localization
- Overparametrization and Double Descent
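For orientation, the central tool in the minimax lower bound lectures is Fano's inequality (a standard statement, reproduced here for illustration; it is not taken from the course notes): if J is uniform over M hypotheses and X is the observed data, then every estimator of J satisfies

    % Fano's inequality: the probability of identifying the true
    % hypothesis is bounded away from 1 unless the data carry enough
    % information, as measured by the mutual information I(X; J).
    \[
      \inf_{\hat J} \; \Pr\bigl[\hat J \neq J\bigr]
      \;\ge\; 1 - \frac{I(X; J) + \log 2}{\log M} .
    \]

Combined with a packing of the parameter space, this converts an information-theoretic quantity into a lower bound on the estimation error of any method.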
8. Academic Conduct
Each student in this course is expected to abide by the Cornell University Code of Academic Integrity (http://theuniversityfaculty.cornell.edu/academic-integrity/). Any work submitted by a student in this course for academic credit should be the student's own work (or that of the project group). Copying homework is an example of cheating.
If you have any questions about this policy, please do not hesitate to contact the instructor.