ORIE 7790 Selected Topics in Applied Statistics (Spring 2020):
High Dimensional Probability and Statistics
1. Basic Info
2. Course Overview
3. Text and Lecture
4. Homework and Projects
5. Websites and Communications
6. Grading
7. Lecture Notes
8. Academic Conduct
1. Basic Info
Lectures:
Tuesday and Thursday 10:10-11:25am, Hollister Hall 320
Instructor:
Yudong Chen (yudong.chen at cornell dot edu, Rhodes 223)
Office hours: TTh 11:30-12:00 (after the lecture), or by appointment
Prerequisites:
There is no formal prerequisite. Students should have a PhD level of mathematical maturity, including a background in basic linear algebra, probability, and algorithms. Prior exposure to machine learning, statistical inference, stochastic processes, and convex/continuous optimization is helpful, but not required.
2. Course Overview
This is a fast-paced course on probabilistic and statistical tools for high-dimensional data analysis. In particular, we will develop techniques for analyzing the performance of an algorithm, as well as for understanding the fundamental limits of a problem. The focus will be on high-dimensional problems that possess hidden low-dimensional structures, and on non-asymptotic analysis that characterizes the interaction between sample complexity, problem dimension, and other structural parameters.
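To give the flavor of such results, here is one standard example from the sparse regression literature (illustrative only; not part of the original syllabus text): when estimating an s-sparse vector in dimension d from n noisy linear measurements with noise level sigma, the minimax estimation error scales as

    % A prototypical non-asymptotic bound, coupling the sample size n,
    % the ambient dimension d, and the sparsity s (standard in the
    % sparse-regression literature, e.g., Wainwright 2019).
    \[
      \inf_{\hat\theta} \sup_{\|\theta^*\|_0 \le s}
      \mathbb{E}\,\bigl\|\hat\theta - \theta^*\bigr\|_2^2
      \;\asymp\; \frac{\sigma^2\, s \log(d/s)}{n} .
    \]

So accurate estimation is possible even when d is much larger than n, provided s log(d/s) is small relative to n; this is exactly the high-dimensional-but-structured regime described above.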
Tentative list of topics:
- Tail bounds and concentration of measure (a numerical illustration follows this list)
- Random vectors in high dimensions
- Quadratic forms, symmetrization and contraction
- Empirical processes: maxima, uniform laws and metric entropy
- Random matrices
- Sparse regression in high dimensions
- Covariance estimation and principal component analysis in high dimensions
- Low-rank matrix estimation and factorization
- Reproducing kernel Hilbert spaces and non-parametric regression
- Minimax lower bounds: Fano, Le Cam, and Assouad methods
- Exponential families and information geometry
- Online convex optimization and bandits
- Statistical methods based on non-convex optimization
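As a preview of the first two topics, the following minimal sketch (illustrative only; it assumes NumPy and is not part of the course materials) checks numerically that the Euclidean norm of a standard Gaussian vector in dimension d concentrates around sqrt(d), with fluctuations that stay of constant order as d grows:

    import numpy as np

    rng = np.random.default_rng(0)

    # Concentration of the Euclidean norm: for X ~ N(0, I_d), the norm
    # ||X||_2 has mean close to sqrt(d) but a standard deviation of
    # constant order (about 1/sqrt(2)), independent of the dimension d.
    for d in [10, 100, 1000, 10000]:
        samples = rng.standard_normal((2000, d))  # 2000 draws of X in R^d
        norms = np.linalg.norm(samples, axis=1)
        print(f"d={d:6d}  mean={norms.mean():9.3f}  "
              f"sqrt(d)={np.sqrt(d):9.3f}  std={norms.std():.3f}")

Running this shows the mean of the norm tracking sqrt(d) while the standard deviation stays near 1/sqrt(2) across dimensions, which is the concentration phenomenon the first lectures make precise.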
3. Text and Lecture
There is no required text. Notes will be posted, but I may rely on a scribe during some lectures. A template for scribing can be downloaded here.
We will sometimes draw from the following books and notes:
- High-Dimensional Statistics: A Non-Asymptotic Viewpoint, Martin J. Wainwright, Cambridge University Press, 2019.
- High-Dimensional Probability: An Introduction with Applications in Data Science, Roman Vershynin, Cambridge University Press, 2018.
- Lecture notes for Statistics 311/Electrical Engineering 377: Information Theory and Statistics, John Duchi, 2019.
- Probability in High Dimension, Ramon van Handel, 2016.
- An Introduction to Matrix Concentration Inequalities, Joel Tropp, Foundations and Trends in Machine Learning, 2015.
- Graphical Models, Exponential Families, and Variational Inference, Martin Wainwright and Michael Jordan, Foundations and Trends in Machine Learning, 2008.
- Introduction to the Non-Asymptotic Analysis of Random Matrices, Roman Vershynin, in Compressed Sensing: Theory and Applications, 2010.
- Concentration Inequalities: A Nonasymptotic Theory of Independence, Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, Oxford University Press, 2013.
- High-Dimensional Data Analysis with Sparse Models: Theory, Algorithms, and Applications, John Wright, Yi Ma, and Allen Yang, 2018.
- Statistical Machine Learning for High-Dimensional Data, Jianqing Fan, Runze Li, Cun-Hui Zhang, and Hui Zou, 2018.
4. Homework and Projects
There will be approximately 3 homework assignments. You are encouraged to discuss and work together on the homework. However, you must write up your homework alone, and you must acknowledge those with whom you discussed. You must also cite any resources that helped you obtain your solution.
There will also be a final project, to be completed individually or in groups of two. The project can be any of the following:
- Literature review: A critical summary of one or several papers related to the topics studied.
- Original research: It can be either theoretical or experimental (ideally a mix of the two).
We particularly welcome projects that may be extended for submission to a peer-reviewed journal or conference (e.g., MOR/AoS/T-IT/COLT/ICML/NeurIPS/ICLR). Project topics must be approved by the instructor.
Project instructions can be found here.
5. Websites and Communications
- Canvas: We use Canvas for communication and posting course materials.
- Piazza: We will have a class Piazza forum where students can discuss the course content. Sign up for this course on Piazza using this link.
6. Grading
Your final grade will be based on the following:
- 35%: Homework
- 60%: Final project
- 5%: Scribing, participation in class/Piazza, and filling out the course evaluation.
7. Lecture Notes
- Basic Tail Bounds
- Random Vectors in High Dimension
- Concentration for Lipschitz Functions
- Random Matrices I: Comparison Inequalities
- Random Matrices II: Epsilon-Net
- Random Matrices III: Matrix Bernstein
- Random Processes and Metric Entropy
- Random Processes and Chaining
- Statistical Learning Theory
- Nonparametric Regression
- Sparse Regression
- Minimax Lower Bounds I: Local Fano's Method (see the example after this list)
- Minimax Lower Bounds II: Application of Local Fano's Method
- Minimax Lower Bounds III: Global Fano's Method
- Online Learning
- Uniform Laws and Localization
- Overparametrization and Double Descent
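For orientation, the central tool in the minimax lower bound lectures is Fano's inequality (a standard statement, reproduced here for illustration; it is not taken from the course notes): if J is uniform over M hypotheses and X is the observed data, then every estimator of J satisfies

    % Fano's inequality: the probability of identifying the true
    % hypothesis is bounded away from 1 unless the data carry enough
    % information, as measured by the mutual information I(X; J).
    \[
      \inf_{\hat J} \; \Pr\bigl[\hat J \neq J\bigr]
      \;\ge\; 1 - \frac{I(X; J) + \log 2}{\log M} .
    \]

Combined with a packing of the parameter space, this converts an information-theoretic quantity into a lower bound on the estimation error of any method.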
8. Academic Conduct
Each student in this course is expected to abide by the Cornell University Code of Academic Integrity (http://theuniversityfaculty.cornell.edu/academic-integrity/). Any work submitted by a student in this course for academic credit should be the student's own work (or that of the project group). Copying homework is an example of cheating.
If you have any questions about this policy, please do not hesitate to contact the instructor.