ORIE 7790 Selected Topics in Applied Statistics (Spring 2020):
High Dimensional Probability and Statistics

1. Basic Info
2. Course Overview
3. Text and Lecture
4. Homework and Projects
5. Websites and Communication
6. Grading
7. Lecture Notes
8. Academic Conduct

1. Basic Info

Tuesday and Thursday 10:10-11:25am, Hollister Hall 320

Yudong Chen (yudong.chen at cornell dot edu, Rhodes 223)
Office hours: TTh 11:30-12:00 (after the lecture), or by appointment

There is no formal prerequisite. Students should have a PhD-level mathematical maturity, including a background in basic linear algebra, probability and algorithms. Prior exposure to machine learning, statistical inference, stochastic processes and convex/continuous optimization is helpful, but not required.

2. Course Overview

This is a fast-paced course on the probabilistic and statistical tools for high-dimensional data analysis. In particular, we will develop techniques for analyzing the performance of an algorithm, as well as for understanding the fundamental limits of a problem. The focus will be on high-dimensional problems that possess hidden low-dimensional structures, and on non-asymptotic analysis that characterizes the interaction between sample complexity, problem dimension and other structural parameters.

Tentative list of topics:

  • Tail bounds and concentration of measure
  • Random vectors in high dimensions
  • Quadratic forms, symmetrization and contraction
  • Empirical processes: maxima, uniform laws and metric entropy
  • Random matrices
  • Sparse regression in high dimensions
  • Covariance estimation and principal component analysis in high dimensions
  • Low-rank matrix estimation and factorization
  • Reproducing kernel Hilbert spaces and non-parametric regression
  • Minimax lower bounds: Fano, Le Cam, and Assouad methods
  • Exponential families and information geometry
  • Online convex optimization and bandits
  • Statistical methods based on non-convex optimization

3. Text and Lecture

There is no required text. Notes will be posted, but I may rely on a scribe during some lectures. A template for scribing can be downloaded here.

We will sometimes draw from the following books and notes:

4. Homework and Projects

There will be approximately 3 homework assignments. You are encouraged to discuss and work together on the homework. However, you must write up your homework alone, AND acknowledge those with whom you discussed it. You must also cite any resources that helped you obtain your solution.

There will also be a final project, to be completed individually or in groups of two. The project can be any of the following:

  • Literature review: Critical summary of one or several papers related to the topics studied.
  • Original research: It can be either theoretical or experimental (ideally a mix of the two).

We particularly welcome projects that may be extended for submission to a peer-reviewed journal or conference (e.g., MOR/AoS/T-IT/COLT/ICML/NeurIPS/ICLR). Project topics must be approved by the instructor.

Project instructions can be found here.

5. Websites and Communication

  • Canvas: We use Canvas for communication and posting course materials.
  • Piazza: We will have a class Piazza forum where students can discuss the course content. Sign up for this course on Piazza using this link.

6. Grading

Your final grade will be based on the following:

  • 35%: Homework
  • 60%: Final project
  • 5%: Scribing, participation in class/Piazza, filling out the course evaluation.

7. Lecture Notes

8. Academic Conduct

Each student in this course is expected to abide by the Cornell University Code of Academic Integrity (http://theuniversityfaculty.cornell.edu/academic-integrity/). Any work submitted by a student in this course for academic credit should be the student's own work (or that of the project group). Copying homework is an example of cheating.

If you have any questions about this policy, please do not hesitate to contact the instructor.