CS839 Theoretical Foundations of Deep Learning

CS839, Spring 2023
Department of Computer Sciences
University of Wisconsin–Madison


Reference Materials

Books/Courses: no textbook is required. Please familiarize yourself with the basic analysis tools in the books listed in the Prerequisites. The following books are supplementary; parts of them may be used in, or serve as complements to, the lectures.

  • [T21] Telgarsky, Matus. "Deep Learning Theory Lecture Notes." 2021.
  • [A+21] Arora, Sanjeev, et al. "Theory of Deep Learning." Book draft, 2021.

Some lecture materials from other courses will also be used; we thank the authors for providing these excellent materials!

Papers:

  • [ZBHRV17] Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. "Understanding deep learning requires rethinking generalization." In International Conference on Learning Representations. 2017.
  • [SHNGS18] Soudry, Daniel, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, and Nathan Srebro. "The implicit bias of gradient descent on separable data." The Journal of Machine Learning Research 19, no. 1 (2018): 2822-2878.
  • [GLSS18] Gunasekar, Suriya, Jason Lee, Daniel Soudry, and Nathan Srebro. "Characterizing implicit bias in terms of optimization geometry." In International Conference on Machine Learning, pp. 1832-1841. PMLR, 2018.
  • [NLGSSS19] Nacson, Mor Shpigel, Jason Lee, Suriya Gunasekar, Pedro Henrique Pamplona Savarese, Nathan Srebro, and Daniel Soudry. "Convergence of gradient descent on separable data." In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 3420-3428. PMLR, 2019.
  • [DZPS19] Du, Simon S., Xiyu Zhai, Barnabas Poczos, and Aarti Singh. "Gradient Descent Provably Optimizes Over-parameterized Neural Networks." In International Conference on Learning Representations. 2019.
  • [CB18] Chizat, Lénaïc, and Francis Bach. "On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport." Advances in Neural Information Processing Systems 31 (2018): 3036-3046.
  • [JGH18] Jacot, Arthur, Franck Gabriel, and Clément Hongler. "Neural tangent kernel: convergence and generalization in neural networks." Advances in Neural Information Processing Systems 31 (2018).
  • [FDZ19] Fang, Cong, Hanze Dong, and Tong Zhang. "Over parameterized two-level neural networks can learn near optimal feature representations." arXiv preprint arXiv:1910.11508 (2019).
  • [MMN18] Mei, Song, Andrea Montanari, and Phan-Minh Nguyen. "A mean field view of the landscape of two-layer neural networks." Proceedings of the National Academy of Sciences 115, no. 33 (2018): E7665-E7671.
  • [GL20] Garg, Siddhant, and Yingyu Liang. "Functional regularization for representation learning: A unified theoretical perspective." Advances in Neural Information Processing Systems 33 (2020): 17187-17199.
  • [AKKPS19] Arora, Sanjeev, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, and Nikunj Saunshi. "A theoretical analysis of contrastive unsupervised representation learning." In 36th International Conference on Machine Learning, ICML 2019, pp. 9904-9923. International Machine Learning Society (IMLS), 2019.
  • [SCLRWLJ23] Shi, Zhenmei, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, and Somesh Jha. "The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning." In International Conference on Learning Representations. 2023.
  • [BR88] Blum, Avrim L., and Ronald L. Rivest. "Training a 3-node neural network is NP-complete." Neural Networks 5, no. 1 (1992): 117-127. Conference version in Advances in Neural Information Processing Systems 1 (1988).
  • [SWL22] Shi, Zhenmei, Junyi Wei, and Yingyu Liang. "A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features." In International Conference on Learning Representations. 2022.

Schedule:

Date | Topic | Lecture notes/Reading materials | Assignments
Tuesday, Jan 24 | Course Overview | Slides | HW1 released: LINK
Thursday, Jan 26 | Challenges in Deep Learning Analysis | Slides; Lecture note; [ZBHRV17]
Tuesday, Jan 31 | Approximation Power of Neural Networks I | Lecture note; CH2 of [T21]
Thursday, Feb 2 | Approximation Power of Neural Networks II | Lecture note; CH2 of [T21]
Tuesday, Feb 7 | Approximation Power of Neural Networks III | Lecture note; CH3 of [T21]
Thursday, Feb 9 | Implicit Regularization of Gradient Descent I | Lecture note; CH9 of [A+21]; [SHNGS18]
Tuesday, Feb 14 | Implicit Regularization of Gradient Descent II | Lecture note; CH9 of [A+21]; [GLSS18], [NLGSSS19] | HW2 released
Thursday, Feb 16 | Implicit Regularization of Gradient Descent III | Lecture note; CH9 of [A+21]; [GLSS18], [NLGSSS19]
Tuesday, Feb 21 | Clarke Subdifferential and Positive Homogeneity | Lecture note; CH9 of [T21]
Thursday, Feb 23 | Implicit Regularization of Gradient Descent IV | Lecture note; CH10 of [T21]
Tuesday, Feb 28 | Neural Tangent Kernel I | Lecture note; CH10 of [A+21]; [DZPS19]
Thursday, Mar 2 | Neural Tangent Kernel II | Lecture note; CH10 of [A+21]; [DZPS19]
Tuesday, Mar 7 | Neural Tangent Kernel III | Lecture note; CH8 of [T21]; [CB18] | HW3 released
Thursday, Mar 9 | Mean Field Analysis I | Lecture note; [FDZ19], [MMN18]
Tuesday, Mar 21 | Mean Field Analysis II | Lecture note; [FDZ19], [MMN18]
Thursday, Mar 23 | Representation Learning I | Slides; Lecture note; [GL20]
Tuesday, Mar 28 | Representation Learning II | Lecture note; [AKKPS19] | HW4 released
Thursday, Mar 30 | Representation Learning III | Slides; Lecture note; [SCLRWLJ23]
Tuesday, Apr 4 | Complexity I | Lecture note; [BR88]
Thursday, Apr 6 | Complexity II | Lecture note; [SWL22] | HW5 released
Apr 11 and afterwards | Paper Presentation