CS839 Theoretical Foundations of Deep Learning

CS839, Spring 2023
Department of Computer Sciences
University of Wisconsin–Madison


Reference Materials

Books/Courses: no textbook is required. Please familiarize yourself with the basic analysis tools in the books listed in the Prerequisites. The following books are supplementary; parts of them may be used in, or serve as complements to, the lectures.

  • [T21] Telgarsky, Matus. "Deep Learning Theory Lecture Notes." 2021.
  • [A+21] Arora, Sanjeev, et al. "Theory of Deep Learning." Book draft, 2021.

Some lecture materials from other courses will also be used; we thank the authors for providing these excellent materials!

Papers:

  • [ZBHRV17] Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. "Understanding deep learning requires rethinking generalization." In International Conference on Learning Representations. 2017.
  • [SHNGS18] Soudry, Daniel, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, and Nathan Srebro. "The implicit bias of gradient descent on separable data." The Journal of Machine Learning Research 19, no. 1 (2018): 2822-2878.
  • [GLSS18] Gunasekar, Suriya, Jason Lee, Daniel Soudry, and Nathan Srebro. "Characterizing implicit bias in terms of optimization geometry." In International Conference on Machine Learning, pp. 1832-1841. PMLR, 2018.
  • [NLGSSS19] Nacson, Mor Shpigel, Jason Lee, Suriya Gunasekar, Pedro Henrique Pamplona Savarese, Nathan Srebro, and Daniel Soudry. "Convergence of gradient descent on separable data." In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 3420-3428. PMLR, 2019.
  • [DZPS19] Du, Simon S., Xiyu Zhai, Barnabas Poczos, and Aarti Singh. "Gradient Descent Provably Optimizes Over-parameterized Neural Networks." In International Conference on Learning Representations. 2019.
  • [CB18] Chizat, Lénaïc, and Francis Bach. "On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport." Advances in Neural Information Processing Systems 31 (2018): 3036-3046.
  • [JGH18] Jacot, Arthur, Franck Gabriel, and Clément Hongler. "Neural tangent kernel: convergence and generalization in neural networks." Advances in Neural Information Processing Systems 31 (2018).
  • [FDZ19] Fang, Cong, Hanze Dong, and Tong Zhang. "Over parameterized two-level neural networks can learn near optimal feature representations." arXiv preprint arXiv:1910.11508 (2019).
  • [MMN18] Mei, Song, Andrea Montanari, and Phan-Minh Nguyen. "A mean field view of the landscape of two-layer neural networks." Proceedings of the National Academy of Sciences 115, no. 33 (2018): E7665-E7671.
  • [GL20] Garg, Siddhant, and Yingyu Liang. "Functional regularization for representation learning: A unified theoretical perspective." Advances in Neural Information Processing Systems 33 (2020): 17187-17199.
  • [AKKPS19] Arora, Sanjeev, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, and Nikunj Saunshi. "A theoretical analysis of contrastive unsupervised representation learning." In 36th International Conference on Machine Learning, ICML 2019, pp. 9904-9923. International Machine Learning Society (IMLS), 2019.
  • [SCLRWLJ23] Shi, Zhenmei, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, and Somesh Jha. "The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning." In International Conference on Learning Representations. 2023.
  • [BR88] Blum, Avrim L., and Ronald L. Rivest. "Training a 3-node neural network is NP-complete." Neural Networks 5, no. 1 (1992): 117-127. Conference version in Advances in Neural Information Processing Systems 1 (1988).
  • [SWL22] Shi, Zhenmei, Junyi Wei, and Yingyu Liang. "A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features." In International Conference on Learning Representations. 2022.

Schedule:

Date | Topic | Lecture notes/Reading materials | Assignments
Tuesday, Jan 24 | Course Overview | Slides | HW1 released: LINK
Thursday, Jan 26 | Challenges in Deep Learning Analysis | Slides; Lecture note; [ZBHRV17]
Tuesday, Jan 31 | Approximation Power of Neural Networks I | Lecture note; CH2 of [T21]
Thursday, Feb 2 | Approximation Power of Neural Networks II | Lecture note; CH2 of [T21]
Tuesday, Feb 7 | Approximation Power of Neural Networks III | Lecture note; CH3 of [T21]
Thursday, Feb 9 | Implicit Regularization of Gradient Descent I | Lecture note; CH9 of [A+21]; [SHNGS18]
Tuesday, Feb 14 | Implicit Regularization of Gradient Descent II | Lecture note; CH9 of [A+21]; [GLSS18], [NLGSSS19] | HW2 released
Thursday, Feb 16 | Implicit Regularization of Gradient Descent III | Lecture note; CH9 of [A+21]; [GLSS18], [NLGSSS19]
Tuesday, Feb 21 | Clarke Subdifferential and Positive Homogeneity | Lecture note; CH9 of [T21]
Thursday, Feb 23 | Implicit Regularization of Gradient Descent IV | Lecture note; CH10 of [T21]
Tuesday, Feb 28 | Neural Tangent Kernel I | Lecture note; CH10 of [A+21]; [DZPS19]
Thursday, Mar 2 | Neural Tangent Kernel II | Lecture note; CH10 of [A+21]; [DZPS19]
Tuesday, Mar 7 | Neural Tangent Kernel III | Lecture note; CH8 of [T21]; [CB18] | HW3 released
Thursday, Mar 9 | Mean Field Analysis I | Lecture note; [FDZ19], [MMN18]
Tuesday, Mar 21 | Mean Field Analysis II | Lecture note; [FDZ19], [MMN18]
Thursday, Mar 23 | Representation Learning I | Slides; Lecture note; [GL20]
Tuesday, Mar 28 | Representation Learning II | Lecture note; [AKKPS19] | HW4 released
Thursday, Mar 30 | Representation Learning III | Slides; Lecture note; [SCLRWLJ23]
Tuesday, Apr 4 | Complexity I | Lecture note; [BR88]
Thursday, Apr 6 | Complexity II | Lecture note; [SWL22] | HW5 released
Apr 11 and afterwards | Paper Presentation