CS839 Theoretical Foundations of Deep Learning

CS839, Spring 2022
Department of Computer Sciences
University of Wisconsin–Madison


Reference Materials

Books/Courses: No textbook is required. Please familiarize yourself with the basic analysis tools in the books listed under Prerequisites. The following books are additional references, parts of which may be used in the lectures or serve as complementary reading.

Some course materials from the following courses will be used; we thank the authors for providing these excellent materials!

Papers:

  • [ZBHRV17] Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. "Understanding deep learning requires rethinking generalization." In International Conference on Learning Representations. 2017.
  • [SHNGS18] Soudry, Daniel, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, and Nathan Srebro. "The implicit bias of gradient descent on separable data." The Journal of Machine Learning Research 19, no. 1 (2018): 2822-2878.
  • [GLSS18] Gunasekar, Suriya, Jason Lee, Daniel Soudry, and Nathan Srebro. "Characterizing implicit bias in terms of optimization geometry." In International Conference on Machine Learning, pp. 1832-1841. PMLR, 2018.
  • [NLGSSS19] Nacson, Mor Shpigel, Jason Lee, Suriya Gunasekar, Pedro Henrique Pamplona Savarese, Nathan Srebro, and Daniel Soudry. "Convergence of gradient descent on separable data." In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 3420-3428. PMLR, 2019.
  • [DZPS19] Du, Simon S., Xiyu Zhai, Barnabas Poczos, and Aarti Singh. "Gradient Descent Provably Optimizes Over-parameterized Neural Networks." In International Conference on Learning Representations. 2019.
  • [CB18] Chizat, Lénaïc, and Francis Bach. "On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport." Advances in Neural Information Processing Systems 31 (2018): 3036-3046.
  • [JGH18] Jacot, Arthur, Franck Gabriel, and Clément Hongler. "Neural tangent kernel: convergence and generalization in neural networks." Advances in Neural Information Processing Systems 31 (2018).
  • [FDZ19] Fang, Cong, Hanze Dong, and Tong Zhang. "Over parameterized two-level neural networks can learn near optimal feature representations." arXiv preprint arXiv:1910.11508 (2019).
  • [MMN18] Mei, Song, Andrea Montanari, and Phan-Minh Nguyen. "A mean field view of the landscape of two-layer neural networks." Proceedings of the National Academy of Sciences 115, no. 33 (2018): E7665-E7671.
  • [GL20] Garg, Siddhant, and Yingyu Liang. "Functional regularization for representation learning: A unified theoretical perspective." Advances in Neural Information Processing Systems 33 (2020): 17187-17199.
  • [AKKPS19] Arora, Sanjeev, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, and Nikunj Saunshi. "A theoretical analysis of contrastive unsupervised representation learning." In 36th International Conference on Machine Learning, ICML 2019, pp. 9904-9923. International Machine Learning Society (IMLS), 2019.
  • [BR88] Blum, Avrim L., and Ronald L. Rivest. "Training a 3-node neural network is NP-complete." Neural Networks 5, no. 1 (1992): 117-127. Conference version appeared in Advances in neural information processing systems 1 (1988).
  • [SWL22] Shi, Zhenmei, Junyi Wei, and Yingyu Liang. "A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features." In International Conference on Learning Representations. 2022.

Schedule

Date | Topic | Lecture notes/Reading materials | Assignments
Tuesday, Jan 25 | Course Overview | Slides |
Thursday, Jan 27 | Challenges in Deep Learning Analysis | Slides, Lecture note; [ZBHRV17] | HW 1 released
Tuesday, Feb 1 | Approximation Power of Neural Networks I | Lecture note; CH2 of [T21] |
Thursday, Feb 3 | Approximation Power of Neural Networks II | Lecture note; CH2 of [T21] |
Tuesday, Feb 8 | Approximation Power of Neural Networks III | Lecture note; CH3 of [T21] |
Thursday, Feb 10 | Implicit Regularization of Gradient Descent I | Lecture note; CH9 of [A+21], [SHNGS18] |
Tuesday, Feb 15 | Implicit Regularization of Gradient Descent II | Lecture note; CH9 of [A+21], [GLSS18, NLGSSS19] | HW 2 released
Thursday, Feb 17 | Implicit Regularization of Gradient Descent III | Lecture note; CH9 of [A+21], [GLSS18, NLGSSS19] |
Tuesday, Feb 22 | Clarke Subdifferential and Positive Homogeneity | Lecture note; CH9 of [T21] |
Thursday, Feb 24 | Implicit Regularization of Gradient Descent IV | Lecture note; CH10 of [T21] |
Tuesday, Mar 1 | Neural Tangent Kernel I | Lecture note; CH10 of [A+21], [DZPS19] |
Thursday, Mar 3 | Neural Tangent Kernel II | Lecture note; CH10 of [A+21], [DZPS19] |
Tuesday, Mar 8 | Neural Tangent Kernel III | Lecture note; CH8 of [T21], [CB18] | HW 3 released
Thursday, Mar 10 | Mean Field Analysis I | Lecture note; [FDZ19], [MMN18] |
Tuesday, Mar 22 | Mean Field Analysis II | Lecture note; [FDZ19], [MMN18] |
Thursday, Mar 24 | Mean Field Analysis III | Lecture note; [FDZ19], [MMN18] |
Tuesday, Mar 29 | Representation Learning I | Slides; Lecture note; [GL20] |
Thursday, Mar 31 | Representation Learning II | Lecture note; [AKKPS19] |
Tuesday, Apr 5 | Complexity I | Lecture note; [BR88] | HW 4 released
Thursday, Apr 7 | Complexity II | Lecture note; [SWL22] |
Apr 12 - May 3 | Paper Presentation | | HW 5 released on Apr 16