CS839 Theoretical Foundations of Deep Learning

Spring 2022
Department of Computer Sciences
University of Wisconsin–Madison

NOTE: The content here is tentative and intended to give you an idea of the course. It will be updated regularly until the start of the semester.

Course Overview

Deep learning has been the main driving force behind many modern intelligent systems and has achieved great success in many applications such as image processing, speech recognition, and game playing. However, the fundamental questions about why deep learning is so successful remain largely open. The goal of this course is to study and build the theoretical foundations of deep learning. Topics covered by this course include but are not limited to: approximation power of neural networks, optimization for deep learning, generalization analysis of deep learning. The instructor will give lectures on the selected topics. Students will present and discuss papers on the reading list, and do a course project.

The course will consist of mostly reading and discussing recent important papers on the theoretical analysis of deep learning, some homework assignments, and a course project.


Prerequisites

(CS760 or CS761 or CS861) AND a strong math background in machine learning, statistics, and optimization

The course is intended as advanced study and will not review background material. You are expected to be comfortable with mathematical analysis; please check the reference materials below. In particular, CS760 background is helpful but not sufficient; additional math background beyond CS760 is needed.


Time and Location

Time: Tuesday and Thursday 11:00am - 12:15pm

Location: CS Building Room 1325

Reference Materials

Textbook: There are no required textbooks. The following are recommended reference books for theoretical foundations of machine learning:

  • Understanding machine learning: From theory to algorithms. Shai Shalev-Shwartz, and Shai Ben-David. Cambridge University Press, 2014.
  • Foundations of machine learning. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. MIT Press, 2018.

Papers (some examples; more to be added):

  • Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. "Understanding deep learning requires rethinking generalization." In International Conference on Learning Representations. 2017.
  • Soudry, Daniel, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, and Nathan Srebro. "The implicit bias of gradient descent on separable data." The Journal of Machine Learning Research 19, no. 1 (2018): 2822-2878.
  • Chizat, Lénaïc, and Francis Bach. "On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport." Advances in Neural Information Processing Systems 31 (2018): 3036-3046.
  • Mei, Song, Andrea Montanari, and Phan-Minh Nguyen. "A mean field view of the landscape of two-layer neural networks." Proceedings of the National Academy of Sciences 115, no. 33 (2018): E7665-E7671.
  • Du, Simon S., Xiyu Zhai, Barnabas Poczos, and Aarti Singh. "Gradient Descent Provably Optimizes Over-parameterized Neural Networks." In International Conference on Learning Representations. 2018.
  • Jacot, Arthur, Franck Gabriel, and Clément Hongler. "Neural tangent kernel: convergence and generalization in neural networks." Advances in Neural Information Processing Systems. 2018.
  • Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. "Reconciling modern machine-learning practice and the classical bias–variance trade-off." Proceedings of the National Academy of Sciences 116, no. 32 (2019): 15849-15854.


Grading

The following weights are used to compute the final score:

  • Lecture note scribe: 10%
  • Homework: 40%
  • Paper presentation: 10%
  • Project: 40%


Homework

There will be roughly 5 homework assignments. Homework must be written in LaTeX. Unless indicated otherwise, you may discuss with other students but must finish the homework yourself. If you discuss with others, please indicate so in your submission; if you consult external materials such as Internet posts, please cite the references.


Project

Students are required to do a project in this class, since the goal of the course is to provide the opportunity to explore the frontier of recent theoretical studies of deep learning. Projects should be proposed by the proposal deadline (expected to be around the midterm; it will be specified in class). A PDF report (written in LaTeX) should be submitted by the project deadline. The report should be in the style of a conference paper (e.g., using the NeurIPS style files), providing an introduction/motivation, a discussion of related work, a description of your work detailed enough that it could be replicated, and a conclusion.

Project topics include but are not limited to:

  • Extension of existing work: improved bounds, more thorough analysis, adaptation to new problem settings, etc.
  • Novel theoretical analysis of existing deep learning methods/problems
  • Novel formulation of existing deep learning problems and corresponding analysis

The ideal outcome is a report publishable in a major machine learning conference or journal. Students' previously published work cannot be used as the course project.

Academic Integrity