CS839 Theoretical Foundations of Deep Learning

Spring 2023
Department of Computer Sciences
University of Wisconsin–Madison

Course Overview

Deep learning has been the main driving force behind many modern intelligent systems and has achieved great success in applications such as image processing, speech recognition, and game playing. However, the fundamental questions about why deep learning is so successful remain largely open. The goal of this course is to study and build the theoretical foundations of deep learning. Topics covered include but are not limited to: the approximation power of neural networks, optimization for deep learning, and generalization analysis of deep learning. The instructor will give lectures on selected topics. Students will present and discuss papers from the reading list and complete a course project.

The course will consist mostly of reading and discussing important recent papers on the theoretical analysis of deep learning, along with some homework assignments and a course project.




Prerequisites: (CS760 or CS761 or CS861) AND (strong math background in machine learning, statistics, and optimization)

The course is intended as advanced study and will not review background material. In particular, a CS760 background is helpful but not sufficient; additional mathematical background beyond CS760 is needed. You are expected to be familiar with the analysis tools in the following textbooks (or others at a similar level):

  • Understanding Machine Learning: From Theory to Algorithms. Shai Shalev-Shwartz and Shai Ben-David. Cambridge University Press, 2014.
  • High-Dimensional Probability: An Introduction with Applications in Data Science. Roman Vershynin. Cambridge University Press, 2018.
  • Introductory Lectures on Convex Optimization: A Basic Course. Yurii Nesterov. Springer, 2004.

You are expected to be comfortable performing mathematical analysis. To get a sense of the expected level, please check the reference material on the Schedule page.


Time: Tuesday and Thursday 11:00am - 12:15pm

Location: Engineering Hall 2309

Office hours: Thursday 2:00pm - 3:00pm, CS Building Room 5387


The final average score is computed as the following weighted sum:

  • Homework: 50%
  • Paper presentation: 10%
  • Project: 40%


There will be roughly 5 homework assignments. Homework must be written in LaTeX. Unless indicated otherwise, you may discuss the homework with other students but must complete it by yourself. If you discuss with others, please indicate that in your submission; if you consult external materials such as Internet posts, please cite them.


Students are required to do a project in this class, since the goal of the course is to provide the opportunity to explore the frontier of recent theoretical studies of deep learning. A project guideline will be provided to specify the details. Roughly speaking, projects should be proposed by the proposal deadline (expected to be around the midterm; the exact date will be specified in class). A PDF report (written in LaTeX) should be submitted by the project deadline. The report should be in the style of a conference paper, providing an introduction/motivation, a discussion of related work, a description of your work that is detailed enough for the work to be replicated, and a conclusion.

Project topics include but are not limited to:

  • Extension of existing work: improved bounds, more thorough analysis, adaptation to new problem settings, etc.
  • Novel theoretical analysis of existing deep learning methods/problems
  • Novel formulation of existing deep learning problems and corresponding analysis
  • Interesting empirical observations with proposed theoretical explanations (preferably in the form of mathematical analysis)

The ideal outcome is a report publishable in a major machine learning or theory conference or journal. Work that students have already published cannot be used as the course project.

Academic Integrity