CS/ECE/STAT-861: Theoretical Foundations of Machine Learning

University of Wisconsin-Madison, Fall 2024

Overview

This class will cover fundamental and advanced theoretical topics in Machine Learning. We will focus on several paradigms of learning (such as supervised/unsupervised learning, online learning, and sequential decision-making) and examine questions such as: Under what conditions can we learn and generalize from a limited amount of data? How hard is a given learning problem? How good is a learning algorithm, and is it optimal for the given problem? When making decisions under uncertainty, how do we trade off learning about the environment against achieving our goal? We will use tools from several areas related to machine learning, such as statistics, algorithms, information theory, and game theory.

This course will be primarily targeted towards PhD students who intend to do research in theoretical machine learning and statistics.

Quick links:   Canvas,   Piazza,   Fall ’23 course.

Course staff

Instructor: Kirthevasan Kandasamy.
Office hours: Wednesdays 1:00 PM – 2:30 PM in CS 5375.
E-mail: kandasamy@cs{dot}wisc{dot}edu.

Grader: Albert Dorador-Chalar.
E-mail: albert.dorador@wisc{dot}edu.

Lectures

Monday, Wednesday, and Friday. 09:30 AM – 10:45 AM. ENGR HALL 2540.
There will be a total of 27–30 lectures. Lecture notes scribed by students will be made available within 4–5 days of each lecture.

Topics

This is a tentative list of topics that we intend to cover in this class. The course staff reserves the right to modify the syllabus as they see fit.

  • PAC Learning

    • Loss, risk, empirical risk minimization

    • Agnostic PAC Learning

    • Rademacher complexity and VC dimension

    • Sauer's Lemma

  • Statistical lower bounds

    • Average (Bayes’) risk optimality vs minimax optimality

    • Lower bounds for point estimation

    • Review of information theory, distances between distributions

    • Going from estimation to testing: Fano and Le Cam methods

    • Constructing tight packings, Gilbert-Varshamov lemma

    • Applications: nonparametric regression and density estimation, classification in a VC class

  • Nonparametric methods

    • Nonparametric regression, Nadaraya-Watson estimator

    • Kernel density estimation

  • Stochastic bandits

    • Optimism in the face of uncertainty and the Upper Confidence Bound (UCB) algorithm

    • Lower bounds for stochastic K-armed bandits

    • Linear bandits, martingale concentration

  • Online learning and adversarial bandits

    • Learning from experts and the Hedge algorithm

    • Adversarial bandits and EXP3

    • Lower bounds for adversarial bandits and learning from experts

    • Contextual bandits and EXP4

    • Learning in games

    • Regret minimization in non-stationary environments

  • Online convex optimization

    • Follow the leader, Follow the regularized leader

    • FTRL with convex regularizers, Online gradient descent

    • Follow the perturbed leader, online shortest paths

Prerequisites

CS761 or equivalent. I may waive this requirement, but it is the student's responsibility to have an adequate background in probability, statistics, calculus, and algorithms. I will not be doing a review of these topics at the beginning of the class.

I will release a set of diagnostic questions as Homework 0 at the beginning of class. While you are not expected to know the solutions right away, you should be able to solve most of the questions with reasonable effort after looking up any references if necessary.

Recommended textbooks

We will not be following a textbook in this class. However, the following texts are excellent references.

Logistics

Canvas: We will use Canvas for homeworks and exams.

Piazza: Please sign up for the class on Piazza via this link.

  • Piazza will be used for most announcements, but please check Canvas for announcements as well.

  • If you have any questions about the class, it is best to message me via Piazza instead of emailing me directly. Please post your question publicly if you feel that other students may be able to answer it, or if you think that other students may benefit from the answer.

  • You may use Piazza for peer discussions about lectures or clarifications about homework questions. While I will be checking Piazza regularly, as a general rule, I will not be answering questions about homework on Piazza. It is best to use my office hours to discuss homework questions.

Grading

Your grade will be determined by scribing, homeworks, a take-home exam, and a course project. See the grading page for more details.