```
CS/ECE 861 Theoretical Foundations of Machine Learning

Description
Advanced mathematical theory and methods of machine learning. Statistical learning theory,
Vapnik-Chervonenkis Theory, model selection, high-dimensional models, nonparametric methods,

Prereq
CS/ECE 761 or ECE 830
(While not required, CS/ECE 532 Matrix Methods in Machine Learning, Math 521 Analysis I,
ECE 730 Modern Probability Theory and Stochastic Processes, CS/ECE 524 Introduction to
Optimization, and similar math courses are helpful)

Instructor
Professor Jerry Zhu, jerryzhu@cs.wisc.edu

Homeworks
Assignments are posted in the Canvas system.

Course discussion forum
Piazza discussions

Time and location
Lectures in CS 1325, MWF 9:30-10:45am, see calendar below
Office hours: Wednesdays 2-3pm, CS 6391

Exam
Midterm exam: Wed Feb 27, 9:30-10:45am, in class
Final exam: Around mid-April, in class

All exams are closed book.  Bring plenty of blank scratch paper.
One 8.5x11 sheet of paper with notes on both sides is allowed (handwritten or typed).  Lectures
and readings on the syllabus page are required.  You are responsible for topics covered in
lecture.  You should have knowledge sufficient to work through simple examples.
Questions about exam grading must be raised with the instructor within one week after the exam is returned.

Project
An open machine learning project, done individually or in groups of two.  Requires an analysis component.
Proposal due Apr 5 before class.
Report due end-of-day May 3, 4-8 pages.

Syllabus
(tentative, subject to change)

Statistical Learning
Empirical risk minimization, PAC learning [SS 2, 3, 4]
Concentration inequalities [SS B.1--B.5, V 2]
Structural risk minimization and minimum description length [SS 7]
Vapnik-Chervonenkis dimension [SS 6]

Online learning
Batch perceptron [SS 9]
Halving and Littlestone dimension [SS 21.1.1]
Online convex optimization [OCO up to 2.7, 3.1]

Bandit algorithms [BC 1, 2, 3; OCO 4.1, 4.2]
Active learning [H 1, 2, 5.1]
Machine Teaching [GK]
Submodularity [KD 1, 2]

References
[BC] S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. In Foundations and Trends in Machine Learning, Vol 5: No 1, 1-122, 2012.

[GK] S. A. Goldman and M. J. Kearns. On the Complexity of Teaching. 1995.

[H] S. Hanneke. Theory of Active Learning.

[KD] A. Krause and D. Golovin. Submodular Function Maximization.

[OCO] S. Shalev-Shwartz. Online Learning and Online Convex Optimization. In Foundations and Trends in Machine Learning, Vol 4: No 2.

[SS] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

[V] R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science.

Grading: Homework (30%), exams (40%), project (30%).

Class learning outcome
Students will be able to:
- derive sample complexity bounds using concentration of measure inequalities
- analyze bias-variance tradeoffs and model selection criteria
- derive rates of convergence for nonparametric machine learning algorithms
- gain familiarity with various machine learning paradigms, including supervised, unsupervised, active, multitask, and online learning

```