CS/ECE 861 Theoretical Foundations of Machine Learning

Description: Advanced mathematical theory and methods of machine learning. Statistical learning theory, Vapnik-Chervonenkis theory, model selection, high-dimensional models, nonparametric methods, probabilistic analysis, optimization, learning paradigms.

Prereq: CS/ECE 761 or ECE 830

Instructor: Professor Jerry Zhu, jerryzhu@cs.wisc.edu

Homeworks: Assignments are posted in the Canvas system.

Course discussion forum: Piazza discussions

Time and location: Lectures in ENGR HALL 2534

Office hour: Wednesdays 2-3pm, CS 6391

Exam
Midterm exam: Wed Feb 28 9:30-10:45am ENGR HALL 2534
Final exam: Monday April 23 9:30-10:45am ENGR HALL 2534
All exams are closed book. Bring a copious amount of blank scratch paper. One 8.5x11 sheet of paper with notes on both sides is allowed (handwritten or typed). Lectures and readings on the syllabus page are required. You are responsible for topics covered in lecture. You should have knowledge sufficient to work through simple examples. Exam grading questions must be raised with the instructor within one week after the exam is returned.

Project: An open machine learning project, done individually or in groups of two. Requires an analysis component. Proposal due Apr 4 before class. Report due May 4. NIPS format, 4-8 pages.

Syllabus

Statistical Learning
- Empirical risk minimization, PAC learning [SS 2, 3, 4]
- Concentration inequalities [SS B.1-B.5, V 2]
- Structural risk minimization and minimum description length [SS 7]
- Bias-complexity tradeoffs [SS 5]
- Vapnik-Chervonenkis dimension [SS 6]
- Rademacher complexities [SS 26, ZRG'09]

Online Learning
- Batch perceptron [SS 9]
- Halving and Littlestone dimension [SS 21.1.1]
- Online convex optimization [OCO up 2.7, 3.1]

Advanced Learning Paradigms
- Bandit algorithms [BC 1, 2, 3; OCO 4.1, 4.2]
- Active learning [H 1, 2, 5.1]
- Machine teaching [GK]
- Submodularity [KD 1, 2]

References
[BC] S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, Vol. 5, No. 1, 1-122, 2012.
[GK] S. A. Goldman and M. J. Kearns. On the Complexity of Teaching. 1995.
[H] Steve Hanneke. Theory of Active Learning.
[KD] Andreas Krause and Daniel Golovin. Submodular Function Maximization.
[OCO] Shai Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, Volume 4, Issue 2.
[SS] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[V] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science.

Grading: Homeworks (30%), exam (40%), project (30%).

Class learning outcome
Students will be able to:
- derive sample complexity bounds using concentration of measure inequalities
- analyze bias-variance tradeoffs and model selection criteria
- derive rates of convergence for nonparametric machine learning algorithms
- gain familiarity with various machine learning paradigms, including supervised, unsupervised, active, multitask, and online learning.