CS/ECE 861 Theoretical Foundations of Machine Learning

Description: Advanced mathematical theory and methods of machine learning. Statistical learning theory, Vapnik-Chervonenkis theory, model selection, high-dimensional models, nonparametric methods, probabilistic analysis, optimization, learning paradigms.

Prereq: CS/ECE 761 or ECE 830. While not required, CS/ECE 532 Matrix Methods in Machine Learning, Math 521 Analysis I, ECE 730 Modern Probability Theory and Stochastic Processes, CS/ECE 524 Introduction to Optimization, and similar math courses are helpful.

Instructor: Professor Jerry Zhu, jerryzhu@cs.wisc.edu

Homeworks: Assignments are posted in the Canvas system.

Course discussion forum: Piazza discussions

Time and location:
Lectures in CS 1325, MWF 9:30-10:45am, see calendar below
Office hour Wednesdays 2-3pm, CS 6391

Exam:
Midterm exam: Wed Feb 27, 9:30-10:45am, in class
Final exam: around mid-April, in class
All exams are closed book. Bring a copious amount of blank scratch paper. One 8.5x11 sheet of paper with notes on both sides is allowed (handwritten or typed). Lectures and readings on the syllabus page are required. You are responsible for topics covered in lecture. You should have knowledge sufficient to work through simple examples. Exam grading questions must be raised with the instructor within one week after the exam is returned.

Project: An open machine learning project, done individually or in groups of two. Requires an analysis component. Proposal due Apr 5 before class.
Report due end-of-day May 3, 4-8 pages.

Syllabus: (tentative, subject to change)

Statistical Learning
  Empirical risk minimization, PAC learning [SS 2, 3, 4]
  Concentration inequalities [SS B.1--B.5, V 2]
  Structural risk minimization and minimum description length [SS 7]
  Bias-complexity tradeoffs [SS 5]
  Vapnik-Chervonenkis dimension [SS 6]
  Rademacher complexities [SS 26, ZRG'09]
Online Learning
  Batch perceptron [SS 9]
  Halving and Littlestone dimension [SS 21.1.1]
  Online convex optimization [OCO up to 2.7, 3.1]
Advanced Learning Paradigms
  Bandit algorithms [BC 1, 2, 3; OCO 4.1, 4.2]
  Active learning [H 1, 2, 5.1]
  Machine teaching [GK]
  Submodularity [KD 1, 2]

References:
[BC] S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, Vol. 5, No. 1, 1-122, 2012.
[GK] S. A. Goldman and M. J. Kearns. On the Complexity of Teaching. 1995.
[H] Steve Hanneke. Theory of Active Learning.
[KD] Andreas Krause and Daniel Golovin. Submodular Function Maximization.
[OCO] Shai Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, Vol. 4, No. 2.
[SS] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[V] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science.

Grading: Homeworks (30%), exam (40%), project (30%).

Class learning outcome:
Students will be able to:
- derive sample complexity bounds using concentration of measure inequalities
- analyze bias-variance tradeoffs and model selection criteria
- derive rates of convergence for nonparametric machine learning algorithms
- gain familiarity with various machine learning paradigms, including supervised, unsupervised, active, multitask, and online learning.