CS761 Mathematical Foundations of Machine Learning
Spring 2017


Mathematical foundations of machine learning theory and algorithms: probabilistic, algebraic, and geometric models and representations of data; mathematical analysis of state-of-the-art learning algorithms and optimization methods; and applications of machine learning. Students should have taken a course in statistics and a course in linear algebra (e.g., STAT 302 and MATH 341).

Instructor: Professor Jerry Zhu, jerryzhu@cs.wisc.edu. Office hour Wednesdays 2-3pm, CS6391.

TA: Xuezhou Zhang, xzhang784@wisc.edu. Office hour Tuesdays 12-1pm, CS6397.

Lectures: ENGR HALL 2305; see the calendar below.

Exam: Friday, Apr 14, 9:30-10:45am, in the classroom. Closed book.

Homeworks: Please submit hw1 as a pdf via UW-Madison's Canvas system. hw1 (latex), solution; hw2 (latex), solution; hw3 (latex).

Project: Open project, in groups of size 1 or 2. The goal should be to make a small contribution to machine learning research itself. Ideas: browse recent NIPS, ICML, AISTATS, and COLT conferences, and follow at least 5 recent papers in the thread you choose. Output: a 6-8 page project report in NIPS paper style (abstract, body, references); download the LaTeX style file here. Deadline May 4, 2017; submit via Canvas.

Topics (tentative):

Review of Probability and Statistics (notes 1, 2; Wasserman 1-8; Moon & Stirling 10, 11; p-value)
- Probability spaces, basic measure theory, expectations
- Central Limit Theorem, strong laws of large numbers (CLT_beta.m; a Python sketch follows the topic list)
- Hypothesis testing and basic parameter estimation

Linear Algebraic Methods in Machine Learning (notes; RKHS; Moon & Stirling 2-7; Wasserman 21)
- Vector spaces
- Subspaces and projections
- Multivariate Gaussian
- PCA (demo_eig.m, demo_svd.m; a Python sketch follows the topic list)

Generative Models for ML (estimation, decision, exponential family, graphical model tutorial, notes, notes 2, kernel density; Wasserman 9-20; Moon & Stirling 12; NIPS99; Shalev-Shwartz & Ben-David 24)
- Maximum likelihood (fisherinfo.m)
- Bayesian methods in ML
- Graphical models
- Exponential family and conjugate duality
- Kernel density estimation
- Gaussian processes (notes, book)

Discriminative Learning (svm; Boyd & Vandenberghe Ch. 2-4; Wasserman 22)
- Loss functions (surrogate loss)
- Support vector machines
- Regularization and kernel methods
- Convex optimization in ML
- Reproducing kernel Hilbert spaces

Stochastic Simulation and Optimization Methods for Machine Learning
- Monte Carlo methods (notes; Wasserman 23, 24)
- Stochastic gradient and proximal gradient algorithms (Shalev-Shwartz & Ben-David 14)
- Expectation-Maximization (notes)

Mathematical Analysis and Learning of Data Representations
- PCA, ICA
- Dictionary learning
- Deep learning

Theoretical Analysis of Machine Learning Algorithms (notes; Shalev-Shwartz & Ben-David 1-7; Bousquet, Boucheron & Lugosi; subgaussian)
- Probably approximately correct (PAC) learning
- Concentration inequalities and sample complexity analysis
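The demo script CLT_beta.m linked above is not reproduced on this page. Below is a minimal Python sketch of what such a CLT demo typically shows, assuming it standardizes sample means of Beta draws; the shape parameters and the printout are illustrative assumptions, not the actual contents of the file.

    import numpy as np

    # Hypothetical Python analogue of the CLT_beta.m demo (assumed behavior):
    # standardized sample means of skewed Beta(2,5) draws approach N(0,1)
    # as the sample size n grows, per the Central Limit Theorem.
    rng = np.random.default_rng(0)
    a, b = 2.0, 5.0                             # Beta shape parameters (assumed)
    mu = a / (a + b)                            # true mean of Beta(a,b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))  # true variance of Beta(a,b)

    for n in (1, 10, 100, 1000):
        means = rng.beta(a, b, size=(10000, n)).mean(axis=1)
        z = (means - mu) / np.sqrt(var / n)     # standardized sample means
        # As n grows, z should look standard normal: mean 0, variance 1,
        # and vanishing skewness even though Beta(2,5) itself is skewed.
        print(f"n={n:4d}  mean={z.mean():+.3f}  var={z.var():.3f}  skew={np.mean(z**3):+.3f}")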
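Likewise, demo_eig.m and demo_svd.m are linked but not shown here. The standard fact they presumably illustrate is that principal components computed from the eigendecomposition of the sample covariance agree with those computed from the SVD of the centered data matrix; a small Python sketch of that equivalence (the toy data is an assumption):

    import numpy as np

    # Sketch of the presumed content of demo_eig.m / demo_svd.m:
    # PCA via the covariance eigendecomposition vs. via the SVD of centered data.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                              [1.0, 1.0, 0.0],
                                              [0.0, 0.0, 0.1]])  # correlated toy data
    Xc = X - X.mean(axis=0)                  # center each column

    # Route 1: eigendecomposition of the sample covariance (symmetric, use eigh).
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
    order = np.argsort(evals)[::-1]          # sort eigenpairs by decreasing variance
    evals, evecs = evals[order], evecs[:, order]

    # Route 2: SVD of the centered data. The right singular vectors are the same
    # principal directions, and s**2 / (n-1) recovers the eigenvalues.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    print(np.allclose(evals, s**2 / (len(X) - 1)))     # True
    print(np.allclose(np.abs(evecs), np.abs(Vt.T)))    # True (directions match up to sign)

The SVD route avoids forming the covariance matrix explicitly and is the numerically preferred way to compute PCA.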
The book ladder (read from the bottom up):
- Understanding Machine Learning: From Theory to Algorithms. By Shai Shalev-Shwartz and Shai Ben-David.
- A Probabilistic Theory of Pattern Recognition. By Luc Devroye, László Györfi, and Gábor Lugosi.
- High-Dimensional Probability for Mathematicians and Data Scientists. By Roman Vershynin.
- Foundations of Data Science. By Avrim Blum, John Hopcroft, and Ravindran Kannan.
- A Mathematical Introduction to Data Science. By Yuan Yao.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. By Trevor Hastie, Robert Tibshirani, and Jerome Friedman.
- Pattern Recognition and Machine Learning. By Christopher M. Bishop.
- All of Statistics: A Concise Course in Statistical Inference. By Larry Wasserman.
- Mathematical Methods and Algorithms for Signal Processing. By Todd K. Moon and Wynn C. Stirling.
- Convex Optimization. By Stephen Boyd and Lieven Vandenberghe.
- Linear Algebra, 4th Edition. By Stephen H. Friedberg, Arnold J. Insel, and Lawrence E. Spence.
- Probability and Statistics for Engineering and the Sciences. By Jay L. Devore.

Grading: Homeworks 30%, exam 40%, project 30%.