CS 731: Advanced Artificial Intelligence ("Machine Learning 2")
Spring 2011
Homeworks: Follow this link for assignments and policies
Exam solution
Topics:
- Probability Background: σ-algebra, probability measures, types of convergence of random variables
lecture notes; Reading: AoS ch 1, 2, 3, 5
- Statistical Machine Learning: parametric vs. non-parametric, Fisher information, Cramér-Rao bound, frequentist vs. Bayesian
lecture notes; Reading: AoS ch 6, 9, 11
- Statistical decision theory: loss, risk, Bayes, minimax
lecture notes; Reading: AoS ch 12
- Sparsity in regression: ridge regression, the LASSO, large-p-small-n problems
lecture notes; Reading: ESL ch 3; Bühlmann, J. Royal Stat. Society 2011; NIPS 2010 tutorial
- Graphical Models: directed, undirected, factor graphs, Latent Dirichlet Allocation, sum-product, max-sum, mean field
lecture notes; Reading: PRML ch 8
- Exponential Families: maximum entropy, mean parameters, marginal polytopes, conjugate duality
lecture notes; Reading: Wainwright & Jordan FTML08
- Variational Methods: mean field, sum-product as variational approximation; variational EM
lecture notes; Reading: Wainwright & Jordan FTML08
- Markov Chain Monte Carlo: rejection sampling, importance sampling, Metropolis-Hastings, Gibbs, slice sampling, coupling from the past
lecture notes; Reading: PRML ch 11; David MacKay, Information Theory, Inference, and Learning Algorithms, ch 29, 32
- Nonparametric methods: kernel density estimation, Nadaraya-Watson, local linear regression
lecture notes; Reading: AoS ch 20
- Dimensionality Reduction: PCA, multidimensional scaling, Isomap, locally linear embedding, Laplacian eigenmaps, spectral clustering
lecture notes; Reading: Burges FTML10
- Bayesian nonparametrics: Gaussian Processes, Dirichlet Process Mixture Models
lecture notes; Reading: Rasmussen & Williams, Gaussian Processes for Machine Learning, ch 1, 2, 3; Teh, Dirichlet Process
- Compressive sensing and Matrix Completion: Johnson–Lindenstrauss lemma, restricted isometry property
lecture notes; Reading: Candès & Wakin, An Introduction to Compressive Sampling; Candès & Recht, Exact Matrix Completion via Convex Optimization
Schedule: 9:30--10:45 MWF, 103 Psychology
This class is "Machine Learning 2", the second installment of the machine learning sequence following CS760.
The goal is to further prepare you as a machine learning researcher. Here is an analogy: In CS760, we opened
up a toolbox for you, and you learned how to use hammers, screwdrivers, wrenches, etc. (translation: SVMs,
naive Bayes, decision trees, etc.) for your home improvement projects. In this class, we will do three things:
1) show you some more powerful tools (but if your interest is only in using machine learning tools, this class
is not for you); 2) teach you the equivalent of mechanical engineering so you can invent new tools; and 3) teach
you the equivalent of physics so you understand why the tools work. We will cover both foundations and
cutting-edge topics in statistical machine learning. This will be a more theoretical, less practical class.
Prerequisites: Officially CS540. Taking this class after CS760 or an equivalent course is recommended but not
strictly required. Previous coursework in linear algebra, multivariate calculus, and basic probability and
statistics is required. Familiarity with a matrix-oriented programming language (e.g., MATLAB, R, or S-PLUS)
and mathematical maturity at this level are recommended. Homework must be typeset in LaTeX. If you have taken
CS731 before but would like to take this new version of the class for credit, please send me an email -- we
will address it on a case-by-case basis.
Instructor: Xiaojin (Jerry) Zhu
Office: 6391 CS
E-mail: jerryzhu@cs.wisc.edu
Phone: 608-890-0129
Office Hours: 3:45-4:45pm Tuesdays, or by appointment
Teaching Assistant: There will be none. Instead, let us crowdsource and help one another. Please use the class
mailing list (see below) to ask any questions, from LaTeX to MATLAB to specific class content. Please also be
a good citizen and help answer your fellow students' questions. The frequency and quality of your answers will
be taken into consideration in your course grade.
Textbook: By taking this course, you are serious about machine learning. We will draw material from multiple
sources, and there will not be a single required textbook. Nonetheless, you should obtain the following books.
They are excellent references for the road ahead and worth more than their weight in gold.
[1] (AoS) Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, 2003.
[2] (PRML) Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[3] (ESL) Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning:
Data Mining, Inference, and Prediction. Second edition. Springer, 2009. (Available online)
Grading: Homeworks (30%), midterm exam (40%), and a project (30%).
Other:
Class mailing list: compsci731-1-s11@lists.wisc.edu (archive)
Course URL: http://pages.cs.wisc.edu/~jerryzhu/cs731.html