CS 760: Machine Learning (Spring 2017)
-
Instructor:
David Page
page@biostat.wisc.edu
(Please put cs760 in email subject line; otherwise it's easy to overlook emails)
Office Hours: 1pm-2pm Tuesdays and Thursdays in 1153/1154 WID
-
TAs:
Kirthanaa Raghuraman
kraghuraman (at) wisc (dot) edu
Office Hours: 11 AM - 12 PM Tuesdays and 1:30 - 2:30 PM Wednesdays in CS 6382
Heemanshu Suri
hsuri (at) wisc (dot) edu
Office Hours: 3:30 - 4:30 PM Tuesdays and Thursdays in CS 5384
- Important Dates:
- Exam: Wed, April 12, regular room and time (11am-12:30pm). You are allowed a pen/pencil, calculator, and one page of notes (standard 8.5 by 11 in., front and back). The exam will cover material up to and including reinforcement learning.
- Project Due Date (pdf by email to professor, 1 per group): May 7
- Prerequisite: CS 540 or equivalent
- Meeting Time and Location: 11am MWF, 132 Noland
- Textbook:
- Tom Mitchell (1997). Machine Learning. McGraw-Hill.
- The following textbook is freely available for download and may be used as an alternative if you like: Shalev-Shwartz and Ben-David (2014), Understanding Machine Learning: From Theory to Algorithms. Let me know after the semester how it worked for you.
Course Overview
Many of the same technologies underlie adaptive autonomous robots,
scientific knowledge discovery, adaptive game playing, and discovery
from databases. This course will focus on these key underlying
technologies, particularly supervised learning. The course will
cover support vector machines, decision tree learners, neural network
learning, and Bayesian classifiers, among others. It will also
address reinforcement learning and learning from relational data, including statistical
relational learning and inductive logic programming.
It will cover correct evaluation methodology, including case studies
of methodological errors.
Course Outline
- Course Overview, Feature Vector Representation, Unsupervised Learning Overview (Mitchell Ch. 1)
- Brief Introduction to Probability (Mitchell Ch. 6, supplementary background notes on probability and Bayesian Networks: 1, 2, 3)
- Decision Trees (Mitchell Ch. 3, Skewing)
- Instance-Based Learning, k-Nearest Neighbor (Mitchell Ch. 8.1 and 8.2)
- Bayesian Network Learning including Naive Bayes and TAN (Heckerman Tutorial; Recommended: Friedman, Geiger & Goldszmidt, Machine Learning Journal 1997; Friedman, Nachman & Peer, UAI-99; Mitchell Ch. 6; additional lecture notes on Gibbs Sampling and MCMC theory [PDF])
- Machine Learning Methodology (Mitchell Ch. 5; Optional Supplements: The Case Against Accuracy Estimation for Comparing Induction Algorithms by F. Provost, T. Fawcett, and R. Kohavi, Proc. ICML-98; The Relationship Between Precision-Recall and ROC Curves by J. Davis and M. Goadrich, Proc. ICML-06)
- Neural Networks and Deep Learning (Yujia Bao's Guest Lecture on Deep Learning, Mitchell Ch. 4, Andrew Ng's Deep Learning Tutorial)
- Generative Adversarial Networks
- Computational Learning Theory [PDF] (Mitchell Ch. 7)
- Regression (Linear and Logistic, including LASSO-penalized forms)
- Support Vector Machines (Ben-Hur and Weston, 2010, Alternative SVM Lecture by Gautam Kunapuli (optional), Chris Burges's tutorial (optional))
- SVM by Sequential Minimal Optimization (SMO) [PDF] (Platt's original SMO paper)
- Ensemble Methods [PDF] (Dietterich, 2002)
- Temporal Models (includes dynamic Bayesian networks, continuous-time Bayesian networks, piecewise-constant conditional intensity models, Hawkes processes)
- Reinforcement Learning (Mitchell Ch. 13)
- Review for Exam on April 10
- Exam on April 12
- Rule Learning and Relational Learning (Mitchell Ch. 10)
- Markov Networks. Try this tutorial on log-linear models by Frank Ferraro and Jason Eisner.
- Statistical Relational Learning
- The lectures below for ILP and SRL will not be used in class, but are left here for background.
- Background for Rule Learning and Inductive Logic Programming (Mitchell Ch. 10; for added background see De Raedt & Muggleton)
- Rule Learning and Inductive Logic Programming (Mitchell Ch. 10; for added background see De Raedt & Muggleton)
- Statistical Relational Learning
- Dimensionality Reduction
- Remaining Topics: Active Learning, Causal Discovery, Multiple-Instance Learning (if time permits)
Course Requirements
The grading for the course will be based on:
- Homework Assignments (5 anticipated): 40%
- Exam: 35%
- Project: 25%
Homework Policy
The programming assignments are to be done
individually.
You may communicate with
other class members about the problem, but please do not
seek or receive help from people not in the class, and
please do not share answers or code.
Your programs may be in C, C++, Java, Perl, Python, or R.
You must submit both a Linux executable and source code; your program should run on the CS Dept. lab computers. Please test it there, since programs will be graded based on how they run there, not elsewhere! Assignments are to be submitted at the
course Canvas site.
Homework assignments are due at the start of class on the assigned due date,
and late homeworks will be penalized 10 points (out of 100) for
each day that passes after the assigned due date. Homeworks cannot be submitted more than one week late; the submission site will be locked at that time. At the
start of the course every student will be given 5 "free" days,
each of which may be used to offset a 10-point late penalty.
Only 2 free days can be used for any given written assignment, so that solutions can be posted at the next class period.
(For example, an assignment submitted three days late with two free days applied loses 10 points.)
Free days are non-transferable, and no credit will be given for
unused free days. Nevertheless, please use them sparingly, because
the late penalty is strictly enforced. Please do not ask for special consideration for travel, exams in other classes, or other extenuating circumstances; this is what the free days are there for.
Homework Assignments
Assignment 1: Assigned 1/18, Due 2/1.
Assignment 2: Assigned 2/5, Due 2/19.
Assignment 3: Assigned 2/27, Due 3/13.
Assignment 4: Assigned 3/18, Due 4/2.
Project
Projects must be done in groups of 4-5 people. Each group should submit a pdf report, along with any code written, to me by email by 11:59pm on May 7.
Free days cannot be used for the project because I need time to grade all projects by the end of exam week, in order to compute final grades on time.
Projects should be proposed by March 15 (verbal or email communication is
acceptable).
The basis for the project grade will be your written report.
The report
should be in the style of a conference paper, providing an
introduction/motivation, discussion of related work, a description of
your work that is detailed enough that the work could be replicated,
and a conclusion.
The format of the description of your work will depend on the
nature of your project. If it is an implementation, then the description
should make clear the algorithm(s) implemented and provide experimental
results.
If it is an application project, the description should say which system
was used, how the data (or any other materials used) were collected,
what experimental methodology was employed, and some estimate of the
quality of the experimental results (e.g. a 10-fold cross-validation
accuracy estimate; see the sketch below).
If it is a theoretical project, then the project description should
consist of detailed definitions, theorems, and proofs.
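For groups unsure what a 10-fold cross-validation accuracy estimate looks like in practice, here is a minimal sketch in Python using scikit-learn. The dataset and classifier are illustrative placeholders, not requirements; any of the languages allowed for homework would work equally well.

    # Minimal sketch of a 10-fold cross-validation accuracy estimate.
    # The dataset (iris) and learner (a decision tree) are placeholders;
    # substitute your project's own data and algorithm.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    clf = DecisionTreeClassifier(random_state=0)

    # Split the data into 10 folds, train on 9, test on the held-out
    # fold, and repeat so each fold serves once as the test set.
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print("10-fold CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

Reporting the mean and standard deviation across folds, as above, gives a more honest estimate of quality than a single train/test split.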
Sample Exams
Additional Sample Exercises