CS 760: Machine Learning (Fall 2007)

  • Instructor: David Page
    page@biostat.wisc.edu
    Office: 6743 Medical Sciences Center (corner of Charter and University)
    Office Hours: 3-4pm Tuesday, 1:30-2:30pm Thursday, or by appointment
    Office Phone: 265-6168

  • TA: Daniel Wong
    Email: dwong
    Office: 5364 Computer Sciences
    Office Hours: 1-2 M, 2-3 T, 11-12 Th, 10-11 F
    Office Phone: 262-5340

  • Prerequisite: CS 540 or equivalent

  • Meeting Time and Location: 2:30-3:45 MWF, 1325 Computer Sciences

  • Textbook:

    • Tom Mitchell (1997). Machine Learning. McGraw-Hill.

  • Archive of class e-mail

Course Overview

Many of the same technologies underlie adaptive autonomous robots, scientific knowledge discovery, adaptive game playing, and discovery from databases. This course will focus on these key underlying technologies, particularly supervised learning. The course will cover support vector machines, decision tree learners, neural network learning, and Bayesian classifiers, among others. It will also address reinforcement learning and learning from relational data, including statistical relational learning and inductive logic programming. It will cover correct evaluation methodology, including case studies of methodological errors.

Course Outline

Course Requirements

The grading for the course will be based on:

Homework Policy

The homework assignments (both written and programming) and project are to be done individually. You may communicate with other class members about the problem, but please do not seek or receive help from people not in the class, and please do not share answers or code. Homeworks (both written and programming) are due at the start of class on the assigned due date, and late homeworks will be penalized 10 points (out of 100) for each lecture that passes after the assigned due date. At the start of the course every student will be given 5 "free" days, each of which may be used to offset a 10-point late penalty. Free days are non-transferable, and no credit will be given for unused free days. Nevertheless, please use them sparingly because the late penalty is strictly enforced.
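As a worked example of the late-penalty arithmetic, here is a minimal sketch in Python. The function name, the reading that each free day cancels exactly one 10-point penalty, and the floor at zero are all assumptions made for illustration; this is not an official grading script.

```python
def late_score(raw_score, lectures_late, free_days_used=0):
    """Score after late penalties, under one illustrative reading of the policy.

    raw_score      -- score out of 100 before any penalty
    lectures_late  -- number of lectures that passed after the due date
    free_days_used -- free days applied to this homework (0 to 5 per the policy)
    """
    if not 0 <= free_days_used <= 5:
        raise ValueError("each student has only 5 free days")
    # Assumption: one free day offsets exactly one 10-point penalty.
    penalized = max(0, lectures_late - free_days_used)
    # Assumption: a score cannot drop below zero.
    return max(0, raw_score - 10 * penalized)
```

Under this reading, a homework scored 90 and turned in two lectures late with one free day applied would receive 80.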


Programming Assignments

  • Programming Assignment 0. Assigned 9/5, Due 9/14.

  • Programming Assignment 1. Assigned 9/17, Due 10/3.

  • Programming Assignment 2. Assigned 10/31, Due 11/12. Download Weka and learn how to run learning algorithms such as SMO, how to run cross-validation within Weka, and how to tune parameters separately on each fold of cross-validation. Download four data sets from the UCI Machine Learning Repository. Run SMO on your HW0 dataset and these four UCI datasets. Run both polynomial and RBF (Gaussian) kernels. Tune all parameters. (With polynomial kernels, the parameters are C and E; with RBF kernels, they are C and G.) Report the best kernel and parameter settings for each dataset.
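Tuning parameters on each fold of cross-validation, as Programming Assignment 2 asks, can be sketched as nested cross-validation. The Python below is a minimal illustration rather than the assignment's Weka workflow: every function name is hypothetical, and `train_knn` is a toy 1-D nearest-neighbour learner standing in for SMO. In the assignment itself, the grid would instead range over C and E (polynomial kernel) or C and G (RBF kernel).

```python
import random
from itertools import product

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def split(X, y, held_out):
    """Return the training examples that are not in the held-out fold."""
    held = set(held_out)
    keep = [i for i in range(len(X)) if i not in held]
    return [X[i] for i in keep], [y[i] for i in keep]

def accuracy(model, X, y, idx):
    """Fraction of the examples at positions idx that model labels correctly."""
    return sum(model(X[i]) == y[i] for i in idx) / len(idx)

def cv_accuracy(train_fn, X, y, params, k, seed=0):
    """Mean held-out accuracy of one parameter setting over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        Xtr, ytr = split(X, y, held_out)
        scores.append(accuracy(train_fn(Xtr, ytr, **params), X, y, held_out))
    return sum(scores) / len(scores)

def nested_cv(train_fn, X, y, grid, outer_k=10, inner_k=5, seed=0):
    """Tune on each outer training split only; score on the outer held-out fold."""
    names = sorted(grid)
    candidates = [dict(zip(names, v)) for v in product(*(grid[n] for n in names))]
    scores, chosen = [], []
    for held_out in k_fold_indices(len(X), outer_k, seed):
        Xtr, ytr = split(X, y, held_out)
        # Inner cross-validation never sees the outer held-out fold.
        best = max(candidates,
                   key=lambda p: cv_accuracy(train_fn, Xtr, ytr, p, inner_k, seed + 1))
        scores.append(accuracy(train_fn(Xtr, ytr, **best), X, y, held_out))
        chosen.append(best)
    return sum(scores) / len(scores), chosen

def train_knn(X, y, k):
    """Toy stand-in learner: 1-D k-nearest-neighbour majority vote (odd k)."""
    pairs = list(zip(X, y))
    def predict(x):
        votes = [label for _, label in sorted(pairs, key=lambda p: abs(p[0] - x))[:k]]
        return max(set(votes), key=votes.count)
    return predict

# Synthetic two-class data, made up purely for this illustration.
X = [i / 40 for i in range(40)]
y = [int(x >= 0.5) for x in X]
estimate, chosen = nested_cv(train_knn, X, y, {"k": [1, 3]}, outer_k=5, inner_k=4)
```

Because the parameters are selected using only the outer training split, the outer-fold accuracy is not biased by the tuning step. The list `chosen` records which setting won on each outer fold, which is useful when reporting the best parameter settings per dataset.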

Project

Projects should be proposed by October 31 (verbal or email communication is acceptable). Projects must be done individually. The basis for the project grade will be your written report, which must be turned in no later than the last day of final exams. The report should be in the style of a conference paper, providing an introduction/motivation, discussion of related work, a description of your work that is detailed enough that the work could be replicated, and a conclusion. The format of the description of your work will depend on the nature of your project. If it is an implementation, then the description should make clear the algorithm(s) implemented and provide experimental results. If it is an application project, the description should say which system was used, how the data (or any other materials used) were collected, what experimental methodology was employed, and some estimate of the quality of the experimental results (e.g. a 10-fold cross-validation accuracy estimate). If it is a theoretical project, then the project description should consist of detailed definitions, theorems, and proofs.