
CS 760 - Machine Learning

Homework 3
Assigned: March 15, 2010
Due: 4pm April 7, 2010
125 points

Perceptron Learning

Create a neural-network system that can learn from data formatted in the same manner as used in HWs 1 and 2. For simplicity, you should only create perceptrons - that is, neural networks with no hidden units (see Section 4.4 of Mitchell, especially Table 4.1, but use the sgn function to discretize outputs before comparing them to the teacher's outputs).

Discuss how you will represent the data types in CS 760 "*.names" files for use by a numeric optimization method such as a perceptron.
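One possible representation, sketched here only as an illustration: discrete features become one-of-N inputs and continuous features are rescaled to [0, 1]. The `feature_specs` structure and the `encode_example` name are hypothetical; they are not part of the "*.names" format, and your own parser for that format would have to produce something equivalent.

```python
def encode_example(values, feature_specs):
    """Map raw feature values to a list of floats.

    feature_specs is a hypothetical list with one entry per feature:
      ("discrete", [allowed values])  -> one-of-N encoding
      ("continuous", (low, high))     -> linear rescaling to [0, 1]
    """
    encoded = []
    for value, (kind, info) in zip(values, feature_specs):
        if kind == "discrete":
            # One input unit per allowed value; exactly one is set to 1.
            encoded.extend(1.0 if value == v else 0.0 for v in info)
        elif kind == "continuous":
            low, high = info
            span = high - low
            encoded.append((float(value) - low) / span if span else 0.0)
        else:
            raise ValueError(f"unknown feature kind: {kind}")
    return encoded


specs = [("discrete", ["red", "green", "blue"]), ("continuous", (0.0, 10.0))]
print(encode_example(["green", 2.5], specs))  # [0.0, 1.0, 0.0, 0.25]
```

Rescaling keeps continuous inputs on the same scale as the 0/1 inputs, so a single learning rate is a reasonable fit for all weights.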

Your code should automatically adjust the learning rate (eta) using the method discussed in Lecture 19. Let k=100, but if adjusting eta every k examples makes your code run too slowly, feel free to adjust it only once per epoch (i.e., adjust eta after the first 100 examples of each epoch, then hold eta constant for the remainder of the epoch). You should also update the weights after each training example (i.e., perform stochastic gradient descent; see Equation 4.10 and the footnote to Table 4.1), and don't forget to adjust the bias (i.e., threshold) by treating it as another weight. Have your code report the current value of eta every 10 epochs, and include a plot of these values in your report.
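A minimal sketch of the per-example update, assuming targets of +1/-1 and an error computed from the sgn-thresholded output. The Lecture 19 eta-adjustment method is not reproduced here, so eta is simply passed in as a fixed value; the function names are illustrative.

```python
def sgn(value):
    """Threshold a real-valued activation to +1 or -1."""
    return 1 if value > 0 else -1


def predict(weights, features):
    """weights[0] is the bias; it pairs with a constant input of 1."""
    activation = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sgn(activation)


def sgd_epoch(weights, examples, eta):
    """One pass over the data, updating the weights after every example."""
    for features, target in examples:       # target is +1 or -1
        error = target - predict(weights, features)
        weights[0] += eta * error            # bias update (its input is 1)
        for i, x in enumerate(features):
            weights[i + 1] += eta * error * x
    return weights
```

Storing the bias at index 0 lets it be updated by the same rule as every other weight, with an implicit input of 1.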

Your code should also use 20% of the training set as a tuning set, for use in "early stopping" (see Lecture 19) to prevent overfitting. Limit your runs to 1000 epochs (feel free to use a lower number if runtime is an issue; in that case report eta more frequently).
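The tuning-set split and the selection of the best epoch might look like the sketch below. It assumes the simplest variant of early stopping: run every epoch and keep the weights that scored best on the tuning set. The procedure from Lecture 19 may differ in detail. `run_epoch` and `accuracy` are placeholder callables supplied by the caller.

```python
import random


def split_train_tune(examples, tune_fraction=0.2, seed=0):
    """Hold out a fraction of the training set as a tuning set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_tune = int(round(tune_fraction * len(shuffled)))
    return shuffled[n_tune:], shuffled[:n_tune]


def train_with_early_stopping(weights, train, tune, run_epoch, accuracy,
                              max_epochs=1000):
    """Return the weights (and epoch) with the best tuning-set accuracy.

    run_epoch(weights, train) returns the weights after one epoch;
    accuracy(weights, tune) returns a score where higher is better.
    """
    best_weights, best_epoch = list(weights), 0
    best_accuracy = accuracy(weights, tune)
    for epoch in range(1, max_epochs + 1):
        weights = run_epoch(weights, train)
        current = accuracy(weights, tune)
        if current > best_accuracy:          # keep a snapshot of the best state
            best_weights, best_epoch = list(weights), epoch
            best_accuracy = current
    return best_weights, best_epoch
```

Copying the weights with `list(weights)` matters when `run_epoch` updates the list in place; otherwise the saved snapshot would keep changing.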

Your code should also perform "weight decay" (Lecture 19). We really should tune the parameter lambda, but for simplicity set lambda to be 0.01.
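As a rough illustration, weight decay can be folded into the per-example update by subtracting a term proportional to each weight. This sketch assumes the common L2 form, in which each weight moves by eta * (error * x - lambda * w); the exact formulation from Lecture 19 may differ, and leaving the bias undecayed is a choice made here, not something the assignment specifies.

```python
def decayed_update(weights, features, error, eta, lam=0.01):
    """One per-example update with an L2 weight-decay term.

    weights[0] is the bias, which this sketch does not decay.
    """
    updated = [weights[0] + eta * error]
    for w, x in zip(weights[1:], features):
        # Error-driven step plus a pull of size lam * w toward zero.
        updated.append(w + eta * (error * x - lam * w))
    return updated


# With zero error, the only change is the shrinkage toward zero.
print(decayed_update([0.0, 1.0], [0.0], error=0.0, eta=0.5))
```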

Have your code report the epoch chosen by early stopping and, for the chosen perceptron state (i.e., the weight and bias values), report all those weights whose magnitude is larger than 0.1. Be sure to also report the feature associated with each weight printed out.
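Filtering and labeling the weights could be done along these lines. The function name and the "bias" label are illustrative, and the sketch assumes the bias is stored as `weights[0]`.

```python
def report_large_weights(weights, feature_names, threshold=0.1):
    """Return (name, weight) pairs whose magnitude exceeds the threshold.

    weights[0] is the bias, reported under the label "bias".
    """
    labels = ["bias"] + list(feature_names)
    return [(name, w) for name, w in zip(labels, weights) if abs(w) > threshold]


for name, w in report_large_weights([0.5, -0.3, 0.05], ["color=red", "size"]):
    print(f"{name}: {w:+.4f}")
```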

10-fold Cross Validation

Run your perceptron code on your personal dataset using the same ten folds you used in previous homeworks. Do a t-test comparison to the best method from your previous homeworks. Report and discuss the results.
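Because both methods are evaluated on the same ten folds, a paired t-test on the per-fold accuracies is a natural choice; that it should be paired is an assumption here. The sketch below computes only the t statistic and the degrees of freedom, using the standard library.

```python
import math
import statistics


def paired_t_statistic(accuracies_a, accuracies_b):
    """Return (t, degrees of freedom) for per-fold accuracy differences."""
    diffs = [a - b for a, b in zip(accuracies_a, accuracies_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)       # sample standard deviation
    if std_diff == 0.0:
        raise ValueError("all fold differences are identical")
    return mean_diff / (std_diff / math.sqrt(n)), n - 1
```

With ten folds there are 9 degrees of freedom, and the two-tailed critical value at the 0.05 level is 2.262; a |t| above that indicates a statistically significant difference at that level.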

Using a Gaussian Kernel

In this part of the homework, you will use a Gaussian kernel to get a non-linear separating surface. Instead of using linear or quadratic programming to solve the resulting optimization task, you will simply use the gradient-descent method you developed for the first part of this homework, that is, a perceptron with weight decay.
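A common way to combine a Gaussian kernel with a perceptron is to replace each example's inputs with its kernel values against a set of reference examples. The sketch below shows only that mapping. The width `sigma`, the choice of reference examples, and the function names are assumptions for illustration, not values or steps prescribed by the assignment.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def kernel_features(example, reference_examples, sigma=1.0):
    """One feature per reference example: its kernel value with `example`."""
    return [gaussian_kernel(example, ref, sigma) for ref in reference_examples]


references = [[1.0, 0.0], [0.0, 0.0]]
print(kernel_features([1.0, 0.0], references))
```

The resulting feature vector can be fed to the same perceptron code as before; a weight on a kernel feature then measures how strongly similarity to that reference example pushes the output.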

To accomplish this, you will use a Gaussian kernel to create the "features" for a perceptron (see Lectures 21-22). For each fold in cross validation, do this as follows (this is only one of many valid experimental designs):

10-fold Cross Validation

Run your "kernel" code on your personal dataset using the same ten folds you used in previous homeworks. Do a t-test comparison to the best method from your previous homeworks and also do a t-test comparison to your non-kernel perceptron. Report and discuss the results.

An Additional Experiment of Your Own Choice

Choose any two of the following extensions, implement them, and report on them, including t-test comparisons to the best algorithm on your personal concept (the two approaches above as well as previous homeworks). Be sure to briefly say why you chose the experiments you did.

There is no guarantee that all of these options are equally hard, but you can get full credit regardless of which you choose, so select the ones that are most interesting to you.

We will not be auto-grading this portion of the HW.


We will test your code on some datasets of our own. The API for your code should be:
HW3 task.names useGaussianKernel
The last argument is Boolean-valued. If it is "true," use the kernel-based approach developed for the second part of this homework; otherwise, use the perceptron, with weight decay, using the task's features as the input units.
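For a Python implementation, the two arguments could be parsed as sketched below. Accepting "true" case-insensitively and treating every other value as false are assumptions, and `parse_args` is an illustrative name.

```python
def parse_args(argv):
    """Parse `task.names useGaussianKernel` (argv excludes the program name)."""
    if len(argv) != 2:
        raise SystemExit("usage: HW3 task.names useGaussianKernel")
    names_file, flag = argv
    # Anything other than "true" selects the plain perceptron.
    return names_file, flag.strip().lower() == "true"


# In a script this would be called as parse_args(sys.argv[1:]).
print(parse_args(["task.names", "true"]))   # ('task.names', True)
```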

Your code should print out the test-set accuracy, as well as the "names" of the miscategorized test-set examples. Recall that we name examples by their type and position in the testset, e.g., posTestEx1.
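The required output could be produced along these lines. The `results` structure is hypothetical, and names such as negTestEx1 are assumed by analogy with posTestEx1.

```python
def report_test_results(results):
    """results: list of (example_name, true_label, predicted_label)."""
    missed = [name for name, truth, guess in results if truth != guess]
    accuracy = 1.0 - len(missed) / len(results) if results else 0.0
    return accuracy, missed


accuracy, missed = report_test_results([
    ("posTestEx1", 1, 1),
    ("posTestEx2", 1, -1),
    ("negTestEx1", -1, -1),
    ("negTestEx2", -1, -1),
])
print(f"Test-set accuracy: {accuracy:.2%}")
print("Misclassified:", ", ".join(missed))
```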


Turn in a report of your experiments and a commented copy of the code you wrote. Also turn in sample output from all your runs on your trainset/testset #1 (limit the sample output to one page per algorithm by editing the output your code produces).