CS 760 - Machine Learning
Homework 3
Assigned: March 24, 2009
Due: 4pm April 10, 2009
100 points
Perceptron Learning
Create a neural-network system that can learn from data formatted in
the same manner as used in HWs 1 and 2. For simplicity, you should
only create perceptrons - that is, neural networks with no hidden units
(see Section 4.4 of Mitchell, especially Table 4.1, but use
the sgn function to discretize outputs before comparing them
to the teacher's outputs).
Discuss how you will represent the data types in CS 760 "*.names" files
for use by a numeric optimization method such as a perceptron.
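
As one possible starting point for that discussion, the sketch below maps raw feature values to numbers: continuous features are rescaled into [0,1] and discrete features become one input unit per allowed value. The feature_specs structure and the name encode_example are assumptions made for this illustration only; they are not part of the "*.names" format.

```python
def encode_example(values, feature_specs):
    """Turn one example's raw feature values into a list of floats.

    feature_specs is a hypothetical list of (name, kind, info) tuples:
      kind == "continuous": info is (low, high), used to rescale into [0, 1]
      kind == "discrete":   info is the list of allowed values (one-hot encoded)
    """
    encoded = []
    for value, (name, kind, info) in zip(values, feature_specs):
        if kind == "continuous":
            low, high = info
            span = high - low
            # Guard against a constant feature to avoid dividing by zero.
            encoded.append((float(value) - low) / span if span else 0.0)
        else:
            # One input unit per allowed value; exactly one of them is 1.0.
            encoded.extend(1.0 if value == allowed else 0.0 for allowed in info)
    return encoded


# Made-up features, purely for illustration.
specs = [("age", "continuous", (0.0, 100.0)),
         ("color", "discrete", ["red", "green", "blue"])]
print(encode_example(["25", "green"], specs))
```

Other encodings (for example, a single unit for a two-valued feature) are equally defensible; the report should say which one was used and why.
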
Your code should automatically adjust the learning rate (eta) using
the method discussed in Lecture 17. You should also update
the weights after each training example (i.e., perform
stochastic gradient descent - see Equation 4.10 and the footnote
to Table 4.1), and don't forget to adjust the
bias (i.e., threshold) by treating it as another weight.
Have your code report the current value of eta every 10 epochs.
Only adjust eta when the perceptron's prediction is wrong.
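
The per-example update with the bias treated as weight 0 might look like the sketch below. It deliberately keeps eta fixed, because the adjustment method from Lecture 17 is not reproduced here; the function names and the convention that sgn(0) = +1 are assumptions of this sketch.

```python
def sgn(x):
    """Discretize a net input to +1 or -1 (0 is treated as +1 here)."""
    return 1 if x >= 0 else -1


def predict(weights, features):
    """weights[0] is the bias; it pairs with a constant input of 1.0."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sgn(net)


def train_epoch(weights, examples, eta):
    """One pass of per-example updates; returns the number of mistakes."""
    mistakes = 0
    for features, target in examples:  # target is +1 or -1
        output = predict(weights, features)
        if output != target:
            mistakes += 1
        # Perceptron rule: w_i <- w_i + eta * (t - o) * x_i, with x_0 = 1.
        delta = eta * (target - output)
        weights[0] += delta
        for i, x in enumerate(features, start=1):
            weights[i] += delta * x
    return mistakes


# Tiny linearly separable example: logical AND with targets in {-1, +1}.
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w = [0.0, 0.0, 0.0]
for epoch in range(100):
    if train_epoch(w, data, eta=0.5) == 0:
        break
```

Because the output is discretized, (t - o) is zero on correctly classified examples, so the weights only move when the prediction is wrong.
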
Your code should also use 20% of the training set as a tuning set,
for use in "early stopping" (see Lecture 18) to prevent overfitting.
Limit your runs to 1000 epochs (feel free to use a lower number if
runtime is an issue; in that case report eta more frequently).
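
A minimal sketch of the tuning-set split and the early-stopping loop follows. It assumes the caller supplies train_epoch(weights, train) and accuracy(weights, examples); those two callables and all names here are hypothetical, and Lecture 18 may describe a different stopping criterion.

```python
import random


def split_train_tune(examples, tune_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for tuning."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_tune = max(1, int(round(tune_fraction * len(shuffled))))
    return shuffled[n_tune:], shuffled[:n_tune]


def train_with_early_stopping(weights, train, tune, train_epoch, accuracy,
                              max_epochs=1000):
    """Return (best_weights, best_epoch) judged by tuning-set accuracy.

    train_epoch(weights, train) updates weights in place for one epoch;
    accuracy(weights, examples) returns a float. Both come from the caller.
    """
    best_weights, best_epoch = list(weights), 0
    best_accuracy = accuracy(weights, tune)
    for epoch in range(1, max_epochs + 1):
        train_epoch(weights, train)
        tune_accuracy = accuracy(weights, tune)
        if tune_accuracy > best_accuracy:
            # Copy the weights so later epochs cannot overwrite the snapshot.
            best_weights, best_epoch = list(weights), epoch
            best_accuracy = tune_accuracy
    return best_weights, best_epoch
```

This version always runs all max_epochs and keeps the best snapshot; stopping once the tuning accuracy has not improved for some number of epochs is a common variant that saves runtime.
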
Your code should also perform "weight decay" (Lecture 18).
We really should tune the parameter lambda, but for simplicity
set lambda to be 0.01.
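
One common form of weight decay adds an L2 penalty, which shrinks each weight toward zero by eta * lambda * w on every update. The sketch below assumes that form and leaves the bias undecayed; both choices are assumptions and should be checked against the Lecture 18 notes.

```python
def update_with_weight_decay(weights, features, target, output, eta, lam=0.01):
    """One stochastic update with an L2 penalty (one common form of weight decay).

    weights[0] is the bias. Whether the bias is also decayed is a design choice;
    this sketch leaves it undecayed.
    """
    error = target - output
    weights[0] += eta * error
    for i, x in enumerate(features, start=1):
        # Gradient step on the error plus a shrink toward zero of eta*lam*w.
        weights[i] += eta * (error * x - lam * weights[i])
    return weights
```

Note that with decay the weights change slightly even on correctly classified examples, since the shrink term does not depend on the error.
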
Have your code report the epoch chosen by early stopping
and report for the chosen perceptron state (i.e., the weight and bias values)
all those weights whose magnitude is larger than 0.1.
Be sure to also report the feature associated with the weights printed out.
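
The reporting step could be as simple as the following sketch, which pairs each weight with its feature name and keeps those above the threshold. The function name and the "bias" label are made up for this example.

```python
def report_large_weights(weights, feature_names, threshold=0.1):
    """Return (name, weight) pairs with |weight| > threshold; weights[0] is the bias."""
    names = ["bias"] + list(feature_names)
    return [(name, w) for name, w in zip(names, weights) if abs(w) > threshold]


# Made-up weights and feature names, purely for illustration.
for name, w in report_large_weights([0.5, -0.02, 0.3], ["age", "color=red"]):
    print(f"{name}: {w:+.4f}")
```
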
10-fold Cross Validation
Run your perceptron code on your personal dataset using the same ten folds
you used in previous homeworks. Do a t-test comparison to the best method
from your previous homeworks. Report and discuss the results.
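
Since both methods are run on the same ten folds, a paired t-test on the per-fold accuracy differences is a natural choice; that it is the test intended here is an assumption. The sketch below computes only the t statistic and the degrees of freedom, using the standard library.

```python
import math


def paired_t_statistic(accuracies_a, accuracies_b):
    """Return (t, degrees_of_freedom) for per-fold accuracy differences."""
    diffs = [a - b for a, b in zip(accuracies_a, accuracies_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    return mean / math.sqrt(variance / n), n - 1
```

With ten folds there are 9 degrees of freedom, for which the two-sided critical value at the 0.05 level is about 2.262; compare |t| against that value or look up the p-value in a t table.
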
Using a Gaussian Kernel
In this part of the homework, you will use a Gaussian kernel to get a non-linear separating
surface. Instead of using linear or quadratic programming to solve the resulting optimization
task, you will simply use the gradient-descent method you developed for the first part of this
homework, that is, a perceptron with weight decay.
To accomplish this, you will use a Gaussian kernel to create the "features"
for a perceptron (see lectures 20-21). For each fold in cross validation,
do this as follows (this is only one of many valid experimental designs):
- Normalize all the features of your examples to be in [0,1].
- Randomly choose 10% of the training examples. Call these "exemplars."
- Randomly select another (disjoint set of) 10% of the training examples for a tuning set.
Call this set kernelTune.
- The similarity to each exemplar will be the features given to the perceptron.
(Often in kernel-based approaches, all training examples
are used as these "exemplars," but to reduce runtime we will be using
the "reduced SVM" idea of Lee and Mangasarian.)
- We will use the Gaussian kernel as the similarity between two examples, A and B, where Ai and Bi are the ith features of A and B, respectively:
kernel(A, B) = exp {- [ SUM (Ai - Bi)^2] / sigma^2}
- We will need to tune the value for sigma. We will simply
try {0.03, 0.1, 0.3, 1, 3, 10}.
For each candidate value of sigma, (1) create a dataset using the resulting kernel
and all examples except the tuning examples in kernelTune,
(2) have your perceptron code learn on it, and (3) evaluate the perceptron
state chosen by "early stopping" on the set kernelTune.
(Note that we are using two tuning sets in this design, one for choosing sigma
and one for deciding when to stop training.)
Give your "features" meaningful names like "similarityToPosExample5."
Report the tune-set accuracies for each of the possible values for sigma and
return the perceptron state that has the highest accuracy on kernelTune.
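
The steps above can be sketched as a handful of helpers: normalization, the kernel itself, the exemplar-similarity features, and the loop over candidate sigmas. All function names are hypothetical, and evaluate(sigma) stands in for "build the kernel dataset, train with early stopping, and measure accuracy on kernelTune", which the sketch does not implement.

```python
import math


def normalize_columns(rows):
    """Rescale every feature column to [0, 1] using its min and max."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def gaussian_kernel(a, b, sigma):
    """kernel(A, B) = exp(-sum_i (A_i - B_i)^2 / sigma^2)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / sigma ** 2)


def kernel_features(example, exemplars, sigma):
    """One feature per exemplar: the example's similarity to that exemplar."""
    return [gaussian_kernel(example, e, sigma) for e in exemplars]


def choose_sigma(candidates, evaluate):
    """Return (best_sigma, accuracies); evaluate(sigma) gives kernelTune accuracy."""
    accuracies = {sigma: evaluate(sigma) for sigma in candidates}
    best_sigma = max(candidates, key=lambda s: accuracies[s])
    return best_sigma, accuracies
```

One design point worth stating in the report: normalize_columns as written takes its minimum and maximum from whatever rows it is given, so to keep test data out of the preprocessing, the ranges would have to be computed on each fold's training examples and then applied to that fold's test examples.
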
10-fold Cross Validation
Run your "kernel" code on your personal dataset using the same ten folds
you used in previous homeworks. Do a t-test comparison to the best method
from your previous homeworks and also do a t-test comparison
to your non-kernel perceptron. Report and discuss the results.
An Additional Experiment of Your Own Choice
Choose one of the following extensions, implement it, and report on it,
including a t-test comparison to the best algorithm on your personal concept
(the two approaches above as well as previous homeworks).
Be sure to briefly say why you chose the experiment you did.
There is no guarantee that all of these options are equally hard, but
you can get full credit regardless of which you choose, so select the one that is most interesting to you.
We will not be auto-grading this portion of the HW.
- Boosted perceptrons (simply multiply the gradient by the example's weight
or choose training examples for stochastic gradient descent with probability proportional to their weight)
- Voted perceptrons (see Lecture 17)
- Neural networks with hidden units trained with backpropagation
- Using ID3's information gain to scale the distances in the Gaussian kernel (such scaling
needs to be done separately on each fold of cross validation)
- Using linear (or quadratic) programming to implement support vector machines
(if you choose this option, it is fine to use MATLAB, but you need to write your own code)
- Using kernel-based features with all of your previous homeworks (there should be no need to
rewrite any old algorithms, which is why "all" is requested here)
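
For the boosted-perceptron option, choosing training examples in proportion to their weights can be done with the standard library, as in the sketch below. The function name is hypothetical, and how the example weights are computed and updated between boosting rounds is left to the chosen boosting algorithm.

```python
import random


def sample_by_weight(examples, example_weights, k, seed=None):
    """Draw k training examples with replacement, proportional to their weights."""
    rng = random.Random(seed)
    return rng.choices(examples, weights=example_weights, k=k)
```
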
Autograding
We will test your code on some datasets of our own.
The API for your code should be:
HW3 task.names train_examples.data test_examples.data useGaussianKernel
The last argument is Boolean-valued. If it is "true," use the
kernel-based approach developed for the second part of this
homework, otherwise use the perceptron, with weight decay,
using the task's features as the input units.
Your code should print out the test-set accuracy, as well as
the "names" of the miscategorized test-set examples.
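
A possible shape for the command-line handling and the final report is sketched below. Treating the flag as case-insensitive is an assumption of this sketch, as are the function names; only the argument order comes from the API line above.

```python
def parse_args(argv):
    """Parse: HW3 task.names train_examples.data test_examples.data useGaussianKernel"""
    if len(argv) != 5:
        raise SystemExit("usage: HW3 task.names train_examples.data "
                         "test_examples.data useGaussianKernel")
    names_file, train_file, test_file, kernel_flag = argv[1:]
    # Anything other than "true" (case-insensitive) selects the plain perceptron.
    return names_file, train_file, test_file, kernel_flag.lower() == "true"


def evaluate(predictions, labels, example_names):
    """Return (accuracy, names of the miscategorized examples)."""
    wrong = [name for name, p, y in zip(example_names, predictions, labels)
             if p != y]
    return 1.0 - len(wrong) / len(labels), wrong
```

In a real program, parse_args would be called with sys.argv, and the two values returned by evaluate would be printed.
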
Requirements
Turn in a report of your experiments and a commented copy of the code
you wrote. Also turn in sample output from your three runs on your trainset #1.