*************************************
* PSVM - Frequently Asked Questions *
*************************************
#1
*************************************************************************
> * What did you use for nu?

It completely depends on the dataset; it can range from 10^6 to 0.01. What I did was to set a tuning set apart from the training set until I got a nu that worked well for me. Usually 1 works very well in most cases, so I would start with 1 and increase or decrease it in powers of 10 depending on the results on the tuning set. But the general rule is:

Bigger nu ==> better fitting of the training data
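As a rough illustration of this tuning-set search, here is a Python/NumPy sketch (not the original MATLAB code; the closed-form linear PSVM solve follows the proximal SVM formulation, and the toy data and variable names are my own):

```python
import numpy as np

def linear_psvm(A, d, nu):
    """Minimal linear PSVM sketch: solve
    min (nu/2)*||D(A w - gamma e) - e||^2 + (1/2)*||(w, gamma)||^2
    in closed form, where D = diag(d)."""
    m, n = A.shape
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])    # H = D*[A, -e]
    z = np.linalg.solve(np.eye(n + 1) / nu + H.T @ H, H.T @ np.ones(m))
    return z[:n], z[n]                                   # w, gamma

def accuracy(A, d, w, gamma):
    return float(np.mean(np.sign(A @ w - gamma) == d))

# Powers-of-10 search for nu on a held-out tuning set (toy data).
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 4)); A[:100] += 1.5
d = np.concatenate([np.ones(100), -np.ones(100)])
train, tune = np.arange(0, 200, 2), np.arange(1, 200, 2)

best_nu, best_acc = None, -1.0
for nu in [10.0**k for k in range(-2, 7)]:               # 0.01 ... 10^6
    w, gamma = linear_psvm(A[train], d[train], nu)
    acc = accuracy(A[tune], d[tune], w, gamma)
    if acc > best_acc:
        best_nu, best_acc = nu, acc
```

The search simply keeps the nu with the best tuning-set accuracy; in practice you would start from nu = 1 and move outward in powers of 10.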
#2
*************************************************************************
> * What did you use for your kernel parameters?
> Specifically, what form of the Gaussian function,
> with what value of sigma, did you use?

The form of the Gaussian kernel I used was:

K(i,j) = exp(-s*(norm(A(i,:)'-B(:,j)).^2));

with the value of sigma in the range [0.00001, 2]. Again, the higher sigma is, the more closely you are fitting the data.

Note: in some examples you may want to scale the data first. Some examples are:

Ionosphere:  no linear transformation (scale) and sigma = 0.1
Pima:        linear transformation and sigma = 0.5
Liver:       no linear transformation and sigma = 0.00001
tic-tac-toe: no linear transformation and sigma = 0.025

Note: these values are not necessarily the ones I used in my experiments, but they are very similar.
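In case it helps, here is a Python/NumPy sketch of that kernel (a stand-in for the MATLAB one-liner above, with B passed column-wise as in K(A,A'); the 2 x 2 example matrix is made up):

```python
import numpy as np

def gaussian_kernel(A, B, s):
    """K(i,j) = exp(-s * ||A(i,:)' - B(:,j)||^2), with B given
    column-wise (e.g. B = A.T to compute K(A, A'))."""
    # squared distances between rows of A and columns of B
    sq = ((A[:, None, :] - B.T[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-s * sq)

A = np.array([[0.0, 0.0],
              [1.0, 0.0]])
K = gaussian_kernel(A, A.T, s=0.1)   # 2 x 2; the diagonal is all ones
```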
#3
****************************************************************************
> * For the ten-fold training, how did you divide the
> data? Did you just divide the data (randomly?)
> into ten subsets, then make each training set
> nine of these sets, and test on the last one, or
> did you do something else?

The data was divided randomly into ten subsets; a training set was then defined as the union of nine of these sets, and the testing set was the remaining one. We did this ten times (once for each testing fold), and the reported results are the average over the ten runs.
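A small Python/NumPy sketch of that splitting scheme (illustrative only; m = 100 and the index handling are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100                                          # toy number of data points
folds = np.array_split(rng.permutation(m), 10)   # ten random disjoint subsets

for k in range(10):                              # one run per testing fold
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(10) if j != k])
    # ... train on train_idx, test on test_idx, record the accuracy,
    # then average the ten recorded accuracies.

sizes = [len(f) for f in folds]                  # each subset holds m/10 points
```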
#4
*****************************************************************************
> * I cannot find the "Galaxy Dim" data. Where did
> you find it?

It is in MATLAB format at:

http://www.cs.wisc.edu/~gfung/data/

and it is called dimdata.mat. This file contains:

a) a matrix A containing all the data points
b) a vector d of 1's and -1's representing the corresponding labels
#5
*****************************************************************************
> * Regarding the Wisconsin breast cancer data, what
> does "60 months" mean? Also, I could not find a
> set that was 110 x 32. The sets I found were
> around 569 points.

It is also at http://www.cs.wisc.edu/~gfung/data/ and it is called wpbc60data.mat, same format. The "60 months" refers to the cutoff date which distinguishes patients who have had a recurrence of the disease before 60 months from those who have not.
#6
****************************************************************************
> * The scales of the dimensions in some of the data
> sets vary wildly; did you scale or re-weight the
> dimensions at all, or did you just use the data
> as distributed?

I did scale some of them. For each column j I calculated the max (maxA(j)) and the mean (meanA(j)), and then to all the elements of that column j I applied the linear transformation:

(1/maxA(j))*(A(:,j)-meanA(j))
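The same per-column transformation as a Python/NumPy sketch (the 3 x 2 example matrix is made up):

```python
import numpy as np

def scale_columns(A):
    """Apply (1/maxA(j)) * (A(:,j) - meanA(j)) to each column j,
    using the max and mean of the original (uncentered) column."""
    return (A - A.mean(axis=0)) / A.max(axis=0)

A = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
S = scale_columns(A)   # each scaled column has zero mean
```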
#7
***************************************************************************
> * How did you encode the "nominal values" in the
> "mushroom" and "tic-tac-toe" data?

I got versions from previous students of Prof. Mangasarian that already had all numerical attributes. In the case of "tic-tac-toe", I think the format represents the nine squares of the board starting from the first row, and the 1's and 0's represent the game pieces of each player.

These datasets can also be found at:

http://www.cs.wisc.edu/~gfung/data/
#8
***************************************************************************
>
Do I need to use norm(A,2) to calculate
K = K(A,A') in case of
> Gaussian Kernel? Or is there a simpler
implementation?
There is a mex function you can use in MATLAB
in order to calculate the
kernels, is in:
http://www.cs.wisc.edu/~gfung/kernelmex
just
copy kernel.cpp and kernel.m and then compile the .cpp by typing:
mex kernel.cpp
#9
*************************************************************************
> When I invoke PSVM as
>
> [w, gamma, v] = PSVM(A*A', D, nu)
>
> do I get the w for the linear formulation? (Since A*A'
> is the linear kernel.)

If you have the linear PSVM algorithm:

[w,gamma]=psvm(A,D,nu)

and you form K=K(A,A') and then call

[u_bar,gamma]=psvm(K,D,nu);

what you get is u_bar = D*u. The classification surface is then:

K(x',A')*u_bar-gamma=0 ==> K(x',A')*D*u-gamma=0

In the case of K=K(A,A')=A*A' (the linear kernel), everything goes smoothly. From before:

K(x',A')*D*u-gamma=0 ==> x'*A'*D*u-gamma=0 ==> x'*w-gamma=0

since w=A'*D*u.
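A quick numerical sanity check of this reduction in Python/NumPy (random toy data; not part of the PSVM code itself):

```python
import numpy as np

# Check that the kernel-form surface with K = A*A' reduces to the
# linear surface x'*w - gamma = 0 with w = A'*D*u.
rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.normal(size=(m, n))
D = np.diag(np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0]))
u = rng.normal(size=m)
x = rng.normal(size=n)
gamma = 0.5

kernel_side = (x @ A.T) @ D @ u - gamma   # K(x',A')*D*u - gamma with K(x',A') = x'*A'
w = A.T @ D @ u                           # w = A'*D*u
linear_side = x @ w - gamma               # the two sides agree
```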
#10
********************************************************************
> In equation 22 of your MPSVM paper,
>
> E = D diag((d(\lambda, \gamma))*) D
>
> when D is a diagonal matrix with only +1 and -1 as entries, doesn't E just
> reduce to
>
> E = diag((d(\lambda, \gamma))*) ??

This is true.
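A quick numerical check in Python/NumPy (the vector v stands in for (d(\lambda, \gamma))*):

```python
import numpy as np

# For diagonal D with only +1/-1 entries, D @ diag(v) @ D = diag(v),
# since each diagonal entry picks up a factor D(i,i)^2 = 1.
v = np.array([0.3, 1.7, 0.0, 2.5])
D = np.diag([1.0, -1.0, -1.0, 1.0])
E = D @ np.diag(v) @ D
```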
I hope this helps...

cheers,
Glenn

_______________________________________________________________________________
Glenn M. Fung
gfung@cs.wisc.edu
Web Page: www.cs.wisc.edu/~gfung/