***************************************
*  PSVM - Frequently Asked Questions  *
***************************************

#1
*************************************************************************

>   * What did you use for nu?

It completely depends on the dataset; it can range from 10^6 down to
0.01. What I did was to set a tuning set apart from the training set
until I found a nu that worked well for me. Usually 1 works very well in
most cases, so I would start with 1 and increase or decrease it in
powers of 10 depending on the results on the tuning set. But the general
rule is:

Bigger nu ==> better fitting of the training data
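The powers-of-10 search described above can be sketched as follows. This is a Python sketch, not the original MATLAB; `train_and_score` is a hypothetical stand-in for training a PSVM with a given nu and returning its accuracy on the held-out tuning set.

```python
import math

def tune_nu(train_and_score, start=1.0, steps=4):
    """Start at nu = 1 and move up and down in powers of 10,
    keeping the nu that scores best on the tuning set."""
    best_nu, best_acc = start, train_and_score(start)
    for factor in (10.0, 0.1):          # first increase, then decrease
        nu = start
        for _ in range(steps):
            nu *= factor
            acc = train_and_score(nu)
            if acc > best_acc:
                best_nu, best_acc = nu, acc
    return best_nu

# Toy scoring function whose "accuracy" peaks at nu = 100:
best = tune_nu(lambda nu: -abs(math.log10(nu) - 2))
```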

#2
*************************************************************************

>   * What did you use for your kernel parameters?
>     Specifically, what form of the Gaussian function,
>     with what value of sigma, did you use?

The form of the Gaussian kernel I used was:

K(i,j) = exp(-s*norm(A(i,:)' - B(:,j))^2);

(note the transpose on A(i,:), so that a row of A is compared against a
column of B), with the value of sigma (s above) in the range
[0.00001, 2].

Again, the higher sigma is, the more closely you are fitting the data.
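A NumPy sketch of this kernel (a translation, not the MATLAB code used in the experiments), computing K(i,j) = exp(-s*||A(i,:) - B(:,j)||^2) for all pairs at once via the expansion ||a-b||^2 = ||a||^2 + ||b||^2 - 2*a.b:

```python
import numpy as np

def gaussian_kernel(A, B, s):
    """K(i,j) = exp(-s * ||A[i,:] - B[:,j]||^2); B is typically A.T."""
    a2 = np.sum(A * A, axis=1)[:, None]       # ||A[i,:]||^2 as a column
    b2 = np.sum(B * B, axis=0)[None, :]       # ||B[:,j]||^2 as a row
    d2 = a2 + b2 - 2.0 * (A @ B)              # all squared distances
    return np.exp(-s * np.maximum(d2, 0.0))   # clip tiny negative rounding

A = np.array([[0.0, 0.0],
              [3.0, 4.0]])
K = gaussian_kernel(A, A.T, s=0.1)   # 2x2; off-diagonal is exp(-0.1 * 25)
```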

Note: in some examples you may want to scale the data first.

Some examples are:

Ionosphere: No linear transformation (scale) and sigma=0.1

Pima: linear transformation   and sigma = 0.5

Liver: No linear transformation and sigma=0.00001

tic-tac-toe: No linear transformation and sigma=0.025

Note: These values are not necessarily the ones I used in my experiments
but they are very similar.

#3
****************************************************************************

>   * For the ten-fold training, how did you divide the
>     data?  Did you just divide the data (randomly?)
>     into ten subsets, then make each training set
>     nine of these sets, and test on the last one, or
>     did you do something else?

The data was divided randomly into ten subsets; a training set was then
defined as the union of nine of these sets, and the testing set was the
remaining one. We did this ten times (once for each testing fold), and
the reported results are the average over the ten runs.
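The splitting procedure above can be sketched like this (a Python illustration of the scheme, not the code used for the experiments):

```python
import random

def ten_fold_indices(n, seed=0):
    """Randomly split indices 0..n-1 into ten folds, then yield
    (train, test) index lists: nine folds train, one fold tests."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::10] for k in range(10)]   # ten disjoint subsets
    for k in range(10):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

splits = list(ten_fold_indices(100))   # ten (train, test) pairs
```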

#4
*****************************************************************************

>   * I cannot find the "Galaxy Dim" data.  Where did
>     you find it?

It is in MATLAB format in:

http://www.cs.wisc.edu/~gfung/data/

and it is called: dimdata.mat

This file contains:

a) A matrix A containing all the data points

b) A vector d of 1's and -1's representing the corresponding labels.

#5
*****************************************************************************

>   * Regarding the Wisconsin breast cancer data, what
>     "60 months" mean?  Also, I could not find a
>     set that was 110 x 32.  The sets I found were
>     around 569 points.

It is also in http://www.cs.wisc.edu/~gfung/data/ and it is called:

wpbc60data.mat, in the same format. The 60 months refers to the cutoff
that distinguishes patients who had a recurrence of the disease before
60 months from those who did not.

#6
****************************************************************************

>   * The scales of the dimensions in some of the data
>     sets vary wildly; did you scale or re-weight the
>     dimensions at all, or did you just use the data
>     as distributed?

I did scale some of them.

For each column j I calculated the max (maxA(j)) and the mean
(meanA(j)), and then to all the elements of that column j I applied the
linear transformation:

(1/maxA(j))*(A(:,j)-meanA(j))
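In NumPy this columnwise transformation looks like the following (a sketch of the formula above, assuming no column has a zero maximum):

```python
import numpy as np

def scale_columns(A):
    """For each column j: (1/maxA(j)) * (A[:, j] - meanA(j)),
    the linear transformation described above."""
    col_max = A.max(axis=0)    # maxA(j); assumed nonzero for every column
    col_mean = A.mean(axis=0)  # meanA(j)
    return (A - col_mean) / col_max

A = np.array([[1.0, 10.0],
              [3.0, 30.0]])
S = scale_columns(A)
```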

#7
***************************************************************************

>   * How did you encode the "nominal values" in the
>     "mushroom" and "tic-tac-toe" data?

I got versions from former students of Prof. Mangasarian that already
had all numerical attributes. In the case of "tic-tac-toe" I think the
format represents the nine squares of the board starting from the
first row, and the x's and o's represent the game pieces of each
player.

These datasets can be found also in:

http://www.cs.wisc.edu/~gfung/data/
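The exact numeric mapping used in those files is not specified here; one plausible encoding for the tic-tac-toe nominal values (the mapping, helper name, and sample board below are all illustrative assumptions) is:

```python
# Hypothetical mapping -- the actual files may use a different one:
# 'x' -> 1, 'o' -> -1, 'b' (blank) -> 0, one attribute per square,
# listed row by row from the top of the board.
MAP = {"x": 1, "o": -1, "b": 0}

def encode_board(squares):
    """squares: nine symbols, row by row from the first row."""
    return [MAP[s] for s in squares]

row = encode_board(["x", "o", "b", "x", "x", "o", "b", "b", "o"])
```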

#8
***************************************************************************

> Do I need to use norm(A,2) to calculate  K = K(A,A') in case of
> Gaussian Kernel? Or is there a simpler implementation?

There is a mex function you can use in MATLAB in order to calculate the
kernels; it is in:

http://www.cs.wisc.edu/~gfung/kernelmex

Just copy kernel.cpp and kernel.m, and then compile the .cpp file by typing:

mex kernel.cpp

#9
*************************************************************************
> When I invoke PSVM as
>
> [w, gamma, v] = PSVM( A*A', D, nu)
>
> Do I get the w for the Linear formulation? (Since A*A' is the linear kernel)

If you have the linear PSVM algorithm:

[w,gamma] = psvm(A,D,nu)

and you form K = K(A,A') and then call

[u_bar,gamma] = psvm(K,D,nu);

what you get is u_bar = D*u.

The classification surface is then K(x',A')*u_bar - gamma = 0 ==>
K(x',A')*D*u - gamma = 0.

In the case K = K(A,A') = AA' (the linear kernel), everything goes
smoothly. From before:

K(x',A')*D*u - gamma = 0 ==> x'A'*D*u - gamma = 0 ==> x'*w - gamma = 0,

since A'*D*u = w.
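The linear-kernel identity above, K(x',A')*D*u = x'*(A'*D*u) = x'*w, can be checked numerically (the random A, D, u, x below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))               # 5 data points in R^3
D = np.diag(rng.choice([-1.0, 1.0], size=5))  # diagonal +/-1 label matrix
u = rng.standard_normal(5)
x = rng.standard_normal(3)                    # a test point

w = A.T @ D @ u           # w = A'*D*u, the linear PSVM normal vector
lhs = (x @ A.T) @ D @ u   # K(x',A')*D*u with the linear kernel x'A'
rhs = x @ w               # x'*w
ok = np.isclose(lhs, rhs)
```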

#10
********************************************************************

> In equation 22 of your MPSVM paper
>
> E = D diag((d(\lambda, \gamma))*) D
>
> when D is a diagonal matrix with only +1 and -1 as entries doesnt E just
> reduce to
>
> E = diag((d(\lambda, \gamma))*) ??

Yes, this is true: since D is diagonal with entries +1 and -1,
D*diag(v)*D = diag(d(i)^2 * v(i)) = diag(v) for any vector v, so E
reduces exactly as you say.
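A quick numeric confirmation of this reduction (the label and diagonal values below are arbitrary examples):

```python
import numpy as np

d = np.array([1.0, -1.0, 1.0, -1.0])   # +/-1 label entries of D
D = np.diag(d)
v = np.array([0.3, 2.0, 0.0, 5.0])     # stands in for (d(lambda, gamma))*

E_full = D @ np.diag(v) @ D            # E = D diag(v) D
E_reduced = np.diag(v)                 # the claimed reduction
same = np.allclose(E_full, E_reduced)
```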

I hope this helps...

cheers

Glenn

_______________________________________________________________________________
Glenn M. Fung
gfung@cs.wisc.edu
Web Page: www.cs.wisc.edu/~gfung/