***************************************
*  PSVM - Frequently Asked Questions  *
***************************************

#1

*************************************************************************

> * What did you use for nu?

It completely depends on the dataset; it can range from 10^6 down to 0.01.
What I did was to set apart a tuning set from the training set until I found
a nu that worked well for me. Usually 1 works very well in most cases, so I
would start with 1 and increase or decrease it in powers of 10 depending on
the results on the tuning set. The general rule is:

Bigger nu ==> better fitting of the training data
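The powers-of-10 search described above can be sketched as follows (a minimal
Python/NumPy illustration, not the original MATLAB code; `train_and_score` is a
hypothetical callback standing in for training PSVM and scoring on the tuning set):

```python
import numpy as np

def tune_nu(train_and_score, nus=10.0 ** np.arange(-2, 7)):
    """Pick nu by evaluating each candidate on a held-out tuning set.

    train_and_score(nu) -> tuning-set accuracy; higher is better.
    The default candidates span the 0.01 .. 10^6 range mentioned above.
    """
    scores = {nu: train_and_score(nu) for nu in nus}
    return max(scores, key=scores.get)

# Toy usage: a fake scorer whose optimum is at nu = 100.
best = tune_nu(lambda nu: -abs(np.log10(nu) - 2.0))
```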

#2

*************************************************************************

> * What did you use for your kernel parameters?
> Specifically, what form of the Gaussian function,
> with what value of sigma, did you use?

The form of the Gaussian kernel I used was:

K(i,j)=exp(-s*(norm(A(i,:)-B(:,j)).^2));

With the value of sigma in the range [0.00001, 2].

Again, the higher sigma is, the more closely you fit the data.
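For reference, the MATLAB expression above can be sketched in Python/NumPy like
this (an illustrative direct translation, assuming both A and B hold their data
points as rows, so that K(A,A') corresponds to B = A):

```python
import numpy as np

def gaussian_kernel(A, B, s):
    """K(i,j) = exp(-s * ||A[i,:] - B[j,:]||^2), the Gaussian kernel above."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    K = np.empty((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        for j in range(B.shape[0]):
            K[i, j] = np.exp(-s * np.sum((A[i] - B[j]) ** 2))
    return K

A = np.array([[0.0, 0.0], [1.0, 0.0]])
K = gaussian_kernel(A, A, s=0.1)
# Diagonal entries are exp(0) = 1; K[0,1] = exp(-0.1 * 1)
```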

Note: in some examples you may want to scale the data first. Some examples are:

Ionosphere:  no linear transformation (scaling) and sigma = 0.1
Pima:        linear transformation and sigma = 0.5
Liver:       no linear transformation and sigma = 0.00001
Tic-tac-toe: no linear transformation and sigma = 0.025

Note: these values are not necessarily the ones I used in my experiments,
but they are very similar.

#3

****************************************************************************

> * For the ten-fold training, how did you divide the
> data? Did you just divide the data (randomly?)
> into ten subsets, then make each training set
> nine of these sets, and test on the last one, or
> did you do something else?

The data was divided randomly into ten subsets; a training set was then
defined as the union of nine of these sets, and a testing set was the
remaining one. We did this ten times (once for each testing fold), and
the reported results are the average over the ten runs.
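The procedure above is standard 10-fold cross-validation; a minimal Python
sketch (with `evaluate` a hypothetical callback that trains on one split and
returns the test-fold accuracy):

```python
import numpy as np

def ten_fold_cv(n_points, evaluate, seed=0):
    """Randomly split indices into 10 folds; train on 9, test on 1, average."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_points), 10)
    scores = []
    for k in range(10):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(10) if j != k])
        scores.append(evaluate(train_idx, test_idx))
    return float(np.mean(scores))

# Toy usage: an "evaluator" that just reports the test-fold size fraction.
avg = ten_fold_cv(100, lambda tr, te: len(te) / 100.0)
# Each fold holds 10 of the 100 points, so avg == 0.1
```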

#4

*****************************************************************************

> * I cannot find the "Galaxy Dim" data. Where did
> you find it?

It is in MATLAB format at:

http://www.cs.wisc.edu/~gfung/data/

and it is called dimdata.mat. This file contains:

a) a matrix A containing all the data points;
b) a vector d of 1's and -1's representing the corresponding labels.

#5

*****************************************************************************

> * Regarding the Wisconsin breast cancer data, what
> does "60 months" mean? Also, I could not find a
> set that was 110 x 32. The sets I found were
> around 569 points.

It is also at http://www.cs.wisc.edu/~gfung/data/ and it is called
wpbc60data.mat (same format). The "60 months" refers to the cutoff
date which distinguishes patients who had a recurrence of the
disease before 60 months from those who did not.

#6

****************************************************************************

> * The scales of the dimensions in some of the data
> sets vary wildly; did you scale or re-weight the
> dimensions at all, or did you just use the data
> as distributed?

I did scale some of them. For each column j I calculated the max (maxA(j))
and the mean (meanA(j)), and then applied the following linear
transformation to all the elements of that column j:

(1/maxA(j))*(A(:,j)-meanA(j))
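In Python/NumPy, that column-wise transformation can be sketched as follows
(an illustrative translation of the MATLAB expression above, not the original
code):

```python
import numpy as np

def scale_columns(A):
    """Per column j: subtract the column mean, then divide by the column max,
    i.e. (1/maxA(j)) * (A[:, j] - meanA(j)) as described above."""
    A = np.asarray(A, float)
    return (A - A.mean(axis=0)) / A.max(axis=0)

A = np.array([[1.0, 10.0],
              [3.0, 30.0]])
S = scale_columns(A)
# Column means are [2, 20] and column maxes are [3, 30],
# so each column becomes [-1/3, 1/3].
```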

#7

***************************************************************************

> * How did you encode the "nominal values" in the
> "mushroom" and "tic-tac-toe" data?

I got versions from previous students of Prof. Mangasarian that already
had all numerical attributes. In the case of "tic-tac-toe", I think the
format represents the nine squares of the board starting from the first
row, and the 1's and 0's represent the game pieces of each player.

These datasets can also be found at:

http://www.cs.wisc.edu/~gfung/data/

#8

***************************************************************************

> Do I need to use norm(A,2) to calculate K = K(A,A') in the case of the
> Gaussian kernel? Or is there a simpler implementation?

There is a mex function you can use in MATLAB to calculate the
kernels; it is at:

http://www.cs.wisc.edu/~gfung/kernelmex

Just copy kernel.cpp and kernel.m, and then compile the .cpp by typing:

mex kernel.cpp
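Outside MATLAB, a simple loop-free implementation is also possible using the
identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'b; a Python/NumPy sketch (not
the mex code above; rows of A and B are the data points):

```python
import numpy as np

def gaussian_kernel_matrix(A, B, s):
    """Full kernel matrix K(i,j) = exp(-s * ||A[i] - B[j]||^2) without loops,
    via ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a.b."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-s * np.maximum(sq, 0.0))  # clamp tiny negatives from rounding

A = np.array([[0.0, 0.0], [3.0, 4.0]])
K = gaussian_kernel_matrix(A, A, s=0.01)
# ||A[0] - A[1]||^2 = 25, so K[0,1] = exp(-0.25); the diagonal is exp(0) = 1.
```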

#9

*************************************************************************

> When I invoke PSVM as
>
> [w, gamma, v] = PSVM(A*A', D, nu)
>
> Do I get the w for the linear formulation? (Since A*A' is the linear kernel)

If you have the linear PSVM algorithm:

[w,gamma]=psvm(A,D,nu)

and you do K=K(A,A'), and then

[u_bar,gamma]=psvm(K,D,nu);

what you get is u_bar = D*u.

The classification surface is then:

K(x',A')*u_bar-gamma=0  ==>  K(x',A')*D*u-gamma=0

In the case of K=K(A,A')=AA' (the linear kernel),
everything goes through smoothly. From before:

K(x',A')*D*u-gamma=0 ==> x'A'D*u-gamma=0 ==> x'*w-gamma=0

since A'*D*u = w.
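As a concrete illustration of the linear case, here is a minimal Python/NumPy
sketch of a linear PSVM solver, based on the closed-form solution
[w; gamma] = (I/nu + E'E)^(-1) E'De with E = [A -e] from the PSVM
formulation. Treat it as an illustrative sketch, not the original code:

```python
import numpy as np

def psvm_linear(A, d, nu):
    """Linear proximal SVM: solve (I/nu + E'E) z = E'De for z = [w; gamma],
    where E = [A -e], D = diag(d), and e is the all-ones vector."""
    A = np.asarray(A, float)
    d = np.asarray(d, float)
    m, n = A.shape
    E = np.hstack([A, -np.ones((m, 1))])
    rhs = E.T @ d                      # E'De, since D*e = d
    z = np.linalg.solve(np.eye(n + 1) / nu + E.T @ E, rhs)
    return z[:n], z[n]                 # w, gamma

# Toy separable data: the surface x'w - gamma = 0 should split the classes.
A = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma = psvm_linear(A, d, nu=1.0)
pred = np.sign(A @ w - gamma)
# pred matches d on this toy set
```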

#10

********************************************************************

> In equation 22 of your MPSVM paper
>
> E = D diag((d(\lambda, \gamma))*) D
>
> when D is a diagonal matrix with only +1 and -1 as entries, doesn't E just
> reduce to
>
> E = diag((d(\lambda, \gamma))*) ??

This is true.
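The reduction holds because (D diag(v) D)_ii = D_ii * v_i * D_ii = v_i when D
is diagonal with +1/-1 entries. A quick NumPy check of the identity (purely
illustrative, with v standing in for (d(lambda, gamma))*):

```python
import numpy as np

# D diagonal with +/-1 entries, v an arbitrary vector.
D = np.diag([1.0, -1.0, -1.0, 1.0])
v = np.array([0.3, 0.0, 2.5, 1.7])
E = D @ np.diag(v) @ D
# Since D_ii^2 = 1, E reduces to diag(v) exactly.
```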

I hope this helps...

cheers,

Glenn

_______________________________________________________________________________

Glenn M. Fung
gfung@cs.wisc.edu
Web Page: www.cs.wisc.edu/~gfung/