SimpleCrossValidator< TClassifier > Class Template Reference

Cross-validator for simple classifiers, integrating tightly with FastExec.

Public Types

typedef TClassifier Classifier
 Typedef of internal classifier used.

Public Member Functions

const Matrix & confusion_matrix () const
 Gets the confusion matrix.
const Dataset & data () const
 Gets the dataset.
void Init (const Dataset *data_with_labels, int n_labels, int default_k, struct datanode *module_root, const char *classifier_fx_name, const char *kfold_fx_name="kfold")
 Uses FastExec to initialize this.
index_t n_correct ()
 Gets the number correctly classified over all folds.
index_t n_incorrect ()
 Gets the number incorrect over all folds.
double portion_correct ()
 Gets the portion of points classified correctly.
void Run (bool randomized=false)
 Runs cross-validation.

Detailed Description

template<class TClassifier>
class SimpleCrossValidator< TClassifier >

Cross-validator for simple classifiers, integrating tightly with FastExec.

Cross-validation results are stored under the FastExec path you supply as kfold_fx_name, which defaults to "kfold". Suppose the classifier you are using is "knn", which you specify as classifier_fx_name. The cross-validator has its own k (the number of folds) and KNN has its own k (the number of nearest neighbors), so the two are kept under separate paths. The results would look like the following:

 /kfold/params/k 10               # number of folds
 /kfold/params/dataset foo.csv    # dataset file name
 /kfold/params/n_points 15460     # dataset size
 /kfold/params/n_features 5
 /kfold/0/knn/params/k 5 
 /kfold/0/params/fold 0
 /kfold/0/results/n_correct 1234  # number of correct and incorrect per run
 /kfold/0/results/n_incorrect 312
 /kfold/0/results/p_correct .798
 /kfold/1/params/fold 1
 /kfold/1/knn/params/k 5
 /kfold/1/results/n_correct 1324
 /kfold/1/results/n_incorrect 222
 /kfold/1/results/p_correct .856
 ...
 /kfold/results/n_correct 13123   # overall totals
 /kfold/results/n_incorrect 2337
 /kfold/results/p_correct .849
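
The arithmetic behind the listing above can be sketched as follows. This is only an illustration: FoldResult, PortionCorrect and SumFolds are hypothetical names introduced here and are not part of FASTlib. The sketch assumes p_correct is n_correct divided by the number of points classified, which is what the sample figures suggest.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-fold tally (illustrative only, not a FASTlib type).
struct FoldResult {
  long n_correct;
  long n_incorrect;
};

// Fraction classified correctly: n_correct / (n_correct + n_incorrect).
double PortionCorrect(long n_correct, long n_incorrect) {
  const long total = n_correct + n_incorrect;
  assert(total > 0);  // at least one point must have been classified
  return static_cast<double>(n_correct) / static_cast<double>(total);
}

// Sums the per-fold counts into one overall tally.
FoldResult SumFolds(const std::vector<FoldResult>& folds) {
  FoldResult total = {0, 0};
  for (const FoldResult& f : folds) {
    total.n_correct += f.n_correct;
    total.n_incorrect += f.n_incorrect;
  }
  return total;
}
```

With the figures from the listing, PortionCorrect(1234, 312) is roughly 0.798 for fold 0, and PortionCorrect(13123, 2337) is roughly 0.849 for the overall totals.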

To plot KNN's k against cross-validation correctness, you would use the following select strings:

 /kfold/params/dataset      # the name of the dataset
 /kfold/0/knn/params/k      # this ensures you'll get default params
 /kfold

Before the cross-validator runs, it copies parameters from the module you specify. If that module is module_root, it simply takes the original command-line parameters stored in "/params". In the previous example, the command-line parameters from "/params/knn/" and "/params/kfold/" are used. The user specifies these as "--params/knn/someparameter=3" or "--params/kfold/k=4", which set KNN's "someparameter" to 3 and the cross-validator's number of folds to 4.

To build a classifier suitable for use with SimpleCrossValidator, you must create a class with the following methods:

 class MyClassifier {
   ...
   // Trains on the given dataset.  n_classes is the number of class
   // labels.  Tuning parameters can be read from the "datanode" passed in
   // using fx_param_int, fx_param_double, etc., passing "module" as the
   // first argument instead of NULL.
   void InitTrain(const Dataset& dataset, int n_classes, datanode *module);

   // For a test datum, returns the class label, 0 <= label < n_classes.
   int Classify(const Vector& test_datum);
 };
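
As a rough illustration of that interface, here is a minimal sketch of a classifier that always predicts the most frequent training label. The Vector, Dataset and datanode types below are simplified stand-ins invented for this sketch so that it compiles on its own; the real FASTlib types differ, and the assumption that each point carries its label as the last entry is only modeled on the description of data_with_labels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for FASTlib's Vector, Dataset and datanode (assumptions for
// this sketch only; the real types have different interfaces).
typedef std::vector<double> Vector;
struct Dataset {
  std::vector<Vector> points;  // each point: features..., label as last entry
};
struct datanode {};

// Hypothetical classifier: always predicts the most frequent training label.
class MajorityClassifier {
 public:
  MajorityClassifier() : majority_label_(0) {}

  // Counts how often each label occurs and remembers the most frequent one.
  void InitTrain(const Dataset& dataset, int n_classes, datanode* /*module*/) {
    assert(n_classes > 0);
    std::vector<int> counts(n_classes, 0);
    for (std::size_t i = 0; i < dataset.points.size(); ++i) {
      assert(!dataset.points[i].empty());
      const int label = static_cast<int>(dataset.points[i].back());
      assert(label >= 0 && label < n_classes);
      ++counts[label];
    }
    majority_label_ = 0;
    for (int c = 1; c < n_classes; ++c) {
      if (counts[c] > counts[majority_label_]) majority_label_ = c;
    }
  }

  // Ignores the test datum and returns the remembered label.
  int Classify(const Vector& /*test_datum*/) { return majority_label_; }

 private:
  int majority_label_;
};
```

A classifier like this is mainly useful as a baseline: any real classifier should beat its cross-validation score.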

Definition at line 115 of file crossvalidation.h.


Member Typedef Documentation

template<class TClassifier >
typedef TClassifier SimpleCrossValidator< TClassifier >::Classifier

Typedef of internal classifier used.

Definition at line 120 of file crossvalidation.h.


Member Function Documentation

template<class TClassifier >
const Matrix& SimpleCrossValidator< TClassifier >::confusion_matrix (  )  const [inline]

Gets the confusion matrix.

The element at row i, column j is the number of samples whose actual class is i and whose predicted class is j.

Definition at line 197 of file crossvalidation.h.
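
The row/column convention described above can be sketched as follows. CountMatrix, BuildConfusion and CountCorrect are hypothetical names used only for illustration; the sketch uses a nested std::vector as a stand-in for FASTlib's Matrix and does not reflect how the library fills its matrix internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for Matrix: m[i][j] counts samples with actual class i
// and predicted class j (an assumption for this sketch).
typedef std::vector<std::vector<long> > CountMatrix;

// Tallies one (actual, predicted) pair per sample.
CountMatrix BuildConfusion(const std::vector<int>& actual,
                           const std::vector<int>& predicted,
                           int n_classes) {
  assert(actual.size() == predicted.size());
  assert(n_classes > 0);
  CountMatrix m(n_classes, std::vector<long>(n_classes, 0));
  for (std::size_t t = 0; t < actual.size(); ++t) {
    assert(actual[t] >= 0 && actual[t] < n_classes);
    assert(predicted[t] >= 0 && predicted[t] < n_classes);
    ++m[actual[t]][predicted[t]];
  }
  return m;
}

// Correct classifications lie on the diagonal (actual == predicted).
long CountCorrect(const CountMatrix& m) {
  long n = 0;
  for (std::size_t i = 0; i < m.size(); ++i) n += m[i][i];
  return n;
}
```

Under this convention the diagonal sum corresponds to the number classified correctly, and every off-diagonal entry is a specific kind of mistake.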

template<class TClassifier >
const Dataset& SimpleCrossValidator< TClassifier >::data (  )  const [inline]

Gets the dataset.

Definition at line 202 of file crossvalidation.h.

template<class TClassifier >
void SimpleCrossValidator< TClassifier >::Init ( const Dataset *  data_with_labels,
int  n_labels,
int  default_k,
struct datanode *  module_root,
const char *  classifier_fx_name,
const char *  kfold_fx_name = "kfold" 
) [inline]

Uses FastExec to initialize this.

See the detailed description of this class for more information.

Parameters:
data_with_labels dataset with labels as the last feature
n_labels the number of labels (setting this to 0 means to automatically determine from the dataset); the labels must be integers from 0 to n_labels - 1
default_k the default number of folds (overridden by command-line parameter kfold/k)
module_root the FastExec module this is under (usually fx_root)
classifier_fx_name short name to give the classifier under FastExec
kfold_fx_name the FastExec name of the cross-validator

Definition at line 227 of file crossvalidation.h.

References DatasetInfo::feature(), fx_param_int(), fx_submodule(), Dataset::info(), Dataset::n_features(), DatasetFeature::n_values(), DatasetFeature::NOMINAL, and DatasetFeature::type().
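
The constraint on labels (integers from 0 to n_labels - 1) can be checked with a small helper like the one below. This is a sketch under stated assumptions: InferLabelCount is a hypothetical function, and it infers the count by scanning label values, whereas the References list above suggests the library consults the dataset's feature metadata instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: returns (largest label + 1) when every label is a
// non-negative integer value, or -1 when any label is invalid.
// An empty input yields 0.
int InferLabelCount(const std::vector<double>& labels) {
  const double max_int = static_cast<double>(std::numeric_limits<int>::max() - 1);
  int max_label = -1;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    const double v = labels[i];
    // Reject negatives, fractions, NaN, and values too large for int.
    if (!(v >= 0.0) || v > max_int || std::floor(v) != v) return -1;
    const int label = static_cast<int>(v);
    if (label > max_label) max_label = label;
  }
  return max_label + 1;
}
```

For labels 0, 2, 1, 2 the helper returns 3; a label such as 1.5 or -1 makes it return -1.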

template<class TClassifier >
index_t SimpleCrossValidator< TClassifier >::n_correct (  )  [inline]

Gets the number correctly classified over all folds.

Definition at line 177 of file crossvalidation.h.

Referenced by SimpleCrossValidator< TClassifier >::Run().

template<class TClassifier >
index_t SimpleCrossValidator< TClassifier >::n_incorrect (  )  [inline]

Gets the number incorrect over all folds.

Definition at line 182 of file crossvalidation.h.

References Dataset::n_points().

Referenced by SimpleCrossValidator< TClassifier >::Run().

template<class TClassifier >
double SimpleCrossValidator< TClassifier >::portion_correct (  )  [inline]

Gets the portion of points classified correctly.

Definition at line 187 of file crossvalidation.h.

References Dataset::n_points().

Referenced by SimpleCrossValidator< TClassifier >::Run().

template<class TClassifier >
void SimpleCrossValidator< TClassifier >::Run ( bool  randomized = false  ) 

Runs cross-validation.

Parameters:
randomized whether to assign points to folds using a random permutation of the data (true) or simply by striding through it (false)
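
The difference between the two modes can be sketched as below. This is an assumption about what "striding" means (point i goes to fold i mod k), not a description of the library's actual splitting code; AssignFolds and its seed parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns the fold index of every point.
// Strided: point i goes to fold i % k.
// Randomized: the points are shuffled first, then strided.
std::vector<int> AssignFolds(std::size_t n_points, int k, bool randomized,
                             unsigned seed) {
  assert(k > 0);
  std::vector<std::size_t> order(n_points);
  std::iota(order.begin(), order.end(), std::size_t(0));  // 0, 1, 2, ...
  if (randomized) {
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
  }
  std::vector<int> fold_of(n_points, 0);
  for (std::size_t pos = 0; pos < n_points; ++pos) {
    fold_of[order[pos]] = static_cast<int>(pos % static_cast<std::size_t>(k));
  }
  return fold_of;
}
```

Either way the fold sizes differ by at most one. Shuffling matters when the input is sorted or otherwise ordered by label.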

The documentation for this class was generated from the following files:
Generated on Mon Jan 24 12:04:40 2011 for FASTlib by  doxygen 1.6.3