Cross-validator for simple classifiers, integrating tightly with FastExec.
Public Types | |
typedef TClassifier | Classifier |
Typedef of internal classifier used. | |
Public Member Functions | |
const Matrix & | confusion_matrix () const |
Gets the confusion matrix. | |
const Dataset & | data () const |
Gets the dataset. | |
void | Init (const Dataset *data_with_labels, int n_labels, int default_k, struct datanode *module_root, const char *classifier_fx_name, const char *kfold_fx_name="kfold") |
Uses FastExec to initialize this. | |
index_t | n_correct () |
Gets the number correctly classified over all folds. | |
index_t | n_incorrect () |
Gets the number of points classified incorrectly over all folds. | |
double | portion_correct () |
Gets the fraction of points classified correctly over all folds. | |
void | Run (bool randomized=false) |
Runs cross-validation. | |
Cross-validator for simple classifiers, integrating tightly with FastExec.
Cross-validation results are stored under the path you give it (kfold_fx_name), which defaults to "kfold". Suppose the classifier you are using is "knn", which you specify as classifier_fx_name. Note that the cross-validator and KNN each have their own k: for the cross-validator it is the number of folds, and for KNN it is the number of nearest neighbors. The results would look like the following:
/kfold/params/k 10                # number of folds
/kfold/params/dataset foo.csv     # dataset file name
/kfold/params/n_points 15460      # dataset size
/kfold/params/n_features 5
/kfold/0/knn/params/k 5
/kfold/0/params/fold 0
/kfold/0/results/n_correct 1234   # number of correct and incorrect per run
/kfold/0/results/n_incorrect 312
/kfold/0/results/p_correct .798
/kfold/1/params/fold 1
/kfold/1/knn/params/k 5
/kfold/1/results/n_correct 1324
/kfold/1/results/n_incorrect 222
/kfold/1/results/p_correct .856
...
/kfold/results/n_correct 13123    # overall totals
/kfold/results/n_incorrect 2337
/kfold/results/p_correct .849
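The numbers in the listing are consistent with pooled counts: each fold shown holds 1546 points (1234 + 312 and 1324 + 222), 15460 / 1546 = 10 suggests ten equal folds, and the overall p_correct is 13123 / 15460. A minimal sketch of that arithmetic, assuming the totals are plain sums of the per-fold counts; FoldResult, PortionCorrect and SumFolds are hypothetical names, not part of SimpleCrossValidator:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-fold counts, mirroring /kfold/<fold>/results/n_correct and n_incorrect.
struct FoldResult {
  long n_correct;
  long n_incorrect;
};

// Fraction correct for one set of counts: n_correct / (n_correct + n_incorrect).
double PortionCorrect(long n_correct, long n_incorrect) {
  long total = n_correct + n_incorrect;
  return total == 0 ? 0.0 : static_cast<double>(n_correct) / total;
}

// Overall totals (/kfold/results/...) as sums of the per-fold counts.
FoldResult SumFolds(const std::vector<FoldResult>& folds) {
  FoldResult total{0, 0};
  for (const FoldResult& f : folds) {
    total.n_correct += f.n_correct;
    total.n_incorrect += f.n_incorrect;
  }
  return total;
}
```

Because the overall fraction is computed from pooled counts rather than by averaging the per-fold p_correct values, larger folds carry proportionally more weight when fold sizes differ.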
To plot KNN's k against cross-validation accuracy, you would use the following select strings:
/kfold/params/dataset    # the name of the dataset
/kfold/0/knn/params/k    # this ensures you'll get default params
/kfold
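The select strings are plain paths in the hierarchy shown in the results listing. To illustrate how that hierarchy is composed from kfold_fx_name, the fold index and classifier_fx_name, here is a small sketch; the helper names are hypothetical and not part of FastExec:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: path of a classifier parameter recorded for one fold,
// e.g. FoldClassifierParam("kfold", 0, "knn", "k") gives "/kfold/0/knn/params/k".
std::string FoldClassifierParam(const std::string& kfold_fx_name, int fold,
                                const std::string& classifier_fx_name,
                                const std::string& param) {
  return "/" + kfold_fx_name + "/" + std::to_string(fold) + "/" +
         classifier_fx_name + "/params/" + param;
}

// Hypothetical helper: path of a per-fold result,
// e.g. FoldResultPath("kfold", 1, "n_incorrect") gives "/kfold/1/results/n_incorrect".
std::string FoldResultPath(const std::string& kfold_fx_name, int fold,
                           const std::string& result) {
  return "/" + kfold_fx_name + "/" + std::to_string(fold) + "/results/" + result;
}
```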
Before the cross-validator runs, it copies parameters from the module you specify; if that is module_root, it simply takes the original command-line parameters stored under "/params". In the previous example, the command-line parameters under "/params/knn/" and "/params/kfold/" are used. The user specifies them as, for example, "--params/knn/someparameter=3" to set KNN's "someparameter" to 3, and "--params/kfold/k=4" to set the cross-validator's number of folds to 4.
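The interplay between default_k and the kfold/k parameter amounts to "use the command-line value if present, else the default". The sketch below shows that resolution under stated assumptions: it parses arguments of the form --params/<module>/<name>=<value> from the first example above with a hypothetical ParseParams, which is not FastExec's actual parser, and ResolveFolds is likewise hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical parser: collects "--params/<module>/<name>=<value>" arguments
// into a map keyed by "<module>/<name>". Other arguments are ignored.
std::map<std::string, std::string> ParseParams(const std::vector<std::string>& args) {
  const std::string prefix = "--params/";
  std::map<std::string, std::string> params;
  for (const std::string& arg : args) {
    if (arg.compare(0, prefix.size(), prefix) != 0) continue;  // not a parameter
    std::string::size_type eq = arg.find('=', prefix.size());
    if (eq == std::string::npos) continue;  // no value given
    params[arg.substr(prefix.size(), eq - prefix.size())] = arg.substr(eq + 1);
  }
  return params;
}

// Number of folds: the kfold/k value if given on the command line, else default_k.
int ResolveFolds(const std::map<std::string, std::string>& params, int default_k) {
  auto it = params.find("kfold/k");
  return it == params.end() ? default_k : std::stoi(it->second);
}
```

With "--params/kfold/k=4" on the command line, a default_k of 10 passed to Init would be overridden and four folds used.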
To build a classifier suitable for use with SimpleCrossValidator, you must create a class with the following methods:
class MyClassifier {
  ...
  // Trains on the dataset specified. n_classes is the number of class
  // labels. Tweak parameters can be obtained from the "datanode" passed
  // using fx_param_int, fx_param_double, etc., but passing in "module" as
  // the first parameter instead of NULL.
  //
  void InitTrain(const Dataset& dataset, int n_classes, datanode *module);
  // For a test datum, returns the class label 0 <= label < n_classes
  int Classify(const Vector& test_datum);
};
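As a concrete instance of the required shape, here is a one-nearest-neighbor classifier. It is a sketch only: ToyVector and ToyDataset are stand-ins for FASTlib's Vector and Dataset (assumed here to store the label as the last column, matching data_with_labels), datanode is left opaque, and the assumption that Classify receives features without the label column is ours:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-ins for the FASTlib types (assumptions for this sketch only):
// each row holds the features followed by the integer label as the last column.
using ToyVector = std::vector<double>;
struct ToyDataset {
  std::vector<ToyVector> rows;
};
struct datanode;  // opaque here; a real classifier would read tweak parameters from it

// One-nearest-neighbor classifier with the InitTrain/Classify shape described above.
class OneNearestNeighbor {
 public:
  void InitTrain(const ToyDataset& dataset, int n_classes, datanode* module) {
    (void)n_classes;  // labels are already in range 0..n_classes-1
    (void)module;     // no tweak parameters in this sketch
    train_ = dataset.rows;
  }

  // Returns the label of the closest training point (squared Euclidean distance).
  // test_datum holds features only, without a label column (an assumption).
  int Classify(const ToyVector& test_datum) const {
    double best = std::numeric_limits<double>::infinity();
    int label = 0;
    for (const ToyVector& row : train_) {
      double dist = 0.0;
      for (std::size_t i = 0; i + 1 < row.size(); ++i) {  // skip the label column
        double d = row[i] - test_datum[i];
        dist += d * d;
      }
      if (dist < best) {
        best = dist;
        label = static_cast<int>(row.back());
      }
    }
    return label;
  }

 private:
  std::vector<ToyVector> train_;
};
```

Storing the whole training set keeps InitTrain trivial; a real KNN would also read its k from the datanode as the comment in MyClassifier describes.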
Definition at line 115 of file crossvalidation.h.
typedef TClassifier SimpleCrossValidator< TClassifier >::Classifier
Typedef of internal classifier used.
Definition at line 120 of file crossvalidation.h.
const Matrix& SimpleCrossValidator< TClassifier >::confusion_matrix() const [inline]
Gets the confusion matrix.
The element at row i, column j is the number of test points (over all folds) whose actual class is i and whose predicted class is j.
Definition at line 197 of file crossvalidation.h.
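To make the row/column convention concrete, the sketch below builds such a matrix from parallel lists of actual and predicted labels and shows that the diagonal (i == j) holds the correct classifications. It uses nested std::vector in place of Matrix, and BuildConfusion and CorrectFromConfusion are hypothetical helpers; whether SimpleCrossValidator computes n_correct this way is not stated in this documentation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[i][j] = number of points whose actual class is i and predicted class is j.
using Confusion = std::vector<std::vector<long>>;

Confusion BuildConfusion(const std::vector<int>& actual,
                         const std::vector<int>& predicted, int n_labels) {
  const std::size_t n = static_cast<std::size_t>(n_labels);
  Confusion counts(n, std::vector<long>(n, 0));
  for (std::size_t p = 0; p < actual.size(); ++p) {
    ++counts[actual[p]][predicted[p]];  // row = actual, column = predicted
  }
  return counts;
}

// Correct classifications lie on the diagonal; everything off it is incorrect.
long CorrectFromConfusion(const Confusion& counts) {
  long correct = 0;
  for (std::size_t i = 0; i < counts.size(); ++i) correct += counts[i][i];
  return correct;
}
```

Summing a row gives how many points truly belong to that class; summing a column gives how many were predicted as it.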
const Dataset& SimpleCrossValidator< TClassifier >::data() const [inline]
Gets the dataset.
Definition at line 202 of file crossvalidation.h.
void SimpleCrossValidator< TClassifier >::Init(const Dataset *data_with_labels,
    int n_labels,
    int default_k,
    struct datanode *module_root,
    const char *classifier_fx_name,
    const char *kfold_fx_name = "kfold") [inline]
Uses FastExec to initialize this.
See the detailed class description for more information.
Parameters:
data_with_labels | dataset with the labels stored as the last feature
n_labels | the number of labels (set to 0 to determine it automatically from the dataset); the labels must be integers from 0 to n_labels - 1
default_k | the default number of folds (overridden by the command-line parameter kfold/k)
module_root | the FastExec module this is under (usually fx_root)
classifier_fx_name | short FastExec name for the classifier module (for example, "knn")
kfold_fx_name | FastExec name of the cross-validator (defaults to "kfold")
Definition at line 227 of file crossvalidation.h.
References DatasetInfo::feature(), fx_param_int(), fx_submodule(), Dataset::info(), Dataset::n_features(), DatasetFeature::n_values(), DatasetFeature::NOMINAL, and DatasetFeature::type().
index_t SimpleCrossValidator< TClassifier >::n_correct() [inline]
Gets the number correctly classified over all folds.
Definition at line 177 of file crossvalidation.h.
Referenced by SimpleCrossValidator< TClassifier >::Run().
index_t SimpleCrossValidator< TClassifier >::n_incorrect() [inline]
Gets the number of points classified incorrectly over all folds.
Definition at line 182 of file crossvalidation.h.
References Dataset::n_points().
Referenced by SimpleCrossValidator< TClassifier >::Run().
double SimpleCrossValidator< TClassifier >::portion_correct() [inline]
Gets the fraction of points classified correctly over all folds.
Definition at line 187 of file crossvalidation.h.
References Dataset::n_points().
Referenced by SimpleCrossValidator< TClassifier >::Run().
void SimpleCrossValidator< TClassifier >::Run(bool randomized = false) [inline]
Runs cross-validation.
randomized | whether to assign points to folds using a random permutation of the data (true), or simply to stride through the data in its original order (false)
Definition at line 259 of file crossvalidation.h.
References fx_copy_module(), fx_format_result(), fx_param_bool(), fx_submodule(), fx_timer_start(), fx_timer_stop(), datanode::key, math::MakeIdentityPermutation(), math::MakeRandomPermutation(), GenVector< T >::MakeSubvector(), Dataset::matrix(), SimpleCrossValidator< TClassifier >::n_correct(), Dataset::n_features(), SimpleCrossValidator< TClassifier >::n_incorrect(), Dataset::n_points(), SimpleCrossValidator< TClassifier >::portion_correct(), and Dataset::SplitTrainTest().
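The randomized flag only changes which points land in which fold. The sketch below shows one plausible scheme, assuming that "striding" means dealing the points of a permutation into folds round-robin, so that position j goes to fold j % k. MakeOrder and TestIndices are hypothetical names, and std::shuffle with std::mt19937 stands in for math::MakeRandomPermutation; the actual split is done by Dataset::SplitTrainTest, whose exact rule is not documented here:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Identity permutation 0..n-1, or a seeded random permutation when randomized is set.
std::vector<std::size_t> MakeOrder(std::size_t n, bool randomized, unsigned seed) {
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  if (randomized) {
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
  }
  return order;
}

// Assumed striding rule: the point at position j of the order is tested in fold
// j % k and used for training in every other fold.
std::vector<std::size_t> TestIndices(const std::vector<std::size_t>& order,
                                     std::size_t k, std::size_t fold) {
  std::vector<std::size_t> test;
  for (std::size_t j = fold; j < order.size(); j += k) test.push_back(order[j]);
  return test;
}
```

Either way, every point is tested exactly once across the k folds; randomizing only guards against ordered data (for example, points sorted by label) producing unrepresentative folds.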