# Brief Overview of the MSM-T Algorithm

Let A and B be finite, disjoint point sets in n-dimensional Euclidean space, represented by the m x n and k x n matrices A and B, respectively. The MSM-T algorithm generates a decision tree representing the planes needed to separate the sets A and B . Each non-leaf node in the tree is plane that further separates A and B . Each leaf node contains points of either A or B exclusively (or to a prescribed tolerance). For a detailed description of the MSM-T algorithm, see:

O. L. Mangasarian.
Mathematical Programming in Neural Networks. ORSA Journal on Computing, Vol. 5, No. 4, Fall 1993, pages 349 - 360.

# Generation of the Decision Tree

The MATLAB representation of the matrices A and B (from now on denoted by A and B), must be placed in the MATLAB environment. This can be done either by actually entering them by hand, or by placing them in an M-file and loading the M-file into the MATLAB environment.

The decision tree is technically represented as a matrix in the MATLAB environment. This matrix representation of the decision tree must be generated. To generate this matrix, call (in the MATLAB environment):

T = msmt_tree(A,B,max_depth,tolerance,certainty_factor,min_points)

In the above expression the various symbols are defined as follows:

• A, B: MATLAB representation of the matrices A and B.
• max_depth: maximum allowable depth of the decision tree (must be greater than or equal to 1). If this argument is not given, then max_depth is set (by default) to some huge positive integer.
• tolerance: percentage of allowable error in a leaf node (must be between 0.0 and 1.0). If this argument is not given, then tolerance is set (by default) to 0.0.
• certainty_factor: used in C4.5 pruning algorithm (see Pruning). If this argument is not given, or certainty_factor = 0.0, the tree is not pruned.
• min_points: used in point pruning algorithm (see Pruning). If this argument is not given or min_points = 0, the tree is not pruned.

See msmt_tree.m file.

# Displaying the Decision Tree

The decision tree generated by the call above can be displayed graphically by calling the following routine (within the MATLAB environment):

disp_tree(T,A,B)

where:

• T: matrix representing the decision tree in the MATLAB environment.
• A: matrix representing the point set A in the MATLAB environment.
• B: matrix representing the point set B in the MATLAB environment.

The following is an example of the graphical representation of the decision tree using Wisconsin Breast Cancer data.

Each node in the tree is numbered. In the MATLAB environment, the following information is provided:

• For each non-leaf node:
• Equation of the plane is given as: wx = theta.
• Number of points of set A at this node.
• Number of points of set B at this node.
• For each leaf node:
• Identification that the node is a leaf node
• Number of points of set A at this node.
• Number of points of set B at this node.

See disp_tree.m file.

# Pruning

Pruning removes potentially unnecessary subtrees from the decision tree. This MATLAB implementation allows for pruning using 2 different algorithms: (1) Error-based pruning from C4.5: Programs for Machine Learning, and (2) Minimum misclassified points algorithm.

### Error-Based Pruning

To prune the given decision tree using the error-based pruning algorithm (outlined in C4.5: Programs for Machine Learning), call (in the MATLAB environment):

T = prune_tree_C45(T,A,B,certainty_factor)

where:

• T: matrix representing the decision tree in the MATLAB environment.
• A, B: MATLAB representation of matrices A and B.
• certainty_factor: real number between (and including) 0.0 and 1.0. Smaller values of certainty_factor will result in more pruning, and vice-versa for larger values. NOTE: Suggested value for certainty_factor is 0.25.

The decision tree may also be pruned by this algorithm when the tree is generated by giving a value for certainty_factor in the call:

T = msmt_tree(A,B,max_depth,tolerance, certainty_factor, min_points)

For a detailed description of the pruning algorithm, see

J. Ross Quinlan.
C4.5: Programs for Machine Learning. Morgan Kaufman Publishers, San Mateo, California.

See prune_tree_C45.m file.

### Minimum Misclassified Points Pruning

The minimum misclassified points algorithm works as follows. An integer number of allowable misclassified points is given. If a plane is generated that splits a node and has less than this number of allowable misclassified points, this decision node is made into a leaf node. If the plane generated splits more than this allowable number of misclassified points, this decision node remains. This pruning algorithm can be called (in the MATLAB environment) by:

T = prune_tree_points(T,A,B,min_points)

where:

• T: matrix representing the decision tree in the MATLAB environment.
• A, B: MATLAB representation of matrices A and B.
• min_points: the minimum allowable number of misclassified points at a decision node.

The decision tree may be pruned using this algorithm when it is generated by giving a value for min_points in the call:

T = msmt_tree(A,B,max_depth,tolerance, certainty_factor, min_points)

See prune_tree_points.m file.

# Cross-Validation

The performance of the MSM-T algorithm on a given data set may be tested by cross-validation. The cross-validation procedure works as follows. The sets A and B are equally divided into a given number of groups (num_groups); for each group, a decision tree is constructed using (num_groups - 1) groups. The tree is then tested using the group set aside. The algorithm returns percent (percentage of correctly classified points by the MSM-T algorithm), confusion_matrix (see cross_val.m file), ave_planes (average number of planes needed in the separation). The cross-validation procedure can be initiated by making the following call (in the MATLAB environment):

[percent,confusion_matrix,ave_planes] = cross_val(A,B,num_groups,max_depth,tolerance,certainty_factor,min_points)

where:

• A, B: MATLAB representation of matrices A and B.
• numgroups: number of groups to use in cross-validation.
• max_depth, tolerance, certainty_factor, min_points: See Generating the Decision Tree

See cross_val.m file.

# Storage of Decision Trees

A decision tree may be written to any specified file using the following call (in the MATLAB environment):

msmt_write_file(T,'filename')

where:

• T: matrix representing the decision tree in the MATLAB environment.
• filename: name of the file to be written (NOTE: this name must be enclosed in single quotes)

Once a tree has been written, it can be retrieved and put back in the MATLAB environment with the following call (in the MATLAB environment):

where:

• filename: name of the file to read (NOTE: this must be enclosed in single quotes)
• T: the decision tree is place in the MATLAB environment as matrix T.

See msmt_write_file.m and msmt_read_file.m files.

Last modified: Thu Jul 6 11:04:38 1995 by Paul Bradley
paulb@cs.wisc.edu