Handling Continuous Features in HW 2
Here's an inefficient but easy way to handle continuous features in ID3:
You really should redo the threshold search below on each recursive call
of ID3, but if that is too complicated, doing the "brute force" method
below is ok (I think both methods will pick the same thresholds, possibly
modulo ties).
Before starting the ID3 calculations for the root node,
SORT each continuous feature in the current training set
and make a list of the BOUNDARY VALUES (i.e., the values halfway
between two adjacent examples of different classes).
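The boundary-value step can be sketched roughly as follows (a minimal illustration; the function name and representation as (value, label) pairs are my own, not from the assignment):

```python
# Hypothetical sketch: find candidate thresholds for ONE continuous
# feature by sorting the (value, label) pairs and taking the midpoint
# between adjacent examples whose classes differ.
def boundary_thresholds(values, labels):
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        # A boundary exists only where the class changes AND the two
        # values are distinct (equal values give no usable midpoint).
        if c1 != c2 and v1 != v2:
            thresholds.append((v1 + v2) / 2)
    return thresholds
```

For example, values [1, 2, 3, 4] with labels [0, 0, 1, 1] yield the single candidate threshold 2.5, halfway between the last class-0 example and the first class-1 example.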
At each node of the decision tree, score
each of these boundary values. (Thresholds used
higher up in the path to the current node don't
need to be rechecked, nor do those that fall between examples
no longer in the current set of training examples, but this
inefficiency is ok.)
For the "random" splitting function, treat a continuous feature
as a single feature in terms of its chances of being picked. If it is
picked, be sure to select a threshold that has examples on both sides
(to prevent infinite loops). Give all such thresholds (i.e.,
those that split the difference between the values of adjacent training-set
examples, regardless of the examples' classes, since this is supposed to be
random) an equal chance of being selected.
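One way to sketch the random splitter's threshold choice (assuming, as my own reading of the above, that "all such thresholds" means the midpoints between adjacent distinct sorted values, class ignored):

```python
import random

# Hedged sketch: collect every midpoint between adjacent DISTINCT
# sorted values (ignoring class, since the choice should be random).
# Any such midpoint is guaranteed to have training examples on both
# sides, which prevents infinite loops; pick one uniformly at random.
def random_threshold(values, rng=random):
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not midpoints:
        return None  # feature is constant here; no valid split exists
    return rng.choice(midpoints)
```

Deduplicating the values first matters: with values [1, 2, 2, 3], the only valid thresholds are 1.5 and 2.5, each chosen with equal probability; a midpoint between the two equal 2's would put zero examples on one side.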