K. Cherkauer & J. Shavlik (1996).
Growing Simpler Decision Trees to Facilitate Knowledge Discovery. Proceedings, Second International Conference on Knowledge Discovery and Data Mining, pp. 315-318, Portland, OR. AAAI Press.
When using machine learning techniques for knowledge discovery, output that is comprehensible to a human is as important as predictive accuracy. We introduce a new algorithm, SET-Gen, that improves the comprehensibility of decision trees grown by standard C4.5 without reducing accuracy. It does this by using genetic search to select the set of input features C4.5 is allowed to use to build its tree. We test SET-Gen on a wide variety of real-world datasets and show that SET-Gen trees are significantly smaller and reference significantly fewer features than trees grown by C4.5 without SET-Gen. Statistical significance tests show that, on all ten datasets tested, SET-Gen's trees are either indistinguishable in accuracy from the original C4.5 trees or more accurate than them.
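The abstract describes SET-Gen only at a high level: a genetic search over which input features C4.5 may use. As a rough illustration of that idea (genetic feature-subset selection), here is a minimal sketch in Python. It is not the authors' implementation. The population size, binary tournament selection, one-point crossover, bit-flip mutation, and all rates are assumptions chosen for illustration. In SET-Gen the fitness of a feature mask would come from running C4.5 restricted to those features and scoring the resulting tree; here `toy_fitness` and `RELEVANT` are hypothetical stand-ins that reward "accuracy" and penalize the number of features used.

```python
import random
from typing import Callable, List

Mask = List[bool]  # mask[i] is True if feature i may be used by the tree learner


def genetic_feature_search(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.6,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Simple generational GA over feature masks; returns the best mask seen.

    All operators and parameter values are illustrative assumptions,
    not the settings used by SET-Gen.
    """
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        scores = [fitness(mask) for mask in population]

        def tournament() -> Mask:
            # Binary tournament selection: keep the fitter of two random masks.
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[i] if scores[i] >= scores[j] else population[j]

        next_pop: List[Mask] = []
        while len(next_pop) < pop_size:
            a, b = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for parent in (a, b):
                # Bit-flip mutation: toggle each feature independently.
                child = [(not bit) if rng.random() < mutation_rate else bit
                         for bit in parent]
                next_pop.append(child)
        population = next_pop[:pop_size]

        # Track the best mask across all generations.
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best
    return best


# Hypothetical stand-in for "run C4.5 on the selected features and score the tree".
RELEVANT = {1, 4, 7}  # assumed: the only features that carry signal


def toy_fitness(mask: Mask) -> float:
    chosen = {i for i, bit in enumerate(mask) if bit}
    accuracy_proxy = len(chosen & RELEVANT) / len(RELEVANT)
    size_penalty = 0.02 * len(chosen)  # fewer features -> simpler tree (assumed)
    return accuracy_proxy - size_penalty


best_mask = genetic_feature_search(n_features=10, fitness=toy_fitness)
print([i for i, bit in enumerate(best_mask) if bit])
```

Because the search only decides which features the tree learner may see, the learner itself stays unchanged; swapping `toy_fitness` for a function that trains and evaluates a real decision tree would be the natural next step under these assumptions.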
Computer Sciences Department
College of Letters and Science
University of Wisconsin - Madison