K. Cherkauer & J. Shavlik (1994).
Selecting Salient Features for Machine Learning from Large Candidate Pools through Parallel Decision-Tree Construction. In H. Kitano, editor, Massively Parallel Artificial Intelligence, pp. 102-136. AAAI Press/The MIT Press, Menlo Park, CA.
This publication is available in PDF and PostScript formats.
The particular representation used to describe training and testing examples can have profound effects on an inductive algorithm's ability to learn. However, the space of possible representations is virtually infinite, so choosing a good representation is not a simple task. This chapter describes a method whereby the selection of a good input representation for classification tasks is automated. This technique, which we call DT-Select ("Decision Tree feature Selection"), builds decision trees, via a fast parallel implementation of ID3 (Quinlan, 1986), which attempt to correctly classify the training data. The internal nodes of the trees are features drawn from very large pools of complex general-purpose and domain-specific constructed features. Thus, the features included in the trees constitute compact and informative sets which can then be used as input representations for other learning algorithms attacking the same problem. We have implemented DT-Select on a parallel message-passing MIMD architecture, the Thinking Machines CM-5, enabling us to select from pools containing several hundred thousand features in reasonable time. We present here some work using this approach to produce augmentations of artificial neural network input representations for the molecular biology problem of predicting protein secondary structures.
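The abstract describes the core idea of DT-Select: grow an ID3 decision tree whose internal-node tests are drawn from a large pool of candidate features, then take the features that appear in the tree as the selected input representation for another learner. The sketch below illustrates that idea sequentially in Python; the function names, the binary-feature encoding, and the depth cutoff are illustrative assumptions, not the chapter's actual (parallel, CM-5) implementation, in which the gain computation over the feature pool is what gets parallelized.

    # Hypothetical sketch of DT-Select-style feature selection: grow an
    # ID3-like tree over a pool of binary candidate features and return the
    # features used at internal nodes. Names here are illustrative only.
    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(examples, labels, feature):
        """Information gain (Quinlan, 1986) of splitting on a binary feature.
        `examples` is a list of dicts mapping feature name -> 0/1."""
        n = len(labels)
        gain = entropy(labels)
        for value in (0, 1):
            subset = [y for x, y in zip(examples, labels) if x[feature] == value]
            if subset:
                gain -= (len(subset) / n) * entropy(subset)
        return gain

    def dt_select(examples, labels, pool, selected=None, depth=0, max_depth=8):
        """Recursively build an ID3-style tree; collect the features it uses."""
        if selected is None:
            selected = set()
        # Stop on pure nodes, an exhausted pool, or the (assumed) depth cutoff.
        if depth >= max_depth or len(set(labels)) <= 1 or not pool:
            return selected
        best = max(pool, key=lambda f: info_gain(examples, labels, f))
        if info_gain(examples, labels, best) <= 0:
            return selected  # no candidate feature is informative here
        selected.add(best)
        for value in (0, 1):
            idx = [i for i, x in enumerate(examples) if x[best] == value]
            if idx:
                dt_select([examples[i] for i in idx], [labels[i] for i in idx],
                          [f for f in pool if f != best], selected, depth + 1, max_depth)
        return selected

    # Toy usage: the returned feature set would then serve as (part of) the
    # input representation for another learner, e.g. extra network inputs.
    examples = [{"f1": 1, "f2": 0, "f3": 1}, {"f1": 0, "f2": 1, "f3": 1},
                {"f1": 1, "f2": 1, "f3": 0}, {"f1": 0, "f2": 0, "f3": 0}]
    labels = ["helix", "coil", "helix", "coil"]
    print(dt_select(examples, labels, ["f1", "f2", "f3"]))  # -> {'f1'}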