
K. Cherkauer & J. Shavlik (1994).
Selecting Salient Features for Machine Learning from Large Candidate Pools through Parallel Decision-Tree Construction. In H. Kitano, editor, Massively Parallel Artificial Intelligence, pp. 102-136. AAAI Press/The MIT Press, Menlo Park, CA.



This publication is available in PDF and PostScript.

Abstract:

The particular representation used to describe training and testing examples can have profound effects on an inductive algorithm's ability to learn. However, the space of possible representations is virtually infinite, so choosing a good representation is not a simple task. This chapter describes a method whereby the selection of a good input representation for classification tasks is automated. This technique, which we call DT-Select ("Decision Tree feature Selection"), builds decision trees, via a fast parallel implementation of ID3 (Quinlan, 1986), which attempt to correctly classify the training data. The internal nodes of the trees are features drawn from very large pools of complex general-purpose and domain-specific constructed features. Thus, the features included in the trees constitute compact and informative sets which can then be used as input representations for other learning algorithms attacking the same problem. We have implemented DT-Select on a parallel message-passing MIMD architecture, the Thinking Machines CM-5, enabling us to select from pools containing several hundred thousand features in reasonable time. We present here some work using this approach to produce augmentations of artificial neural network input representations for the molecular biology problem of predicting protein secondary structures.
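The core idea in the abstract, growing an ID3-style tree over a candidate pool and keeping the features that end up at internal nodes, can be illustrated with a small sketch. This is not the authors' implementation: it is serial rather than parallel, it assumes boolean features, and the names `entropy`, `information_gain`, and `select_features`, along with the toy data, are hypothetical choices made only for illustration.

```python
import math
from collections import Counter
from itertools import product

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, labels, feature):
    """Entropy reduction obtained by splitting on one boolean feature."""
    remainder = 0.0
    for value in (False, True):
        subset = [y for x, y in zip(examples, labels) if x[feature] == value]
        if subset:
            remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def select_features(examples, labels, candidates, selected=None):
    """Grow an ID3-style tree and return the features used at internal nodes."""
    if selected is None:
        selected = set()
    # Stop when the node is pure or the candidate pool is exhausted.
    if len(set(labels)) <= 1 or not candidates:
        return selected
    gains = {f: information_gain(examples, labels, f) for f in candidates}
    best = max(candidates, key=lambda f: gains[f])
    # Stop when no candidate separates the classes (tolerance is an assumption).
    if gains[best] <= 1e-12:
        return selected
    selected.add(best)
    remaining = [f for f in candidates if f != best]
    for value in (False, True):
        branch = [(x, y) for x, y in zip(examples, labels) if x[best] == value]
        if branch:
            xs, ys = zip(*branch)
            select_features(list(xs), list(ys), remaining, selected)
    return selected

# Toy pool of four boolean features; the class depends only on features 0 and 2.
examples = list(product([False, True], repeat=4))
labels = [x[0] and x[2] for x in examples]
print(sorted(select_features(examples, labels, [0, 1, 2, 3])))  # [0, 2]
```

In this toy case the two irrelevant features never appear in the tree, so the selected set is smaller than the pool. In a parallel setting, the per-feature gain computation at each node is the natural unit of work to distribute, although how the chapter partitions that work is not stated in the abstract.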


Return to the publications of the Univ. of Wisconsin Machine Learning Research Group.

Computer Sciences Department
College of Letters and Science
University of Wisconsin - Madison

