K. Cherkauer & J. Shavlik (1993).
Protein Structure Prediction: Selecting Salient Features From Large Candidate Pools. Proceedings of the First International Conference on Intelligent Systems for Molecular Biology, pp. 74-82, Bethesda, MD. AAAI Press.
We introduce a parallel approach, "DT-Select", for selecting features used by inductive learning algorithms to predict protein secondary structure. DT-Select can rapidly choose small, nonredundant feature sets from pools containing hundreds of thousands of potentially useful features. It does this by building, from features in the pool, a decision tree that classifies a set of training examples. The features included in the tree provide a compact description of the training data and are thus suitable as inputs to other inductive learning algorithms. In empirical experiments on the protein secondary-structure task, sets of complex features chosen by DT-Select are used to augment a standard artificial neural network representation; they yield surprisingly little performance gain, even though the features are selected from very large feature pools. We discuss some possible reasons for this result.
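The core idea (grow a decision tree over a large feature pool, then keep only the features that appear at its internal nodes) can be illustrated with a minimal single-process sketch. It assumes binary features and an ID3-style information-gain split criterion, and the names `dt_select` and `best_feature` are hypothetical. It is not the authors' parallel implementation, whose split criterion and stopping rules may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_feature(examples, labels, pool):
    """Return the feature index in `pool` with the highest information gain, or None."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in pool:
        on = [y for x, y in zip(examples, labels) if x[f]]
        off = [y for x, y in zip(examples, labels) if not x[f]]
        if not on or not off:
            continue  # feature does not split the examples at this node
        remainder = (len(on) * entropy(on) + len(off) * entropy(off)) / len(labels)
        gain = base - remainder
        if gain > best_gain:  # strict '>' breaks ties in favor of earlier pool entries
            best, best_gain = f, gain
    return best


def dt_select(examples, labels, pool, selected=None):
    """Grow a tree greedily; return the set of features used at its internal nodes."""
    if selected is None:
        selected = set()
    if len(set(labels)) <= 1:
        return selected  # pure (or empty) node becomes a leaf
    f = best_feature(examples, labels, pool)
    if f is None:
        return selected  # no feature improves purity here
    selected.add(f)
    for value in (True, False):
        idx = [i for i, x in enumerate(examples) if bool(x[f]) == value]
        dt_select([examples[i] for i in idx], [labels[i] for i in idx], pool, selected)
    return selected


if __name__ == "__main__":
    # Hypothetical toy pool: feature 3 duplicates feature 1; features 0 and 2 are noise.
    X = [(0, 0, 0, 0), (1, 0, 1, 0), (0, 1, 1, 1), (1, 1, 0, 1)]
    y = [0, 0, 1, 1]
    print(dt_select(X, y, pool=range(4)))  # → {1}
```

In this toy run the duplicate feature 3 is never selected: once feature 1 has split the examples, every child node is pure, so the copy adds no information. This is one way a tree-based selector tends to yield small, nonredundant feature sets.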
Computer Sciences Department
College of Letters and Science
University of Wisconsin - Madison
5355a Computer Sciences and Statistics ~ 1210 West Dayton Street, Madison, WI 53706
firstname.lastname@example.org ~ voice: 608-262-1204 ~ fax: 608-262-9777