University of Wisconsin Computer Sciences Header Map (repeated with 
textual links if page includes departmental footer) Useful Resources Research at UW-Madison CS Dept UW-Madison CS Undergraduate Program UW-Madison CS Graduate Program UW-Madison CS People Useful Information Current Seminars in the CS Department Search Our Site UW-Madison CS Computer Systems Laboratory UW-Madison Computer Sciences Department Home Page UW-Madison Home Page

M. Goadrich, L. Oliphant & J. Shavlik (2004).
Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction. Proceedings of the Fourteenth International Conference on Inductive Logic Programming, pp. 98-115, Porto, Portugal.
Slides (PPT). Data.



This publication is available in PDF.

The slides for this publication are available in Microsoft PowerPoint.

The data associated with this publication is available online.

Abstract:

Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. Our research has focused on Information Extraction (IE), a task that typically involves many more negative examples than positive examples. IE is the process of finding facts in unstructured text, such as biomedical journals, and putting those facts in an organized system. In particular, we have focused on learning to recognize instances of the protein-localization relationship in Medline abstracts. We view the problem as a machine-learning task: given positive and negative extractions from a training corpus of abstracts, learn a logical theory that performs well on a held-aside testing set. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. We propose Gleaner, a randomized search method which collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an ''at least N of these M clauses'' thresholding method to combine the selected clauses. We compare Gleaner to ensembles of standard Aleph theories and find that Gleaner produces comparable testset results in a fraction of the training time needed for ensembles.


return Return to the publications of the Univ. of Wisconsin Machine Learning Research Group.

Computer Sciences Department
College of Letters and Science
University of Wisconsin - Madison


INFORMATION ~ PEOPLE ~ GRADS ~ UNDERGRADS ~ RESEARCH ~ RESOURCES

5355a Computer Sciences and Statistics ~ 1210 West Dayton Street, Madison, WI 53706
cs@cs.wisc.edu ~ voice: 608-262-1204 ~ fax: 608-262-9777