Mark Goadrich (2007).
Learning Ensembles of First-Order Clauses that Optimize Precision-Recall Curves.
PhD thesis, Department of Computer Sciences, University of Wisconsin-Madison.
(Also appears as UW Technical Report CS-TR-07-1609)
Slides (PPT).
Data.
This publication is available in PDF.
The slides for this publication are available in Microsoft PowerPoint.
The data associated with this publication is available online.
Abstract:
Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data, such as biomedical information extraction, citation matching, and learning relationships in social networks. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy, and to examine their tradeoffs by plotting a precision-recall curve. The goal of this thesis is to find new approaches within ILP particularly suited for large, highly skewed domains. I propose and investigate Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs thresholding methods to combine sets of selected clauses. I compare Gleaner to ensembles of standard theories learned by Aleph, a standard ILP algorithm, using a number of large relational domains. I find that Gleaner produces comparable testset results in a fraction of the training time and outperforms Aleph ensembles when given the same amount of training time. I explore extensions to Gleaner with respect to searching and combining clauses, namely find- ing ways to fully explore the hypothesis space as well as to make better use of those found clauses. I also use Gleaner to estimate the probability that a query is true, further investigate the properties underlying precision-recall curves, and then conclude with a discussion of future work in this area.
Return to the publications of the Univ. of Wisconsin Machine Learning Research Group.
Computer Sciences Department
College of Letters and Science
University of Wisconsin - Madison
INFORMATION
~ PEOPLE
~ GRADS
~ UNDERGRADS
~ RESEARCH
~ RESOURCES
5355a Computer Sciences and Statistics ~ 1210 West Dayton Street, Madison,
WI 53706
cs@cs.wisc.edu ~ voice: 608-262-1204 ~
fax: 608-262-9777