Active Learning Literature Survey

Burr Settles

Computer Sciences Technical Report 1648
University of Wisconsin-Madison





Download the Current Survey
January 26, 2010 (PDF)

This is an online publication that will be updated periodically to reflect new advances in the field of active learning. Please cite the survey in your work as suggested in Section 1. Feedback is welcome, both to correct errors and to incorporate material that is new or has been overlooked in the current version. Please send comments to bsettles@cs.cmu.edu.

Abstract

The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose queries, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator). Active learning is well motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but labels are difficult, time-consuming, or expensive to obtain.

This report provides a general introduction to active learning and a survey of the literature. It discusses the scenarios in which queries can be formulated and gives an overview of the query strategy frameworks proposed in the literature to date. It also presents an analysis of the empirical and theoretical evidence for successful active learning, a summary of problem setting variants and practical issues, and a discussion of related topics in machine learning research.
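
As a rough illustration of the query loop described in the abstract, the following Python sketch shows a learner choosing which instances an oracle labels. It is a toy setting and not taken from the survey: the concept is a one-dimensional threshold, the labels are assumed noise-free, and the function name `query_threshold` and its parameters are hypothetical. The learner always queries the pool point that splits the remaining candidate boundaries most evenly, which for this problem reduces to binary search.

```python
from typing import Callable, List, Tuple


def query_threshold(
    pool: List[float],
    oracle: Callable[[float], int],
    budget: int,
) -> Tuple[int, int]:
    """Locate the boundary of a 1-D threshold concept by querying an oracle.

    Hypothetical helper for illustration only. Assumes the oracle returns 0
    for every point below some hidden threshold and 1 for every point at or
    above it (no label noise). Returns a pair: the index of the first
    positive point in the sorted pool (len(pool) if none is positive) and
    the number of labels requested. If the budget runs out early, the index
    is only the lower end of the remaining candidate range.
    """
    points = sorted(pool)
    lo, hi = 0, len(points)  # the first positive index lies in [lo, hi]
    queries = 0
    while lo < hi and queries < budget:
        # Pick the point that splits the remaining candidates most evenly.
        mid = (lo + hi) // 2
        label = oracle(points[mid])  # one query posed to the oracle
        queries += 1
        if label == 1:
            hi = mid  # boundary is at or before mid
        else:
            lo = mid + 1  # boundary is after mid
    return lo, queries


if __name__ == "__main__":
    # Toy oracle standing in for a human annotator; the threshold 37 is made up.
    pool = list(range(100))
    first_positive, used = query_threshold(pool, lambda x: int(x >= 37), budget=10)
    print(first_positive, used)
```

In this idealized case the number of labels requested grows roughly with the logarithm of the pool size, whereas labeling points in arbitrary order could require a label for nearly every point. Real problems involve noise and richer hypothesis classes, so the savings are rarely this clean.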

Archived Versions: