Building Intelligent Agents that Learn to Retrieve and Extract Information


Tina Eliassi-Rad (2001).
Building Intelligent Agents that Learn to Retrieve and Extract Information. PhD thesis, Department of Computer Sciences, University of Wisconsin-Madison.
(Also appears as UW Technical Report CS-TR-01-1431)

This publication is available in PDF and PostScript.

Abstract:

The rapid growth of on-line information has created a surge of interest in tools that are able to retrieve and extract information from on-line documents. In this thesis, I present and evaluate a computer system that rapidly and easily builds instructable and self-adaptive software agents for both the information retrieval (IR) and the information extraction (IE) tasks. My system is called WAWA (short for Wisconsin Adaptive Web Assistant). WAWA interacts with the user and an on-line (textual) environment (e.g., the Web) to build an intelligent agent for retrieving and extracting information. WAWA has two sub-systems: (i) an information retrieval (IR) sub-system, called WAWA-IR; and (ii) an information extraction (IE) sub-system, called WAWA-IE. WAWA-IR is a general search-engine agent, which can be trained to produce specialized and personalized IR agents. WAWA-IE is a general extractor system, which creates specialized agents that accurately extract pieces of information from documents in the domain of interest. WAWA utilizes a theory-refinement approach to build its intelligent agents. There are four primary advantages of using such an approach. First, WAWA's agents are able to perform reasonably well initially because they are able to utilize users' prior knowledge. Second, users' prior knowledge does not have to be correct since it is refined through learning. Third, the use of prior knowledge, plus the continual dialog between the user and an agent, decreases the need for a large number of training examples because training is not limited to a binary representation of positive and negative examples. Finally, WAWA provides an appealing middle ground between non-adaptive agent programming languages and systems that solely learn user preferences from training examples. WAWA's agents have performed quite well in empirical studies.
WAWA-IR experiments demonstrate the efficacy of incorporating the feedback provided by the Web into the agent's neural networks to improve the evaluation of potential hyperlinks to traverse. WAWA-IE experiments produce results that are competitive with other state-of-the-art systems. Moreover, they demonstrate that WAWA-IE agents are able to intelligently and efficiently select from the space of possible extractions and solve multi-slot extraction problems.
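
The theory-refinement idea summarized in the abstract (start from imperfect prior knowledge, then correct it through learning) can be illustrated with a toy sketch. This is not WAWA's code: the single sigmoid unit, the function names (rule_to_weights, refine, accuracy), the three keyword features, and the training examples are all assumptions made up for illustration. A user's rule is written into the initial weights of the unit, which is then refined by gradient descent on a handful of labelled pages.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rule_to_weights(n_features, antecedents, strength=4.0):
    """Encode 'IF all antecedent features are present THEN relevant'
    as the initial weights and bias of one sigmoid unit (hypothetical helper)."""
    weights = [0.0] * n_features
    for i in antecedents:
        weights[i] = strength
    # The unit's input is positive only when every antecedent is on.
    bias = -strength * (len(antecedents) - 0.5)
    return weights, bias

def predict(weights, bias, x):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def refine(weights, bias, examples, lr=0.2, epochs=2000):
    """Refine the rule-derived unit with per-example gradient descent
    on the cross-entropy loss."""
    weights = list(weights)
    for _ in range(epochs):
        for x, y in examples:
            err = predict(weights, bias, x) - y  # dLoss/dz for a sigmoid unit
            for i, xi in enumerate(x):
                weights[i] -= lr * err * xi
            bias -= lr * err
    return weights, bias

def accuracy(weights, bias, examples):
    hits = sum((predict(weights, bias, x) > 0.5) == bool(y) for x, y in examples)
    return hits / len(examples)

# Hypothetical features: page mentions "learning", "agent", "recipe".
# Prior advice (deliberately imperfect): relevant if "learning" AND "agent".
prior_w, prior_b = rule_to_weights(3, antecedents=[0, 1])

# Made-up feedback: "learning" alone is enough, unless "recipe" appears.
examples = [
    ([1, 1, 0], 1), ([1, 0, 0], 1), ([0, 1, 0], 0),
    ([1, 0, 1], 0), ([0, 0, 0], 0), ([1, 1, 1], 0),
]

refined_w, refined_b = refine(prior_w, prior_b, examples)
print("accuracy with prior rule only:", accuracy(prior_w, prior_b, examples))
print("accuracy after refinement:   ", accuracy(refined_w, refined_b, examples))
```

The prior rule already classifies some of the examples correctly before any training, which mirrors the first advantage listed in the abstract; the refinement step then corrects the cases where the advice was wrong, which mirrors the second.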

