Project Stage 2:
Crawling & Extracting Structured Data From Web Pages

Extraction Entity Topic: Book

Tasks


  1. Extract data from two web data sources using a rule-based wrapper. The two sources contain information about an overlapping set of entities, such as books, movies, or cars.
  2. Each of the two sources contains a reasonable amount of data, and the two sources share a reasonable number of overlapping entities.
  3. Extract data from these two sources to form two tables, A and B (one from each source). The two tables should have the same schema, and each tuple in each table must describe a single entity, with all entities of the same type. For example, if the entity type is person, then each tuple describes a person, and one possible schema is A(name, city, state, zip, phone), with the same schema for Table B.
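
The tasks above can be sketched roughly as follows for the book entity type. This is a minimal illustration, not the project's actual wrapper: the schema (title, author, isbn), the HTML markup of both sources, the regular-expression rules, and the sample pages are all assumptions made up for the example. It shows one way a rule-based wrapper might apply a per-source rule set so that both sources yield tuples in the same schema.

```python
import re

# Hypothetical shared schema for Tables A and B (not given in the task text).
SCHEMA = ("title", "author", "isbn")

# Hypothetical rule sets: one regular expression per attribute, per source.
# Each source has its own markup, so each source gets its own rules.
RULES_A = {
    "title": re.compile(r'<h1 class="title">(.*?)</h1>', re.S),
    "author": re.compile(r'<span class="author">(.*?)</span>', re.S),
    "isbn": re.compile(r'ISBN:\s*([\d-]+)'),
}
RULES_B = {
    "title": re.compile(r'<div id="book-name">(.*?)</div>', re.S),
    "author": re.compile(r'<div id="writer">(.*?)</div>', re.S),
    "isbn": re.compile(r'ISBN-13</td>\s*<td>([\d-]+)</td>'),
}


def extract_tuple(html, rules):
    """Apply one source's rules to a page and return a tuple in SCHEMA order.

    Attributes whose rule does not match are left as empty strings.
    """
    row = []
    for attr in SCHEMA:
        match = rules[attr].search(html)
        row.append(match.group(1).strip() if match else "")
    return tuple(row)


# Made-up sample pages standing in for one book page from each source.
page_a = """
<h1 class="title"> Example Book </h1>
<span class="author">Jane Doe</span>
<p>ISBN: 000-0000000000</p>
"""
page_b = """
<div id="book-name">Example Book</div>
<div id="writer">Jane Doe</div>
<table><tr><td>ISBN-13</td> <td>000-0000000000</td></tr></table>
"""

# One tuple per entity; both tables share SCHEMA.
table_a = [extract_tuple(page_a, RULES_A)]
table_b = [extract_tuple(page_b, RULES_B)]
```

Keeping the rules in per-source dictionaries keyed by the shared attribute names is one simple way to guarantee that Tables A and B end up with identical columns, even though the two sites are marked up differently.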

Members


Sean Chung
cchung49@wisc.edu
Shang-Yen Yeh
syeh6@wisc.edu
Junxia Zhu
jzhu334@wisc.edu