Fan Ding (fding5@wisc.edu)
Qing Li (qing.li@wisc.edu)
In this project, we chose electronic products as our study domain. We select two Web sources (Amazon and BestBuy), crawl them to retrieve HTML data,
and perform information extraction to convert the HTML data into two relational tables. Next, we use Magellan, a data matching
system developed at Wisconsin, to do the blocking and matching for the two tables.
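To make the blocking and matching steps concrete, here is a minimal sketch in plain Python. It does not use Magellan's API. The token-overlap blocking rule, the Jaccard threshold, the function names, and the sample rows are all assumptions made for illustration, not the project's actual configuration or data.

```python
import re


def tokens(name):
    """Lowercase a product name and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def block(table_a, table_b, min_shared=2):
    """Blocking: keep only pairs whose names share at least `min_shared` tokens.

    This cheaply discards pairs that are very unlikely to match, so the more
    careful matching step only looks at a small candidate set.
    """
    pairs = []
    for a in table_a:
        for b in table_b:
            if len(tokens(a["name"]) & tokens(b["name"])) >= min_shared:
                pairs.append((a, b))
    return pairs


def jaccard(x, y):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def match(candidates, threshold=0.5):
    """Matching: accept candidate pairs whose name similarity reaches `threshold`."""
    return [
        (a["id"], b["id"])
        for a, b in candidates
        if jaccard(tokens(a["name"]), tokens(b["name"])) >= threshold
    ]


# Hypothetical rows; real rows would come from the two extracted tables.
amazon = [
    {"id": "a1", "name": "Acme 15.6 Inch Laptop 8GB RAM"},
    {"id": "a2", "name": "Zenith Wireless Mouse Black"},
]
bestbuy = [
    {"id": "b1", "name": "Acme Laptop 15.6 inch 8GB RAM Silver"},
    {"id": "b2", "name": "Orion Bluetooth Speaker"},
]

candidates = block(amazon, bestbuy)
matches = match(candidates)
print(matches)
```

In a real pipeline the matching step would typically be a learned or rule-based matcher over several attributes rather than a single name-similarity threshold.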
Data Sources: Amazon.com, BestBuy.com
Crawler: Scrapy
Amazon Table CSV File
Amazon Table Related HTMLs
Attributes: id, name, amazon price, original price, features, url
BestBuy Table CSV File
BestBuy Table Related HTMLs
Attributes: id, name, price, description, features, url
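The two attribute lists above differ, so it can help to check a file's header against the expected schema and to see which attributes the tables share. The sketch below uses only Python's standard `csv` module. It assumes the CSV header names are exactly the attribute names listed above, and the sample row is made up for illustration.

```python
import csv
import io

# Attribute lists as given for the two tables; exact CSV header spelling is an assumption.
AMAZON_COLUMNS = ["id", "name", "amazon price", "original price", "features", "url"]
BESTBUY_COLUMNS = ["id", "name", "price", "description", "features", "url"]


def shared_columns(left, right):
    """Return the attributes present in both schemas, in the order of `left`."""
    return [column for column in left if column in right]


def read_table(text, expected_columns):
    """Parse CSV text into a list of dicts, rejecting an unexpected header."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != expected_columns:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


# Hypothetical one-row BestBuy table, used only to exercise the functions.
sample = (
    "id,name,price,description,features,url\n"
    "1,Example TV,199.99,A sample row,HDMI,http://example.com/tv\n"
)

rows = read_table(sample, BESTBUY_COLUMNS)
common = shared_columns(AMAZON_COLUMNS, BESTBUY_COLUMNS)
print(common)
```

The shared attributes are the natural starting point for blocking and matching, since they can be compared directly across the two tables.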
Last Updated: December 5th, 2015