CS784 Project: DATA RETRIEVAL, EXTRACTION, AND MATCHING
TEAM: Shruthi Venkatesan & Mahnaz Akbari
Description
We crawled book listings from Amazon and Barnes & Noble in the following categories:
1) Biography
2) Romance
3) Travel
Amazon
From Amazon, we extracted the following attributes for 3500 books:
ISBN-10
ISBN-13
Title
Author
Publisher
Language
Product Dimensions
Paperback Price
Hardcover Price
Date
Edition
Pages
tableA_HTML
tableA_JSON
tableA_CSV
tableA_Script
Barnes & Noble
From Barnes & Noble, we extracted the following attributes for 3600 books:
ISBN-13
Title
Author
Publisher
Publication Date
Edition
Pages
Audiobook Price
Paperback Price
Hardcover Price
Related Categories
Series
Product Dimensions
tableB_HTML
tableB_JSON
tableB_CSV
tableB_Script
Blocking
Blocking Description
Blocking Script
Blocking Output
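The blocking scheme actually used in this project is documented in the linked description and script. As a rough illustration only of what a blocking step over the two book tables might look like, the sketch below keeps a pair as a candidate when the two titles share at least one token longer than three characters. The function names, the dictionary layout of the tables, and the token-length threshold are all assumptions made for this example, not the project's method.

```python
from collections import defaultdict


def title_tokens(title):
    """Lowercase a title and return its alphanumeric tokens longer than 3 characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return {tok for tok in cleaned.split() if len(tok) > 3}


def block_by_title_token(table_a, table_b):
    """Return candidate (a_id, b_id) pairs whose titles share at least one token.

    table_a and table_b are hypothetical dicts mapping a record id to a
    dict of attributes that includes a "Title" key.
    """
    # Inverted index: token -> ids of table B records whose title contains it.
    index = defaultdict(set)
    for b_id, record in table_b.items():
        for tok in title_tokens(record["Title"]):
            index[tok].add(b_id)

    # Probe the index with every table A title to collect candidate pairs.
    candidates = set()
    for a_id, record in table_a.items():
        for tok in title_tokens(record["Title"]):
            for b_id in index.get(tok, ()):
                candidates.add((a_id, b_id))
    return candidates
```

The inverted index avoids comparing every Amazon record against every Barnes & Noble record, which is the main purpose of blocking on tables of a few thousand rows each.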
Evaluation using golden data
Golden data
Matching Rules
Action Log
Matching experience (using EMS)
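For the evaluation against golden data, one common way to score a set of predicted matches is precision, recall, and F1. The sketch below assumes both the predicted matches and the golden matches are available as collections of (tableA id, tableB id) pairs; the function name and the pair representation are assumptions for illustration and are not taken from the project scripts.

```python
def evaluate_matches(predicted, golden):
    """Return (precision, recall, f1) of predicted match pairs against golden pairs."""
    predicted, golden = set(predicted), set(golden)
    true_positives = len(predicted & golden)

    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(golden) if golden else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Precision here is the fraction of predicted pairs that appear in the golden data, and recall is the fraction of golden pairs that were predicted.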