CS784 Project: DATA RETRIEVAL, EXTRACTION, AND MATCHING

TEAM: Shruthi Venkatesan & Mahnaz Akbari

Description

We crawled books from Amazon and BarnesandNoble in the following categories: 1) Biography, 2) Romance, 3) Travel.

Amazon

From Amazon, we extracted the following attributes for 3500 books:

  • ISBN-10
  • ISBN-13
  • Title
  • Author
  • Publisher
  • Language
  • Product Dimensions
  • Paperback Price
  • Hardcover Price
  • Date
  • Edition
  • Pages

  • tableA_HTML
  • tableA_JSON
  • tableA_CSV
  • tableA_Script

BarnesandNoble

From BarnesandNoble, we extracted the following attributes for 3600 books:

  • ISBN-13
  • Title
  • Author
  • Publisher
  • Publication Date
  • Edition
  • Pages
  • Audiobook Price
  • Paperback Price
  • Hardcover Price
  • Related Categories
  • Series
  • Product Dimensions

  • tableB_HTML
  • tableB_JSON
  • tableB_CSV
  • tableB_Script

Blocking

  • Blocking Description
  • Blocking Script
  • Blocking Output
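
As a rough illustration of what a blocking step can look like for these two tables, the sketch below pairs Amazon and BarnesandNoble records that share the same ISBN-13, the one identifier both attribute lists have in common. It is not the project's Blocking Script. The function names, the dictionary-per-row layout, the "ISBN-13" key, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def normalize_isbn(value):
    """Keep only digits and the letter X, dropping hyphens, spaces, and other characters."""
    return "".join(ch for ch in (value or "").upper() if ch.isdigit() or ch == "X")


def block_on_isbn13(table_a, table_b, key="ISBN-13"):
    """Return (index_in_a, index_in_b) candidate pairs whose normalized ISBN-13 values are equal.

    Assumes each table is a list of dicts keyed by attribute name (hypothetical layout).
    """
    # Index table B by normalized ISBN so each table is scanned only once.
    index = defaultdict(list)
    for j, row in enumerate(table_b):
        isbn = normalize_isbn(row.get(key))
        if isbn:  # rows with a missing ISBN are never indexed
            index[isbn].append(j)

    pairs = []
    for i, row in enumerate(table_a):
        isbn = normalize_isbn(row.get(key))
        for j in index.get(isbn, []):
            pairs.append((i, j))
    return pairs


# Hypothetical sample rows, not taken from the crawled data.
table_a = [{"ISBN-13": "978-0-00-000000-2"}, {"ISBN-13": ""}]
table_b = [{"ISBN-13": "9781111111111"}, {"ISBN-13": "9780000000002"}]
candidates = block_on_isbn13(table_a, table_b)
```

Exact-key blocking like this misses records with a missing or mistyped ISBN, so in practice it is usually combined with a looser rule on another attribute such as Title.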

Evaluation using golden data

  • Golden data
  • Matching Rules
  • Action Log
  • Matching experience (using EMS)
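
Evaluating a matcher against golden data usually means computing precision and recall over the predicted match pairs. The sketch below shows that calculation under the assumption that both the predictions and the golden data are available as sets of (tableA id, tableB id) pairs. The function name and the sample ids are hypothetical and do not come from the project's files.

```python
def precision_recall(predicted, golden):
    """Compute precision and recall of predicted match pairs against golden match pairs."""
    predicted, golden = set(predicted), set(golden)
    true_positives = len(predicted & golden)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(golden) if golden else 0.0
    return precision, recall


# Hypothetical ids for illustration only.
predicted_pairs = {(1, 1), (2, 2), (3, 4)}
golden_pairs = {(1, 1), (2, 2), (5, 5), (6, 6)}
precision, recall = precision_recall(predicted_pairs, golden_pairs)
```

With these sample pairs, two of the three predictions are correct and two of the four golden matches are found.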