Project: Data Matching

Topic: Restaurants

Team: Clarence Cheung, Jin Ruan



Description:

In this project, we match restaurants across two different web sources.
Data on thousands of restaurants in the ten most populous US cities was crawled from Yelp (~3000 pages) and Yellow Pages (~9000 pages).
Two CSV files (tables) are then generated for processing in later stages. Using the CSV files, a candidate set is obtained by blocking.



Blocking:

Blocking explanation can be found here.
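
As a rough illustration of what a blocking step does, here is a minimal sketch in plain Python. It is not the blocker used in this project. The column names (name, city), the blocking key (city plus first word of the name), and the sample rows are all assumptions made up for the example. The idea is to pair only records that share a cheap key, instead of comparing every Yelp row with every Yellow Pages row.

```python
import csv
import io
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not split a block."""
    return " ".join(value.lower().split())


def block_key(row):
    """Hypothetical blocking key: city plus the first word of the restaurant name."""
    name_tokens = normalize(row["name"]).split()
    first = name_tokens[0] if name_tokens else ""
    return (normalize(row["city"]), first)


def block(rows_a, rows_b):
    """Return candidate pairs (index_a, index_b) whose blocking keys agree."""
    # Index table B by key so each row of A only meets rows in its own block.
    index = defaultdict(list)
    for j, row in enumerate(rows_b):
        index[block_key(row)].append(j)
    candidates = []
    for i, row in enumerate(rows_a):
        for j in index.get(block_key(row), []):
            candidates.append((i, j))
    return candidates


# Tiny made-up tables standing in for the two CSV files (assumed columns).
yelp_csv = "name,city\nJoe's Pizza,New York\nBlue Door Cafe,Chicago\n"
yp_csv = (
    "name,city\n"
    "Joe's Pizza & Pasta,New York\n"
    "Blue Door Farm Stand,Chicago\n"
    "Joe's Diner,Chicago\n"
)

yelp_rows = list(csv.DictReader(io.StringIO(yelp_csv)))
yp_rows = list(csv.DictReader(io.StringIO(yp_csv)))

candidate_pairs = block(yelp_rows, yp_rows)
print(candidate_pairs)
```

With these sample rows, only 2 of the 6 possible pairs survive as candidates. A key this strict can also drop true matches (for example, "The Blue Door" vs "Blue Door Cafe"), so a real blocker would usually be looser or combine several keys.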


Bonus:

Magellan User Report/Survey


Acknowledgement:

Special thanks to Prof. AnHai Doan and Pradap Konda.