Project: Data Matching

Topic: Restaurants

Team: Clarence Cheung, Jin Ruan


In this project, we are trying to match restaurants from two different web sources.
Data for thousands of restaurants in the ten most populous US cities was crawled from Yelp (~3,000 pages) and Yellow Pages (~9,000 pages).
Two CSV files (tables), one per source, were then generated for processing in later stages. Using the CSV files, a candidate set of possibly matching restaurant pairs was obtained by blocking.
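
As a rough illustration of the blocking step, here is a minimal Python sketch. It is not the project's actual blocker: the column names (id, name, city), the blocking key (city plus the first word of the name), and the sample rows are all assumptions made up for this example. The idea is only to show how blocking avoids comparing every Yelp record with every Yellow Pages record by pairing records that share a cheap key.

```python
from collections import defaultdict


def block_key(record):
    """Hypothetical blocking key: lowercased city plus first word of the name."""
    name_tokens = record["name"].lower().split()
    first_word = name_tokens[0] if name_tokens else ""
    return (record["city"].strip().lower(), first_word)


def block(table_a, table_b):
    """Return (id_a, id_b) pairs whose records share the same blocking key."""
    # Index table B by key so each record in table A is compared only
    # against records in its own block, not against the whole table.
    index = defaultdict(list)
    for rec_b in table_b:
        index[block_key(rec_b)].append(rec_b)

    candidates = []
    for rec_a in table_a:
        for rec_b in index.get(block_key(rec_a), []):
            candidates.append((rec_a["id"], rec_b["id"]))
    return candidates


# Toy rows invented for illustration; in practice these would be read
# from the two CSV files (for example with csv.DictReader).
yelp = [
    {"id": "y1", "name": "Joe's Pizza", "city": "New York"},
    {"id": "y2", "name": "Taco Stand", "city": "Chicago"},
]
yellow_pages = [
    {"id": "p1", "name": "Joe's Pizza & Pasta", "city": "new york"},
    {"id": "p2", "name": "Taco Stand", "city": "Houston"},
    {"id": "p3", "name": "Blue Door Cafe", "city": "Chicago"},
]

candidate_set = block(yelp, yellow_pages)
print(candidate_set)
```

With these toy rows, only the pair (y1, p1) survives blocking, since the two "Taco Stand" records are in different cities. A key this strict can drop true matches (for example, names that differ in their first word), so a real blocker would usually use a looser rule and be checked for missed matches.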


An explanation of the blocking step can be found here.


Magellan User Report/Survey


Special thanks to Prof. AnHai Doan and Pradap Konda