Team: Clarence Cheung, Jin Ruan
In this project, we match restaurant records from two different web sources.
Data for thousands of restaurants in the ten most populous US cities was crawled from Yelp (~3000 pages) and Yellow Pages (~9000 pages).
Two CSV files (tables) are then generated for processing in later stages. From these files, a candidate set is obtained by blocking.
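
To illustrate what blocking does, here is a minimal sketch in Python. It is not the project's actual blocking rule: the column names (`id`, `name`, `city`) and the key (lowercased city plus the first letter of the name) are assumptions chosen for illustration. The idea is to compare only records that share a cheap key, rather than every Yelp record against every Yellow Pages record.

```python
from collections import defaultdict


def block_key(record):
    """Hypothetical blocking key: lowercased city plus first letter of the name."""
    city = record["city"].strip().lower()
    name = record["name"].strip().lower()
    return (city, name[:1])


def candidate_pairs(yelp_rows, yp_rows):
    """Return (yelp_id, yp_id) pairs whose records share a blocking key."""
    # Index one table by blocking key so each lookup is a dictionary access.
    index = defaultdict(list)
    for row in yp_rows:
        index[block_key(row)].append(row)

    pairs = []
    for row in yelp_rows:
        for other in index.get(block_key(row), []):
            pairs.append((row["id"], other["id"]))
    return pairs


# Toy rows standing in for the two CSV tables (made-up values).
yelp = [
    {"id": "y1", "name": "Joe's Pizza", "city": "New York"},
    {"id": "y2", "name": "Taco Hut", "city": "Houston"},
]
yellow_pages = [
    {"id": "p1", "name": "Joes Pizza", "city": "new york"},
    {"id": "p2", "name": "Jade Garden", "city": "New York"},
    {"id": "p3", "name": "Taco Hut", "city": "Chicago"},
]

print(candidate_pairs(yelp, yellow_pages))
```

In this toy input, only the two New York records whose names start with "j" are paired with `y1`; the Houston and Chicago records never meet. With real data, the rows could be loaded with `csv.DictReader`, and the key would need tuning so that true matches are not blocked out.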