Topic: Restaurants
Team: Clarence Cheung, Jin Ruan
In this project, we match restaurant records from two different web sources.
Data on thousands of restaurants in the ten most populous US cities was crawled from Yelp (~3,000 pages) and Yellow Pages (~9,000 pages).
Two CSV files (tables), one per source, are then generated for processing in later stages.
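From the two CSV files, a candidate set of possibly matching restaurant pairs is obtained by blocking, which discards pairs that clearly cannot match before any detailed comparison.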
Original | Revised |
---|---|
Yelp Table | Revised Yelp Table |
YP Table | Revised YP Table |
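
As a rough illustration of the blocking step, the sketch below pairs up only those rows from the two tables that share a blocking key. It is not the project's actual blocker: the column names (`id`, `name`, `zip`), the sample rows, and the helper `block_by_key` are all hypothetical, and exact equality on a zip code is just one simple choice of blocking rule.

```python
from collections import defaultdict


def block_by_key(left_rows, right_rows, key):
    """Return (left, right) pairs whose `key` values are equal after normalization."""

    def norm(value):
        # Treat missing values as empty and ignore case and surrounding spaces.
        return (value or "").strip().lower()

    # Index the right table by blocking key so each left row is only
    # compared with right rows in the same block.
    index = defaultdict(list)
    for row in right_rows:
        k = norm(row.get(key))
        if k:  # rows without a key never enter a block
            index[k].append(row)

    candidates = []
    for left in left_rows:
        k = norm(left.get(key))
        for right in index.get(k, []):
            candidates.append((left, right))
    return candidates


# Hypothetical sample rows; the real table schemas are not given here.
yelp_rows = [
    {"id": "y1", "name": "Joe's Pizza", "zip": "10014"},
    {"id": "y2", "name": "Taco Stand", "zip": "60601"},
]
yp_rows = [
    {"id": "p1", "name": "Joes Pizza", "zip": "10014"},
    {"id": "p2", "name": "Noodle Bar", "zip": "10014"},
    {"id": "p3", "name": "Sushi Place", "zip": "94103"},
]

pairs = block_by_key(yelp_rows, yp_rows, "zip")
candidate_ids = [(left["id"], right["id"]) for left, right in pairs]
print(candidate_ids)  # [('y1', 'p1'), ('y1', 'p2')]
```

With these sample rows, only two of the six possible pairs survive. In practice the rows would be read from the two CSV files (for example with `csv.DictReader`), and a looser rule such as word overlap on the restaurant name may be needed so that true matches with inconsistent zip codes are not dropped.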