We built an inverted index on the Publisher attribute of tableB.json (from Barnes and Noble). We chose this table because Barnes and Noble pages have a more uniform format: every reference to a given publisher uses exactly the same string. We still could not use exact matching between the publisher fields of the two sources, because Amazon pages refer to the same publisher with different (but similar) strings. For example, the publisher that appears simply as "HarperCollins Publishers" on all Barnes and Noble pages can appear as "Harper Perennial", "Harper One", "Harper Design", etc. on Amazon pages. On further analysis of our data, we found that the first 5 characters of a publisher name are enough to distinguish among different publishers, while also treating similar strings that point to the same publisher as equal. We therefore used the first 5 characters of the publisher fields from the two sources to match our tuples. The inverted index on tableB has the form publisher = {isbn_1, isbn_2, ...}, i.e., all books from the same publisher are added to dictionary[publisher]. Finally, we declared a pair of tuples a candidate match and added it to the candidate set if the first 5 characters of the publisher name in the tableA tuple matched the first 5 characters of the publisher name in the tableB tuple. This gave us 148239 records.
We applied one more rule to reduce the size of the candidate set. We observed that, for a few tuples, books with the same ISBN have different publication dates (differing in the day or month but not in the year). Hence, we used an exact match on publication year as an additional criterion for adding two books to the candidate set. When either publication date was null or missing, this rule was not applied. After applying this rule, the candidate set shrank to 38968 records.
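
The two blocking rules described above can be sketched roughly as follows. This is a minimal illustration, not our actual script: the record layout (dictionaries with "id", "publisher", and "year" keys) and all function names are assumptions made for the example, and the index stores whole tableB records rather than only ISBNs so that the year check can be done in the same loop.

```python
from collections import defaultdict

PREFIX_LEN = 5  # first 5 characters of the publisher name, as in the text


def publisher_key(publisher):
    """Return the blocking key for a publisher string, or None if it is missing."""
    if not publisher:
        return None
    return publisher[:PREFIX_LEN]


def build_index(table_b):
    """Inverted index: publisher key -> list of tableB records (hypothetical layout)."""
    index = defaultdict(list)
    for rec in table_b:
        key = publisher_key(rec.get("publisher"))
        if key is not None:
            index[key].append(rec)
    return index


def years_compatible(year_a, year_b):
    """Exact match on year; the rule is skipped when either year is missing."""
    if year_a is None or year_b is None:
        return True
    return year_a == year_b


def candidate_pairs(table_a, table_b):
    """Pair each tableA record with the tableB records that pass both rules."""
    index = build_index(table_b)
    pairs = []
    for a in table_a:
        key = publisher_key(a.get("publisher"))
        # Records without a publisher get key None, which is never in the index.
        for b in index.get(key, []):
            if years_compatible(a.get("year"), b.get("year")):
                pairs.append((a["id"], b["id"]))
    return pairs
```

Probing the index once per tableA record avoids comparing every tableA tuple against every tableB tuple; only records that share a publisher prefix are ever paired.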
Assumption: For some records in tableA (from Amazon), the publication date contained only the year. In such cases we assigned a default publication date of 1/1/year, i.e., month = 1, day = 1. To ensure that this approximation does not cause us to miss true matches, we considered only the publication year during the blocking stage.
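
As a rough sketch of this assumption, the helper below fills in January 1 for year-only dates and exposes only the year for blocking. The function names and the list of full-date formats are hypothetical; the actual date formats in the two tables are not specified here.

```python
from datetime import date, datetime

# Hypothetical full-date formats; the real tables may use different ones.
ASSUMED_FORMATS = ("%m/%d/%Y", "%B %d, %Y")


def parse_publication_date(raw):
    """Parse a publication date string; year-only values default to 1/1/year."""
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        return None
    if len(text) == 4 and text.isdigit():
        try:
            return date(int(text), 1, 1)  # month = 1, day = 1 by default
        except ValueError:
            return None  # e.g. year 0000 is not a valid date
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None  # unparseable dates are treated as missing


def blocking_year(raw):
    """Return only the year, which is all the blocking stage compares."""
    parsed = parse_publication_date(raw)
    return parsed.year if parsed is not None else None
```

Because only the year is compared, a year-only Amazon date such as "2005" and a full date in the same year still agree, even though the defaulted month and day are not the true ones.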