If the titles match exactly, the pair is considered a match.
rule1: title_exact == 1.0

If the trigram similarity on author and the Jaccard similarity on title are both high, the pair can be considered a match irrespective of the other fields.
rule2: author_trigram >= 0.7 AND jaccard_title >= 0.75

If the trigram similarity on author is high and both the number of pages and the date match exactly, the pair is a match. This holds because an author is very unlikely to publish two books with the same number of pages on the same date.
rule2.3: author_trigram >= 0.75 AND date == 1.0 AND pages == 1.0

If the date matches exactly and both the author and the title are similar under the trigram measure, the pair is considered a match.
rule3: date == 1.0 AND author_trigram >= 0.7 AND trigram_title >= 0.5

The Jaccard measure tends to report a mismatch for books with short titles that differ by a single word, so we also use trigram similarity on the title together with an exact date match.
rule5: trigram_title >= 0.8 AND date == 1.0

RESULTS:
Precision: 0.96
Recall: 0.93
F-Score: 0.93
Time spent: 13 hours
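
The rules listed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the matcher actually used: it assumes the rules are combined with OR (a pair is a match if any rule fires) and that each feature has already been computed as a score in [0, 1] and stored in a dictionary. The rule names and thresholds come from the text; the helper names fired_rules, is_match and jaccard_words, and the example titles, are hypothetical. The small word-level Jaccard helper shows why short titles that differ by one word fall below the 0.75 threshold of rule2.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]

# Rule names and thresholds are taken from the text.
# Storing them as lambdas over a feature dictionary is an assumption.
RULES: Dict[str, Callable[[Features], bool]] = {
    "rule1": lambda f: f["title_exact"] == 1.0,
    "rule2": lambda f: f["author_trigram"] >= 0.7 and f["jaccard_title"] >= 0.75,
    "rule2.3": lambda f: (f["author_trigram"] >= 0.75
                          and f["date"] == 1.0 and f["pages"] == 1.0),
    "rule3": lambda f: (f["date"] == 1.0 and f["author_trigram"] >= 0.7
                        and f["trigram_title"] >= 0.5),
    "rule5": lambda f: f["trigram_title"] >= 0.8 and f["date"] == 1.0,
}


def fired_rules(features: Features) -> List[str]:
    """Return the names of all rules satisfied by one candidate pair."""
    return [name for name, rule in RULES.items() if rule(features)]


def is_match(features: Features) -> bool:
    """Assumed combination: a pair is a match if at least one rule fires."""
    return bool(fired_rules(features))


def jaccard_words(a: str, b: str) -> float:
    """Jaccard similarity over lowercase, whitespace-separated word sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical short titles differing by one word: 2 shared words out of 3.
short_title_score = jaccard_words("Data Mining", "Data Mining Concepts")

# Hypothetical feature scores for one candidate pair.
example: Features = {
    "title_exact": 0.0,
    "author_trigram": 0.9,
    "jaccard_title": short_title_score,  # about 0.67, below the rule2 threshold
    "trigram_title": 0.85,
    "date": 1.0,
    "pages": 0.0,
}

matched_by = fired_rules(example)  # ['rule3', 'rule5']
```

In this made-up example rule2 does not fire because the Jaccard score on the title is about 0.67, but the pair is still caught by rule3 and rule5, which rely on the trigram title score and the exact date match.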