Publications
Recent publications (with technical reports and slides where available) are listed below. For complete publication lists, see DBLP and Google Scholar. Individual project pages also include related publications.
- Columbo: Expanding Abbreviated Column Names for Tabular Data Using Large Language Models, T. Cai, S. Sheen, A. Doan. EMNLP-25.
- Sparkly: A Simple yet Surprisingly Strong TF/IDF Blocker for Entity Matching, D. Paulsen, Y. Govind, A. Doan. VLDB-23. [65 citations as of 3/31/2026]
- Toward Data Cleaning with a Target Accuracy: A Case Study for Value Normalization, A. Ardalan, D. Paulsen, A. Saini, W. Cai, A. Doan. IEEE Big Data 2022.
- Deep Learning for Blocking in Entity Matching: A Design Space Exploration, S. Thirumuruganathan, H. Li, N. Tang, M. Ouzzani, Y. Govind, D. Paulsen, G. Fung, A. Doan. VLDB-21. [165 citations as of 3/31/2026]
- Deep Entity Matching with Pre-Trained Language Models, Y. Li, J. Li, Y. Suhara, A. Doan, W. Tan. VLDB-20. [656 citations as of 3/31/2026]
- Magellan: Toward Building Ecosystems of Entity Matching Solutions, A. Doan, P. Konda, P. Suganthan G.C., Y. Govind, D. Paulsen, K. Chandrasekhar, P. Martinkus, M. Christie. Communications of the ACM, 2020.
- Data Curation with Deep Learning, S. Thirumuruganathan, N. Tang, M. Ouzzani, A. Doan. EDBT-20. [102 citations as of 3/31/2026]
- Manually Detecting Errors for Data Cleaning Using Adaptive Crowdsourcing Strategies, H. Zhang, C. Chai, A. Doan, P. Koutris, E. Arcaute. EDBT-20.
- Entity Matching Meets Data Science: A Progress Report from the Magellan Project, Y. Govind, P. Konda, and others. SIGMOD-19. Industrial paper.
- Executing Entity Matching End to End: A Case Study, P. Konda, S. Seshadri, E. Segarra, B. Hueth, A. Doan. EDBT-19. Industrial paper.
- Smurf: Self-Service String Matching Using Random Forests, P. Suganthan G.C., A. Ardalan, A. Doan, A. Akella. VLDB-18.
- CloudMatcher: A Hands-Off Cloud/Crowd Service for Entity Matching, Y. Govind, E. Paulson, P. Nagarajan, P. Suganthan G.C., A. Doan, Y. Park, G. Fung, D. Conathan, M. Carter, M. Sun. VLDB-18. Demo paper.
- Toward a System Building Agenda for Data Integration (and Data Science), A. Doan, P. Konda, P. Suganthan G.C., A. Ardalan, J. Ballard, S. Das, Y. Govind, H. Li, P. Martinkus, S. Mudgal, E. Paulson, H. Zhang. IEEE Data Engineering Bulletin, Special Issue on Large-Scale Data Integration, 2018. Invited paper.
- BigGorilla: An Open-Source Ecosystem for Data Preparation and Integration, C. Chen, B. Golshan, A. Halevy, W. Tan, A. Doan. IEEE Data Engineering Bulletin, Special Issue on Large-Scale Data Integration, 2018. Invited paper. [75 citations as of 3/31/2026]
- Deep Learning for Entity Matching: A Design Space Exploration, S. Mudgal, H. Li, T. Rekatsinas, A. Doan, Y. Park, G. Krishnan, R. Deep, E. Arcaute, V. Raghavendra. SIGMOD-18. Extended version. [887 citations as of 3/31/2026]
- MatchCatcher: A Debugger for Blocking in Entity Matching, H. Li, P. Konda, P. Suganthan G.C., A. Doan, B. Snyder, Y. Park, G. Krishnan, R. Deep, V. Raghavendra. EDBT-18. Extended version, slides.
- Human-in-the-Loop Data Analysis: A Personal Perspective, A. Doan. HILDA Workshop @ SIGMOD-18.
- Magellan: Toward Building Entity Matching Management Systems, P. Konda, S. Das, P. Suganthan G.C., P. Martinkus, A. Doan, A. Ardalan, J. Ballard, Y. Govind, H. Li, F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. SIGMOD Record, 2018.
- Magellan: Toward Building Entity Matching Management Systems, P. Konda, S. Das, P. Suganthan G.C., A. Doan, A. Ardalan, J. Ballard, H. Li, F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. VLDB-16. Extended version, slides. [492 citations as of 3/31/2026]
- Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks, P. Konda, S. Das, P. Suganthan G.C., A. Doan, A. Ardalan, J. Ballard, H. Li, F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. VLDB-16. Demo paper. Jupyter notebook & datasets for demo.
- CloudMatcher: A Cloud/Crowd Service for Entity Matching, Y. Govind, E. Paulson, M. Ashok, P. Suganthan G.C., A. Hitawala, A. Doan, Y. Park, P. Peissig, E. LaRose, J. Badger. BIGDAS Workshop @ KDD-17. Slides.
- Human-in-the-Loop Challenges for Entity Matching: A Midterm Report, A. Doan, A. Ardalan, J. Ballard, S. Das, Y. Govind, P. Konda, H. Li, S. Mudgal, E. Paulson, P. Suganthan G.C., H. Zhang. HILDA Workshop @ SIGMOD-17.
- Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services, S. Das, P. Suganthan G.C., A. Doan, J. Naughton, G. Krishnan, R. Deep, E. Arcaute, V. Raghavendra, Y. Park. SIGMOD-17. Extended version, slides. [139 citations as of 3/31/2026]
- Towards Interactive Debugging of Rule-Based Entity Matching, F. Panahi, W. Wu, A. Doan, J. Naughton. EDBT-17.
- The Beckman Report on Database Research, with many authors. Communications of the ACM, 2016. Extended version. [257 citations as of 3/31/2026]
- Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing, C. Sun, N. Rampalli, F. Yang, A. Doan. VLDB-14. Industrial paper. Slides. [148 citations as of 3/31/2026]
- Corleone: Hands-off Crowdsourcing for Entity Matching, C. Gokhale, S. Das, A. Doan, J. Naughton, N. Rampalli, J. Shavlik, J. Zhu. SIGMOD-14. Slides, extended report. [345 citations as of 3/31/2026]