Vilas Distinguished Achievement Professor
Room 7586, Morgridge Hall
1205 University Ave, Madison WI 53706
News
- Jan 2026: Collaborated with my student Dev Ahluwalia to launch MadMatcher, an entity matching startup.
- 2025: Started SmartCat, a project to build AI-driven data catalog management systems.
- 2019–2023: Co-founded GreenBay to commercialize Magellan. Spent 3 years as VP of Technology at Informatica (which acquired GreenBay).
- 2020: Co-chaired SIGMOD-2020.
- 2017–2019: Contributed to efforts to grow the CS department, build a new CS building, and establish the College of Computing and AI.
- 2016: Started Magellan, a project to build end-to-end entity matching systems.
Research
My goal is to make messy data usable at scale. I work on data exploration, cleaning, matching, and integration, the foundational steps in data science pipelines and in preparing high-quality data for AI models.
- We build end-to-end systems, develop production-grade software, and work closely with real users to solve practical problems.
- Our work draws on databases, AI/ML, and large-scale data processing, and often incorporates ideas from crowdsourcing and user interaction.
- We aim to translate research into widely used tools and real-world impact.
Current Projects
- SmartCat (2025 – present): Builds AI-driven data catalog management systems.
- Magellan (2016 – present): Builds end-to-end entity matching systems.
- Cymphony (2019 – present): Builds a crowdsourcing platform for data science.
Past Projects: Schema/Ontology Matching (2000–2009), Crowdsourcing (2002–2015), Knowledge Graphs (2004–2015).
Selected Publications (DBLP, Google Scholar)
- Columbo: Expanding Abbreviated Column Names for Tabular Data Using Large Language Models, T. Cai, S. Sheen, A. Doan. EMNLP-25.
- Sparkly: A Simple yet Surprisingly Strong TF/IDF Blocker for Entity Matching, D. Paulsen, Y. Govind, A. Doan. VLDB-23.
- Magellan: Toward Building Ecosystems of Entity Matching Solutions, A. Doan, P. Konda, P. Suganthan G.C., Y. Govind, D. Paulsen, K. Chandrasekhar, P. Martinkus, M. Christie. Communications of the ACM, 2020.
- Deep Entity Matching with Pre-Trained Language Models, Y. Li, J. Li, Y. Suhara, A. Doan, W. Tan. VLDB-21.
- Entity Matching Meets Data Science: A Progress Report from the Magellan Project, Y. Govind, P. Konda, and others. SIGMOD-19. Industrial paper.
- CloudMatcher: A Hands-Off Cloud/Crowd Service for Entity Matching, Y. Govind, E. Paulson, P. Nagarajan, P. Suganthan G.C., A. Doan, Y. Park, G. Fung, D. Conathan, M. Carter, M. Sun. VLDB-18. Demo paper.
- Deep Learning for Entity Matching: A Design Space Exploration, S. Mudgal, H. Li, T. Rekatsinas, A. Doan, Y. Park, G. Krishnan, R. Deep, E. Arcaute, V. Raghavendra. SIGMOD-18. Extended version.
- Magellan: Toward Building Entity Matching Management Systems, P. Konda, S. Das, P. Suganthan G.C., A. Doan, A. Ardalan, J. Ballard, H. Li, F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. VLDB-16.
See more publications.
Selected Awards and Honors
- Gurindar S. Sohi Professorship, 2020
- Vilas Distinguished Achievement Professorship, 2018
- Alfred P. Sloan Research Fellowship, 2007
- NSF CAREER Award, 2004
- ACM Doctoral Dissertation Award, 2003
Teaching
I usually teach CS 774 (Data Exploration, Cleaning, and Integration for Data Science) and CS 564 (Database Management Systems).
Service
Selected service for the data management community:
- Member, SIGMOD Advisory Board
- Co-chair, SIGMOD-2020
- Co-chair, Beckman meeting, 2013
- Administrator, DBWorld mailing list, 2006–2020
- Co-author, data integration textbook, 2012
Selected service for UW-Madison:
- Founded two professional CS programs (2011–2014), which now generate millions in annual revenue.
- Created a slide deck making the case for a College of Computing (2017), contributing to subsequent legislative efforts.
- Served on the UW Task Force on Computing and drafted the initial version of its report (2018), later cited in UW's announcement of the College of Computing and AI.
These efforts were part of broader initiatives I was closely involved in that grew the CS department (from 32 to 50+ faculty), built a new CS building, created the School of Computer, Data & Information Sciences, and established the College of Computing and AI.