Fast and Accurate Entity Matching with AI
From research to real-world scale
Started in 2015, Magellan is a major R&D project at UW–Madison focused on entity matching (EM)—a foundational challenge in data science and AI that affects data integration, analytics, and downstream modeling.
Our mission is to advance the science and practice of entity matching by building software, collaborating with real users, transferring technology to industry, and publishing high-impact research.
Over the years, Magellan has produced three major EM platforms and two startups:
- PyMatcher
An on-premise Python platform for entity matching. PyMatcher has been widely used by researchers, domain scientists, and companies, and parts of it have been incorporated into several popular open-source systems.
- CloudMatcher
A cloud-based, hands-off entity matching platform. CloudMatcher led to the founding of GreenBay Tech and its acquisition by Informatica in 2020. The technology was incorporated into multiple Informatica products and has served thousands of enterprise customers.
- SparkMatcher
A distributed, Spark-based platform for large-scale entity matching. SparkMatcher can efficiently match hundreds of millions of tuples, combining scalable data processing with AI-driven matching techniques.
Publications from the Magellan project have been cited thousands of times and have received Research Highlight Awards from both SIGMOD and ACM.
In 2025, the project inspired a new startup, MadMatcher, founded by Dev Ahluwalia, a CS graduate student at UW–Madison. MadMatcher builds on and extends SparkMatcher with Generative AI–based entity matching capabilities.
Looking for Entity Matching Software?
- For a polished, production-ready solution, check out MadMatcher.
- For cutting-edge research prototypes and alpha-stage software, explore Magellan (this site).
- Or contact us at entitymatchinginfo@gmail.com to discuss your use case.