
Research

This page describes the research projects that fall under the Magellan umbrella. If you have worked on one of these projects and your name is not listed, please accept our apologies and let us know.

SparkMatcher (2023–Present)

This is our latest and most advanced entity matching (EM) platform. It provides blocking and matching tools that use Spark and AI to scale to hundreds of millions of tuples.

Papers

Software

SparkMatcher consists of four open-source packages designed to support end-to-end EM workflows at scale.

The following packages support the blocking step:

The following packages support the matching step:

Users

Data

Startup

Team

CloudMatcher (2017–2019)

CloudMatcher is a hands-off, self-service, cloud-based EM platform. Users upload two tables to be matched and label a small number of tuple pairs as match or no-match. The system then automatically performs blocking and matching using the labeled data and outputs the resulting matches. This design enables business users to perform EM with minimal technical expertise. CloudMatcher was acquired by Informatica in 2020.
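The block-then-match workflow described above can be illustrated with a minimal Python sketch. This is not CloudMatcher's actual implementation: the function names, the token-overlap blocking rule, and the single learned similarity threshold (standing in for a trained matcher) are all assumptions chosen to keep the example small and dependency-free.

```python
from collections import defaultdict


def tokens(record):
    """Lowercased word tokens of a record (hypothetical helper)."""
    return set(record.lower().split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two records."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def block(table_a, table_b):
    """Blocking: keep only pairs that share at least one token."""
    index = defaultdict(set)
    for j, rec in enumerate(table_b):
        for t in tokens(rec):
            index[t].add(j)
    pairs = set()
    for i, rec in enumerate(table_a):
        for t in tokens(rec):
            for j in index[t]:
                pairs.add((i, j))
    return sorted(pairs)


def learn_threshold(labeled):
    """Pick the threshold that classifies the most labeled pairs correctly.

    `labeled` is a list of (similarity, is_match) tuples. A real system
    would train a richer model; a single threshold is an assumption here.
    """
    best_t, best_correct = 1.0, -1
    for t in sorted({s for s, _ in labeled}):
        correct = sum((s >= t) == y for s, y in labeled)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def match(table_a, table_b, labeled_pairs):
    """Block, learn a threshold from the user's labels, then match."""
    candidates = block(table_a, table_b)
    labeled = [(jaccard(table_a[i], table_b[j]), y)
               for (i, j), y in labeled_pairs.items()]
    threshold = learn_threshold(labeled)
    return [(i, j) for i, j in candidates
            if jaccard(table_a[i], table_b[j]) >= threshold]


# Two tiny made-up tables and two user-provided labels.
table_a = ["dave smith madison wi", "joe wilson san jose ca", "dan brown madison wi"]
table_b = ["david smith madison wi", "daniel brown madison wi", "joe welson san jose ca"]
labels = {(0, 0): True, (0, 1): False}

candidate_pairs = block(table_a, table_b)
matches = match(table_a, table_b, labels)
print(candidate_pairs)
print(matches)
```

In this toy run, blocking discards the pairs that share no token, and the two labels are enough to separate the remaining candidates. Real tables need far more labels and a much stronger matcher.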

Papers

Software

Users

Startup

Team

PyMatcher (2015–2025)

PyMatcher is an EM platform built on Python data science libraries (e.g., pandas, scikit-learn) and designed to run on a single machine. It targets small to medium-sized tables, typically up to a few million tuples per table.

Papers: Overall Vision, Progress, and Demos

Other Papers

Software

Users

Team

Corleone and Falcon (2013–2018)

This project explored EM solutions that leverage crowdsourcing to enable hands-off matching of large tables. The ideas developed here inspired the design of CloudMatcher.

Papers

Team

Deep Learning (2017–2022)

This project explored using deep learning for both the blocking and matching steps of EM. It investigated a broad design space of neural architectures and training strategies, including pre-trained language models.

Papers

Software

String Matching, Schema Matching, Ontology Matching, and Related Problems

Although designed for entity matching, SparkMatcher and related Magellan software can be applied to a wide range of semantic matching tasks, including string matching, schema matching, and ontology matching.
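To show why these tasks resemble entity matching, here is a small Python sketch that treats name-based schema matching as a string matching problem: each column name on one side is paired with its most similar column name on the other. It does not use any SparkMatcher or Magellan API. The helper names, the character-bigram Jaccard measure, and the 0.3 threshold are assumptions made for illustration.

```python
def normalize(name):
    """Lowercase a column name and drop non-alphanumeric characters."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def qgrams(name, q=2):
    """Set of character q-grams of the normalized name, padded with '#'."""
    padded = "#" * (q - 1) + normalize(name) + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_jaccard(a, b, q=2):
    """Jaccard similarity over the q-gram sets of two names."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)


def match_columns(left, right, threshold=0.3):
    """Map each left column to its most similar right column.

    Columns whose best score falls below `threshold` stay unmatched.
    The threshold value is an illustrative assumption, not a tuned one.
    """
    result = {}
    for col in left:
        score, best = max((qgram_jaccard(col, r), r) for r in right)
        if score >= threshold:
            result[col] = best
    return result


# Made-up schemas for illustration.
schema_a = ["cust_name", "zip_code", "phone", "birth_date"]
schema_b = ["customer_name", "zipcode", "phone_number", "email"]

mapping = match_columns(schema_a, schema_b)
print(mapping)
```

Here "birth_date" is left unmatched because no column in the second schema resembles it. Practical schema and ontology matchers also draw on data values, types, and structure, which this sketch ignores.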

Examples