Schema/Ontology Matching (2000–2009)
This project focused on schema and ontology matching, a foundational problem in data management with applications in data integration, warehousing, mining, e-commerce, e-science, and Web data processing.
The work was timely. Soon after the project began around 2000, schema/ontology matching emerged as a major research direction and has remained active ever since.
Contributions and Lessons Learned
Our main contributions include:
- Demonstrating how machine learning can be applied effectively to schema matching
- Showing that achieving high accuracy requires leveraging multiple types of domain knowledge
- Introducing a highly modular, extensible system architecture—an approach that has since become standard
- Developing methods to exploit external domain knowledge (e.g., other schemas) during matching
- Providing some of the earliest clean solutions to challenging problems such as complex schema matches and ontology matching
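The first three contributions above (learning-based matching, combining several kinds of evidence, and a modular architecture) can be illustrated with a small sketch. This is not the code of any system from this project: the `Column` structure, the two matchers, and the fixed weights are all hypothetical stand-ins chosen for brevity. In a learning-based matcher the weights would be learned from training matches rather than set by hand.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

# Hypothetical representation: a column has a "name" and sample "values".
Column = Dict[str, object]
# A matcher is any function that scores a pair of columns in [0, 1].
Matcher = Callable[[Column, Column], float]


def name_matcher(a: Column, b: Column) -> float:
    """Score string similarity of the two attribute names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def value_matcher(a: Column, b: Column) -> float:
    """Score Jaccard overlap of the two columns' sample values."""
    va, vb = set(a["values"]), set(b["values"])
    if not va and not vb:
        return 0.0
    return len(va & vb) / len(va | vb)


def combine(matchers: List[Matcher], weights: List[float],
            a: Column, b: Column) -> float:
    """Weighted average of the matcher scores (weights assumed, not learned)."""
    total = sum(weights)
    return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total


def best_matches(source: List[Column], target: List[Column],
                 matchers: List[Matcher], weights: List[float]) -> Dict[str, str]:
    """Map each source column name to its highest-scoring target column name."""
    result = {}
    for s in source:
        best = max(target, key=lambda t: combine(matchers, weights, s, t))
        result[s["name"]] = best["name"]
    return result


if __name__ == "__main__":
    source = [
        {"name": "phone", "values": ["555-1234", "555-9876"]},
        {"name": "location", "values": ["Seattle", "Madison"]},
    ]
    target = [
        {"name": "contact_phone", "values": ["555-1234", "555-0000"]},
        {"name": "city", "values": ["Madison", "Seattle", "Austin"]},
    ]
    print(best_matches(source, target, [name_matcher, value_matcher], [0.5, 0.5]))
```

Because every matcher shares one signature, a new source of evidence (for example, a matcher that consults other schemas) can be added to the list without changing the combination step. In this toy data, "location" and "city" share little in their names, so the value overlap is what drives that pairing.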
A key lesson from this project was that crowdsourcing can be highly effective for matching tasks—an insight that motivated my subsequent work in that area.
Another lesson is that schema matching, ontology matching, and entity matching share a common core and can benefit from a unified solution architecture. This insight led to my subsequent work on entity matching in the (ongoing) Magellan project. We are now working to extend the solutions developed in Magellan to other semantic matching tasks, including schema and ontology matching.
Publications
Some files below are in older formats (such as .ps or .ppt) that modern browsers may decline to open. If clicking a link does not work, try copying the URL and pasting it directly into your browser's address bar, then choose to download the file.
PhD Dissertation
- Learning to Map between Structured Representations of Data, A. Doan. Ph.D. Dissertation, Univ. of Washington-Seattle, 2002. Received the ACM Doctoral Dissertation Award in 2003. [118 citations as of 3/31/2026]
Basic Matching Techniques
- Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach, A. Doan, P. Domingos, and A. Halevy. SIGMOD-2001. [ppt slides]. [1184 citations as of 3/31/2026] Other versions:
- Learning Source Descriptions for Data Integration, A. Doan, P. Domingos, and A. Levy. WebDB-2000. (a preliminary version of the above paper, [ppt slides])
- Learning Mappings between Data Schemas, A. Doan, P. Domingos, and A. Levy. Proc. of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, 2000. (preliminary version)
- Data Integration: A "Killer App" for Multi-Strategy Learning, A. Doan, P. Domingos, and A. Levy. Proc. of the Workshop on Multi-Strategy Learning (MSL-00), 2000. (preliminary version)
- Learning to Match the Schemas of Databases: A Multistrategy Approach, A. Doan, P. Domingos, and A. Halevy. Machine Learning Journal, 50, Pages 279-301, 2003. (invited journal version) [366 citations as of 3/31/2026]
- Learning to Map between Ontologies on the Semantic Web, A. Doan, J. Madhavan, P. Domingos, and A. Halevy. WWW-2002. [ppt slides]. [1472 citations as of 3/31/2026] Other versions:
- Learning to Match Ontologies on the Semantic Web, A. Doan, J. Madhavan, R. Dhamankar, P. Domingos, and A. Halevy. VLDB Journal, Special Issue on the Semantic Web, 2003. (expanded version) [699 citations as of 3/31/2026]
- Ontology Matching: A Machine Learning Approach, A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Handbook on Ontologies in Information Systems, S. Staab and R. Studer (eds.), Springer-Verlag, 2004. Invited paper. Pages 397-416. [764 citations as of 3/31/2026]
- iMAP: Discovering Complex Semantic Matches between Database Schemas, R. Dhamankar, Y. Lee, A. Doan, A. Halevy, and P. Domingos. SIGMOD-2004. [615 citations as of 3/31/2026]
Crowdsourced Schema Matching
- Building Data Integration Systems via Mass Collaboration, R. McCann, A. Doan, A. Kramnik, and V. Varadarajan. Proc. of the Int. Workshop on Web and Databases (WebDB-03). [94 citations as of 3/31/2026]
- Building Data Integration Systems: A Mass Collaboration Approach, A. Doan and R. McCann. Proc. of the IJCAI-03 Workshop on Information Integration on the Web.
- Integrating Data from Disparate Sources: A Mass Collaboration Approach, R. McCann, A. Kramnik, W. Shen, V. Varadarajan, O. Sobulo, A. Doan. ICDE-05. Poster.
- Matching Schemas in Online Communities: A Web 2.0 Approach, R. McCann, W. Shen, A. Doan. ICDE-08. [187 citations as of 3/31/2026]
Matching Web Query Interfaces (on the Deep Web)
- An Interactive Clustering-based Approach to Integrating Source Query Interfaces on the Deep Web, W. Wu, C. Yu, A. Doan, and W. Meng. SIGMOD-04. [381 citations as of 3/31/2026]
- Merging Interface Schemas on the Deep Web via Clustering Aggregation, W. Wu, A. Doan, and C. Yu. IEEE Int. Conf. on Data Mining (ICDM-05).
- Bootstrapping Domain Ontology for Semantic Web Services from Source Web Sites, W. Wu, A. Doan, C. Yu, and W. Meng. In Proc. of the VLDB-05 Workshop on Technologies for E-Services.
- Learning from the Web to Match Deep-Web Query Interfaces, W. Wu, A. Doan, C. Yu. ICDE-06. [PPT slides].
Workshops, Special Issues, Surveys, Textbook Chapters
- The Proceedings of the Semantic Integration Workshop at ISWC-03, edited by A. Doan, A. Halevy, and N. Noy.
- Report on the Semantic Integration Workshop at the 2nd Int. Semantic Web Conf. (ISWC-03), A. Doan, A. Halevy, and N. Noy. SIGMOD Record, 33(1):138-140, 2004. A related version appeared in AI Magazine, Spring 2004.
- Special Issue on Semantic Integration, A. Doan, N. Noy, A. Halevy (editors). ACM SIGMOD Record, 33(4), 2004.
- Special Issue on Semantic Integration, N. Noy, A. Doan, A. Halevy (editors). AI Magazine, Spring 2005.
- Semantic Integration Research in the Database Community: A Brief Survey, A. Doan and A. Halevy. AI Magazine, Special Issue on Semantic Integration, Spring 2005. [752 citations as of 3/31/2026]
- Chapter 5: Schema Matching and Mapping, in Principles of Data Integration, A. Doan, A. Halevy, Z. Ives, Morgan Kaufmann, 2012.
Others
- Proposal to do privacy-preserving schema matching: Privacy Preserving Data Integration and Sharing, C. Clifton, A. Doan, A. Elmagarmid, M. Kantarcioglu, G. Schadow, D. Suciu, and J. Vaidya. Proc. of the 9th Int. Workshop on Data Mining and Knowledge Discovery (DMKD-04). [284 citations as of 3/31/2026]
- How to maintain the discovered semantic mappings over time (also related to the wrapper maintenance problem)? Maveric: Mapping Maintenance for Data Integration Systems, R. McCann, B. AlShebli, Q. Le, H. Nguyen, L. Vu, A. Doan. VLDB-05. [PPT slides]. [110 citations as of 3/31/2026]
- How to exploit a corpus of schemas to match two schemas: Corpus-based Schema Matching, J. Madhavan, P. Bernstein, A. Doan, A. Halevy. ICDE-05. [588 citations as of 3/31/2026]
- Tuning matching software: how to select the right components to execute and correctly adjust their numerous "knobs" (e.g., thresholds, formula coefficients): eTuner: Tuning Schema Matching Software Using Synthetic Scenarios, Y. Lee, M. Sayyadian, A. Doan, A. Rosenthal. VLDB Journal Special Issue, Best Papers of VLDB-05. 2006. Invited. [205 citations as of 3/31/2026] (An earlier paper: eTuner: Tuning Schema Matching Software Using Synthetic Scenarios, M. Sayyadian, Y. Lee, A. Doan, A. Rosenthal. VLDB-05. [PPT slides].)
- How to do keyword search across multiple RDBMSs? First we must do schema matching. Efficient Keyword Search across Heterogeneous Relational Databases, M. Sayyadian, H. LeKhac, A. Doan, L. Gravano. ICDE-07. [153 citations as of 3/31/2026]
- Designing schemas for interoperability: If a schema will often be matched against in the future, how can we design it in a way that helps schema matching? Analyzing and Revising Data Integration Schemas to Improve Their Matchability, X. Chai, M. Sayyadian, A. Doan, A. Rosenthal, L. Seligman. VLDB-08.
Selected Talk Slides
- Learning to Map between Structured Representations of Data. UIUC, 2002 (job talk).
- Schema & Ontology Matching: Current Research Directions. Univ. of Southern California, 2004.