Schema/Ontology Matching (2000-2009)
This project studied schema/ontology matching, which is fundamental to
many data management applications, including data integration,
warehousing, mining, e-commerce, e-science, and Web data
processing.
The project was timely. Shortly after it started around 2000,
schema matching grew into a major direction in data management, and
it has received much attention ever since. The main contributions of this
project:
- We showed how to apply machine learning to this problem.
- We showed that multiple types of domain knowledge must be exploited to maximize matching accuracy.
- We introduced a highly modular, extensible system architecture,
which is essentially the common matching architecture used today.
- We showed how to exploit domain knowledge (e.g., in the form of other schemas) in matching.
- We were among the first to develop clean solutions to several difficult problems, such as finding
complex schema matches and matching ontologies.
One of the main lessons I learned from this project is that crowdsourcing could be ideal for such
matching (and this in turn motivated my subsequent work on crowdsourcing).
People and Funding
AnHai Doan, Robert McCann, Robin Dhamankar, Yoonkyong Lee, Mayssam Sayyadian, Wensheng Wu, Xiaoyong Chai.
Collaborators: Alon Halevy, Pedro Domingos, Phil Bernstein, Jayant Madhavan,
Arnon Rosenthal, Len Seligman, Chris Clifton, Luis Gravano, Natasha Noy, Clement Yu.
We gratefully acknowledge support from grants CAREER IIS-0347903 and
ITR 0428168, MITRE, and Google.
Publications
PhD Dissertation
Basic Matching Techniques
- Reconciling Schemas of
Disparate Data Sources: A Machine Learning Approach, A. Doan,
P. Domingos, and A. Halevy. SIGMOD-2001. ppt slides.
Other versions:
- Learning Source Descriptions for
Data Integration, A. Doan, P. Domingos, and
A. Levy. WebDB-2000. (a preliminary version of the above paper, ppt slides)
- Learning Mappings between Data Schemas, A. Doan, P. Domingos, and
A. Levy. Proc. of the AAAI-2000 Workshop on
Learning Statistical Models from Relational Data, 2000. (preliminary version)
- Data Integration: A "Killer App" for
Multi-Strategy Learning, A. Doan, P. Domingos, and A. Levy. Proc. of
the Workshop on Multi-Strategy Learning (MSL-00), 2000. (preliminary version)
- Learning to Match the Schemas of Databases: A Multistrategy
Approach, A. Doan, P. Domingos, and A. Halevy. Machine Learning Journal,
50, Pages 279-301, 2003. (invited journal version)
- Learning to Map between Ontologies
on the Semantic Web, A. Doan, J. Madhavan, P. Domingos, and
A. Halevy. WWW-2002. ppt slides.
Other versions:
- Learning to Match Ontologies
on the Semantic Web, A. Doan, J. Madhavan, R. Dhamankar,
P. Domingos, and A. Halevy. VLDB Journal, Special Issue on the
Semantic Web, 2003. (expanded version)
- Ontology Matching: A
Machine Learning Approach, A. Doan, J. Madhavan, P. Domingos,
and A. Halevy.
Handbook on Ontologies in Information Systems, S. Staab and R. Studer (eds.), Springer-Verlag,
2004. Invited paper. Pages 397-416.
- iMAP: Discovering Complex Semantic
Matches between Database Schemas, R. Dhamankar, Y. Lee,
A. Doan, A. Halevy, and P. Domingos. SIGMOD-2004.
Crowdsourced Schema Matching
- Building Data Integration
Systems via Mass Collaboration, R. McCann, A. Doan,
A. Kramnik, and V. Varadarajan. Proc. of the Int. Workshop on
Web and Databases (WebDB-03).
- Building Data
Integration Systems: A Mass Collaboration Approach, A. Doan
and R. McCann. Proc. of the IJCAI-03 Workshop on Information
Integration on the Web.
- Integrating Data from
Disparate Sources: A Mass Collaboration Approach, R. McCann,
A. Kramnik, W. Shen, V. Varadarajan, O. Sobulo,
A. Doan. ICDE-05. Poster.
- Matching Schemas in Online Communities: A Web 2.0
Approach, R. McCann, W. Shen, A. Doan. ICDE-08.
Matching Web Query Interfaces (on the Deep Web)
- An Interactive Clustering-based Approach to Integrating Source Query
Interfaces on the Deep Web, W. Wu, C. Yu, A. Doan, and
W. Meng. SIGMOD-04.
- Merging Interface Schemas on the
Deep Web via Clustering Aggregation,
W. Wu, A. Doan, and C. Yu. IEEE Int. Conf. on Data Mining (ICDM-05).
- Bootstrapping Domain
Ontology for Semantic Web Services from Source Web Sites, W. Wu,
A. Doan, C. Yu, and W. Meng. In Proc. of the VLDB-05 Workshop on
Technologies for E-Services.
- Learning from the Web to
Match Deep-Web Query Interfaces, W. Wu,
A. Doan, C. Yu. ICDE-06. PPT slides.
Workshops, Special Issues, Surveys, Textbook Chapters
- The Proceedings of the Semantic Integration Workshop at ISWC-03,
edited by A. Doan, A. Halevy, and N. Noy.
- Report on the
Semantic Integration Workshop at the 2nd Int. Semantic Web
Conf. (ISWC-03), A. Doan, A. Halevy, and N. Noy.
SIGMOD Record, 33(1):138-140, 2004.
A related version appeared in AI Magazine, Spring 2004.
- Special Issue on Semantic Integration, A. Doan,
N. Noy, A. Halevy (editors).
ACM SIGMOD Record, 33(4), 2004.
- Special Issue on Semantic Integration, N. Noy, A. Doan,
A. Halevy (editors).
AI Magazine, Spring 2005.
- Semantic
Integration Research in the Database Community: A Brief
Survey, A. Doan and A. Halevy. AI Magazine, Special Issue
on Semantic Integration, Spring 2005.
- Chapter 5: Schema Matching and Mapping, in Principles
of Data Integration, A. Doan, A. Halevy, Z. Ives, Morgan Kaufmann, 2012.
Others
- Proposal to do privacy-preserving schema matching:
Privacy Preserving Data Integration and
Sharing, C. Clifton, A. Doan, A. Elmagarmid,
M. Kantarcioglu, G. Schadow, D. Suciu, and J. Vaidya.
Proc. of the 9th Int. Workshop on Data Mining and Knowledge Discovery (DMKD-04).
- How to maintain the discovered semantic
mappings over time (also related to the
wrapper maintenance problem)?
Maveric: Mapping Maintenance
for Data Integration Systems, R. McCann, B. AlShebli, Q. Le,
H. Nguyen, L. Vu, A. Doan. VLDB-05.
PPT slides.
- How to exploit a corpus of schemas to match two schemas:
Corpus-based Schema
Matching, J. Madhavan, P. Bernstein, A. Doan, A. Halevy.
ICDE-05.
- Tuning matching software: how to
select the right components to execute and correctly adjust
their numerous "knobs" (e.g., thresholds, formula
coefficients):
eTuner: Tuning
Schema Matching Software Using Synthetic Scenarios, Y. Lee,
M. Sayyadian, A. Doan, A. Rosenthal. VLDB Journal Special Issue,
Best Papers of VLDB-05. 2006. Invited. (An earlier paper:
eTuner: Tuning Schema Matching
Software Using Synthetic Scenarios, M. Sayyadian, Y. Lee,
A. Doan, A. Rosenthal. VLDB-05.
PPT slides.)
- How to do keyword search across multiple
RDBMSs? First we must do schema matching.
Efficient Keyword Search across
Heterogeneous Relational Databases, M. Sayyadian, H. LeKhac,
A. Doan, L. Gravano. ICDE-07.
- Designing schemas for
interoperability: If a schema will often be matched against in
the future, how can we design it in a way that helps schema
matching?
Analyzing and
Revising Data Integration Schemas to Improve Their Matchability,
X. Chai, M. Sayyadian, A. Doan, A. Rosenthal,
L. Seligman. VLDB-08.
Selected Talk Slides