AnHai Doan: Schema & Ontology Matching Project

Schema & Ontology Matching


This project studies semantic integration problems such as schema matching and reasoning with matches. These problems are fundamental to a broad variety of data management applications, including data integration, warehousing, mining, e-commerce, bio-informatics, and information processing on the World-Wide Web. We currently focus on:

Schema matching: We develop the iMAP approach to discover complex matches such as "our-price = price * (1 + tax-rate)". We also demonstrate how schema matching can benefit significantly from domain knowledge gleaned from external data and schemas, and study privacy-preserving schema matching.
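To make the notion of a complex match concrete, here is a minimal sketch that tests a few candidate formulas against sample data instances. It is not the iMAP algorithm; the column values, the `TEMPLATES` table, and the function `find_complex_matches` are all hypothetical names and data chosen for illustration.

```python
import math
from itertools import permutations

# Hypothetical sample instances from a source schema (column name -> values).
source = {
    "price": [100.0, 250.0, 40.0],
    "tax-rate": [0.05, 0.10, 0.08],
    "discount": [5.0, 0.0, 2.0],
}

# Hypothetical instances of the target column "our-price".
our_price = [105.0, 275.0, 43.2]

# A tiny, assumed space of candidate formula templates over two columns.
TEMPLATES = {
    "{a} + {b}": lambda a, b: a + b,
    "{a} * {b}": lambda a, b: a * b,
    "{a} * (1 + {b})": lambda a, b: a * (1 + b),
}


def find_complex_matches(source, target_values, rel_tol=1e-6):
    """Return formulas over pairs of source columns that reproduce the target values."""
    matches = []
    for a, b in permutations(source, 2):
        for text, fn in TEMPLATES.items():
            predicted = [fn(x, y) for x, y in zip(source[a], source[b])]
            # Accept the formula only if every row agrees up to rounding error.
            if all(
                math.isclose(p, t, rel_tol=rel_tol)
                for p, t in zip(predicted, target_values)
            ):
                matches.append(text.format(a=a, b=b))
    return matches


print(find_complex_matches(source, our_price))
```

Exhaustive enumeration like this only works for a tiny formula space; a realistic system has to search a much larger space of candidate matches and cannot rely on exact agreement across all rows.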

Tuning matching software: Most recent schema matching systems assemble multiple components, each employing a particular matching technique. The next crucial problem is tuning such a system: how to select the right components to execute and correctly adjust their numerous "knobs" (e.g., thresholds, formula coefficients). Tuning is skill- and time-intensive, but without it matching accuracy is significantly inferior. We have developed eTuner, an approach that automatically tunes schema matching systems at virtually no cost to the user.
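As a rough illustration of tuning a single knob, the sketch below picks a similarity threshold for a toy name-based matcher by scoring it on a synthetic scenario whose correct matches are known by construction. This is an assumed simplification, not the eTuner method; the matcher, the `perturb` rule, and `tune_threshold` are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity in [0, 1] between two attribute names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(source_attrs, target_attrs, threshold):
    """Toy matcher: every pair whose name similarity reaches the threshold."""
    return {
        (s, t)
        for s in source_attrs
        for t in target_attrs
        if name_similarity(s, t) >= threshold
    }


def f1(predicted, truth):
    """F1 score of a predicted match set against the known correct matches."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def perturb(name):
    """Hypothetical perturbation rule: drop vowels after the first character."""
    return name[0] + "".join(c for c in name[1:] if c not in "aeiou")


def tune_threshold(schema, candidates):
    """Return (best_f1, best_threshold) measured on a synthetic scenario."""
    synthetic = [perturb(n) for n in schema]
    # The correct matches are known because we generated the synthetic schema.
    truth = set(zip(schema, synthetic))
    scored = [(f1(match(schema, synthetic, th), truth), th) for th in candidates]
    return max(scored)  # ties are broken in favor of the higher threshold


schema = ["price", "tax_rate", "title", "author"]
best_f1, best_threshold = tune_threshold(schema, [0.3, 0.5, 0.7, 0.9])
print(best_threshold, round(best_f1, 3))
```

Because the synthetic schema is derived from the original, no manually labeled matches are needed to score each setting. A real system would have many interacting knobs and components, so a one-dimensional sweep like this is only a starting point.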

Designing schemas for interoperability: If a schema creator knows that the schema will often be matched against in the future, can he or she design or enhance it so as to significantly improve the accuracy of subsequent matching, or to make the semantic mappings that involve the schema much easier to maintain over time?

Organizational and survey efforts: These include the Semantic Integration Workshop at the Second Semantic Web Conference, two special issues on semantic integration in SIGMOD Record Dec 04 and AI Magazine Mar 05, and a short survey paper.

People and Funding

current: AnHai Doan, Yoonkyong Lee (Illinois), Mayssam Sayyadian (Illinois)
collaborators: Arnon Rosenthal (MITRE), Chris Clifton (Purdue).
alumni: Robin Dhamankar (student at Illinois, now at Microsoft), Jayant Madhavan, Alon Halevy, and Pedro Domingos (U Washington).

We gratefully acknowledge support from grants CAREER IIS-0347903 and ITR 0428168, and from MITRE.

Publications (2003 - date)

eTuner: Tuning Schema Matching Software Using Synthetic Scenarios, M. Sayyadian, Y. Lee, A. Doan, A. Rosenthal. VLDB-05.
Corpus-based Schema Matching, J. Madhavan, P. Bernstein, A. Doan, A. Halevy. ICDE-05.
Semantic Integration Research in the Database Community: A Brief Survey, A. Doan and A. Halevy. AI Magazine, Special Issue on Semantic Integration, Spring 2005.
Special Issue on Semantic Integration, A. Doan, N. Noy, A. Halevy (editors). AI Magazine, Spring 2005.
Bootstrapping Domain Ontology for Semantic Web Services from Source Web Sites, W. Wu, A. Doan, C. Yu, and W. Meng. In Proc. of the VLDB-05 Workshop on Technologies for E-Services.
iMAP: Discovering Complex Semantic Matches between Database Schemas, R. Dhamankar, Y. Lee, A. Doan, A. Halevy, and P. Domingos. SIGMOD-04.
Special Issue on Semantic Integration, A. Doan, N. Noy, A. Halevy (editors). ACM SIGMOD Record, 33(4), 2004.
Privacy Preserving Data Integration and Sharing, C. Clifton, A. Doan, A. Elmagarmid, M. Kantarcioglu, G. Schadow, D. Suciu, and J. Vaidya. Proc. of the 9th Int. Workshop on Data Mining and Knowledge Discovery (DMKD-04).
Ontology Matching: A Machine Learning Approach, A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Handbook on Ontologies in Information Systems, S. Staab and R. Studer (eds.), Springer-Verlag, 2004. Invited paper. Pages 397-416.
The Proceedings of the Semantic Integration Workshop at ISWC-03, edited by A. Doan, A. Halevy, and N. Noy.
Learning to Match Ontologies on the Semantic Web, A. Doan, J. Madhavan, R. Dhamankar, P. Domingos, and A. Halevy. VLDB Journal, Special Issue on the Semantic Web, 2003.

Misc

Bird's-Eye View of Our Schema & Ontology Matching Research (2000-2005)

Develop a multi-component matching architecture: WebDB-00, SIGMOD-01
Introduce machine learning techniques to schema matching: SIGMOD-01, MLJ-03
Learn from past user efforts, external data and schemas, other matching activities: SIGMOD-01, ICDE-05a, SIGMOD-04a, SIGMOD-04b
Learn from multitude of users: WebDB-03, ICDE-05b
Find complex matches: SIGMOD-04a
Match ontologies: WWW-02, VLDBJ-03
Ph.D. thesis, 2002. This one received the ACM Doctoral Dissertation Award in 2003.
Privacy-preserving schema matching: DMKD-04
Survey and organizational activities: see listings under publications
Tune schema matching software: VLDB-05a
Recent Talks
We maintain the Illinois Semantic Integration Archive, which stores schemas, ontologies, associated data instances, and manually created mappings among the schemas/ontologies.

Last updated: Aug 2005.