Overview
This project studies semantic integration problems such as schema
matching and reasoning with matches. These problems are fundamental
to a broad variety of data management applications, including data
integration, warehousing, mining, e-commerce, bio-informatics, and
information processing on the World-Wide Web. We currently focus on:
- Schema matching: We develop the iMAP approach to discover complex
matches such as ``our-price = price * (1 + tax-rate)''. We also
demonstrate how schema matching can benefit significantly from domain
knowledge gleaned from external data and schemas, and study
privacy-preserving schema matching.
- Tuning matching software: Most recent
schema matching systems assemble multiple components, each
employing a particular matching technique. The next crucial problem is
tuning such a matching system: how to select the right
components to execute and correctly adjust their numerous ``knobs''
(e.g., thresholds, formula coefficients). Tuning is skill- and
time-intensive, but without it matching accuracy is significantly
lower. We have developed eTuner, an approach that
automatically tunes schema matching systems at virtually no
cost to the user.
- Designing schemas for interoperability:
If a schema creator knows that the schema will often be matched
against other schemas in the future, can he or she design or enhance
it in a way that significantly improves the accuracy of subsequent
matching, or that makes the semantic mappings involving the schema
much easier to maintain over time?
- Organizational and survey efforts: These include
the Semantic Integration Workshop at the Second Semantic Web Conference,
two special issues on semantic integration, in SIGMOD Record (December
2004) and AI Magazine (March 2005), and a short survey paper.
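
To make the notion of a complex match concrete, the sketch below checks how
well a candidate formula such as ``our-price = price * (1 + tax-rate)''
agrees with sample data. It is only an illustration and not the iMAP
algorithm: the function name match_support, the sample values, and the
assumption that source rows and target values are already aligned
row by row are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def match_support(
    source_rows: List[Row],
    target_values: List[float],
    formula: Callable[[Row], float],
    tol: float = 1e-6,
) -> float:
    """Return the fraction of rows where formula(row) agrees with the target.

    Assumes (hypothetically) that source_rows[i] and target_values[i]
    describe the same real-world item.
    """
    if len(source_rows) != len(target_values):
        raise ValueError("source rows and target values must be aligned")
    if not source_rows:
        return 0.0
    hits = sum(
        1
        for row, target in zip(source_rows, target_values)
        if abs(formula(row) - target) <= tol
    )
    return hits / len(source_rows)


# Hypothetical sample data: a source schema with price and tax-rate,
# and a target schema attribute our-price.
source_rows = [
    {"price": 100.0, "tax-rate": 0.08},
    {"price": 20.0, "tax-rate": 0.05},
    {"price": 50.0, "tax-rate": 0.10},
]
our_price = [108.0, 21.0, 55.0]


def complex_candidate(row: Row) -> float:
    # Candidate complex match: our-price = price * (1 + tax-rate)
    return row["price"] * (1 + row["tax-rate"])


def one_to_one_candidate(row: Row) -> float:
    # Candidate 1-1 match: our-price = price
    return row["price"]


print(match_support(source_rows, our_price, complex_candidate))
print(match_support(source_rows, our_price, one_to_one_candidate))
```

On this toy data the complex candidate agrees with every row while the
1-1 candidate agrees with none, which is the kind of evidence a matcher
could use to prefer the complex match. Real sources rarely come with
aligned rows, so a practical system would need other evidence as well.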