This data set contains information describing product inventories of a grocery business.
The data set is intended for schema matching experiments, especially complex schema matching.
Original Owner and Donor
Anhai Doan
Department of Computer Science
University of Illinois, Champaign-Urbana
Date Donated: February 6, 2004
- This data was selected from the sample databases of Microsoft Access 97.
- This data was collected as a designed experiment for the purpose of Schema Matching.
- As of now, the following publications have used this data:
- Learning to Map between Structured Representations of Data, A. Doan. Ph.D. Dissertation, Univ. of Washington-Seattle, 2002.
- iMAP: Discovering Complex Semantic Matches between Database Schemas, A. Doan, Y. Lee, R. Dhamankar, A. Halevy, and P. Domingos.
Proc. of the ACM SIGMOD Conf. on Management of Data. To appear.
Data Format
This data set consists of the original data set and one sample mapping, whose target data was generated from the original data set.
- Original Data Set
- The original data set consists of four XML files, each containing several tuples. The join paths among the files are as follows:
- employee_id: employee.xml & orders.xml
- order_id: orders.xml & order-details.xml
- product_id: order-details.xml & products.xml
- Each of the four files can be downloaded as a gzipped file.
- Sample Mapping
- We asked a volunteer to examine and create complex query formulas that combine the attributes.
- We created the sample target source by applying the query formulas to a subset of the original data set.
- The target source has been divided into two files to represent a join path. (Here, the join path is "employee_id".)
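The join paths among the four original files can be followed with a few lines of code. The sketch below is only an illustration, not part of the archive. It assumes each file has a root element with one child element per tuple, and each tuple has one child element per attribute. The tiny inline documents, their element names (such as last_name and product_name), and their values are made up for the example. Only the three join keys (employee_id, order_id, product_id) come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-ins for employee.xml, orders.xml,
# order-details.xml and products.xml. The real files may use different
# element names and layouts; only the three join keys are documented.
EMPLOYEE_XML = """<employees>
  <employee><employee_id>1</employee_id><last_name>Smith</last_name></employee>
  <employee><employee_id>2</employee_id><last_name>Jones</last_name></employee>
</employees>"""

ORDERS_XML = """<orders>
  <order><order_id>10</order_id><employee_id>1</employee_id></order>
  <order><order_id>11</order_id><employee_id>2</employee_id></order>
</orders>"""

ORDER_DETAILS_XML = """<order_details>
  <detail><order_id>10</order_id><product_id>100</product_id><quantity>2</quantity></detail>
  <detail><order_id>10</order_id><product_id>101</product_id><quantity>1</quantity></detail>
  <detail><order_id>11</order_id><product_id>100</product_id><quantity>5</quantity></detail>
</order_details>"""

PRODUCTS_XML = """<products>
  <product><product_id>100</product_id><product_name>Tea</product_name></product>
  <product><product_id>101</product_id><product_name>Coffee</product_name></product>
</products>"""


def read_tuples(xml_text):
    """Return one dict per child of the root element (one dict per tuple)."""
    root = ET.fromstring(xml_text)
    return [
        {field.tag: (field.text or "").strip() for field in row}
        for row in root
    ]


def join(left, right, key):
    """Inner equi-join of two lists of dicts on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


employees = read_tuples(EMPLOYEE_XML)
orders = read_tuples(ORDERS_XML)
details = read_tuples(ORDER_DETAILS_XML)
products = read_tuples(PRODUCTS_XML)

# Follow the three join paths in the order listed above.
joined = join(employees, orders, "employee_id")
joined = join(joined, details, "order_id")
joined = join(joined, products, "product_id")

for row in joined:
    print(row["last_name"], row["order_id"], row["product_name"], row["quantity"])
```

For the downloaded gzipped files, the standard library's gzip.open can supply a file object to ET.parse in place of the inline strings, provided the actual layout matches the assumption above.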
Data Files
- Data Sources
- Employee Info
- Order Info
- Detailed Order Info
- Product Info
- Mapping
- Source
- Target(join path:"employee_id")
Illini Semantic Integration Archive
Department of Computer Science
University of Illinois, Champaign-Urbana
Urbana, IL 61801
Last modified: February 6, 2004