Computer Science 764, October 23, 1998

Project Abstract

Kevin Beach, Vuk Ercegovac, Michael Henderson, Amy Rea, Suan Yong

XML-QL is a query language for obtaining data from XML documents on the World Wide Web. From a database viewpoint, an XML document serves as a database from which a query will extract results. While the semi-structured nature of XML lends itself to an object data model, the relational data model has been shown to perform well with queries posed over large data sets. Hence, for this project we propose to build a system that executes relational-like queries over XML data sets that have been transformed into relations. Specifically, we will execute XML-QL queries in a system, written in Java, which will dynamically load and transform XML data sets into relations. The queries will be transformed into intermediate execution plans from which an optimizer will produce a less costly plan to access the relations with RDBMS-like operators.
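As a rough illustration of the idea of transforming an XML data set into a relation and then querying it with relational operators, the sketch below shreds a small document into a single relation Edge(id, parentId, tag, text) and answers "the authors of each book" with a nested-loop self-join. The edge-table mapping, the class name EdgeShredder, and the sample document are assumptions made for this example only; the abstract does not specify how the project will map XML to relations.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical illustration: shreds an XML document into one "edge" relation. */
final class EdgeShredder {

    /** One tuple of the assumed relation Edge(id, parentId, tag, text). */
    static final class Row {
        final int id;
        final int parentId;  // -1 marks the root element
        final String tag;
        final String text;   // direct text content, trimmed; "" if none

        Row(int id, int parentId, String tag, String text) {
            this.id = id;
            this.parentId = parentId;
            this.tag = tag;
            this.text = text;
        }
    }

    /** Parses the XML string and returns one tuple per element, in document order. */
    static List<Row> shred(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Row> rows = new ArrayList<>();
            visit(doc.getDocumentElement(), -1, rows);
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Pre-order walk: emit a tuple for this element, then recurse into child elements. */
    private static void visit(Element e, int parentId, List<Row> rows) {
        int id = rows.size();  // ids are assigned in document order, starting at 0
        NodeList kids = e.getChildNodes();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.TEXT_NODE) {
                text.append(kids.item(i).getNodeValue());
            }
        }
        rows.add(new Row(id, parentId, e.getTagName(), text.toString().trim()));
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i) instanceof Element) {
                visit((Element) kids.item(i), id, rows);
            }
        }
    }

    /**
     * Nested-loop self-join of Edge with itself on parent.id = child.parentId,
     * with selections parent.tag = parentTag and child.tag = childTag,
     * projecting child.text.
     */
    static List<String> childText(List<Row> edge, String parentTag, String childTag) {
        List<String> out = new ArrayList<>();
        for (Row p : edge) {
            if (!p.tag.equals(parentTag)) continue;
            for (Row c : edge) {
                if (c.parentId == p.id && c.tag.equals(childTag)) {
                    out.add(c.text);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<bib>"
                + "<book><title>A</title><author>Smith</author></book>"
                + "<book><title>B</title><author>Jones</author></book>"
                + "</bib>";
        List<Row> edge = shred(xml);
        System.out.println(childText(edge, "book", "author"));  // prints [Smith, Jones]
    }
}
```

The join in childText corresponds roughly to the kind of nested element pattern an XML-QL query would match. An optimizer of the sort described above could, for example, apply the tag selections before the join or replace the nested loop with a hash join on parentId; neither is attempted in this sketch.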

Since we are primarily interested in issues concerning the use of relations to store and query XML data sets, we will not handle issues relating to recovery, concurrency, or the use of secondary, non-volatile storage. This decision is also supported by the expected normal usage of such a system: the intended user is an XML "surfer" who, given a set of XML documents, poses queries in XML-QL via a GUI applet in a browser that can display the results of the query. In essence, the system will serve as an XML document filter that transforms XML data sets into relations to facilitate more efficient processing.

We will initially develop our system to support only a subset of the features provided by XML-QL, and incrementally add support for other features as time permits. Supporting the complete XML-QL specification is not necessary to achieve the goals of our project, and would be infeasible given the project's time constraints and the potentially difficult areas fundamental to it, such as building the query plan, query optimization, and GUI development. With respect to the query language, we will implement the features that most completely demonstrate the querying aspect of the language rather than the data manipulation aspect. As such, the optimizer will only be able to take advantage of operators for which language support has been added. Similarly, the GUI will aim to provide a clean interface for constructing queries and displaying results in a straightforward way, and will not deal with displaying XML graphically. In summary, our goal is to build a system with which we can gain some insight into the performance and design considerations that arise when using relations to store and query XML data sets.

Final Report: