XML Publishing and Schema-Based XML Storage: Is XML-to-SQL query
translation similar in both domains?
There is a lot of interest in exporting existing relational data
as XML documents. We refer to this as XML Publishing. Comparing this
with the XML Storage scenario, we see that in both cases we have a
(logical) XML view of some data that is (physically) stored in a
relational database. Moreover, in both scenarios, given an XML query
over the XML view and the mapping between the XML view and the
relational schema, the goal is to obtain an equivalent SQL query. The
question we ask is the following: Is there any difference between the
two domains as far as the query translation problem is concerned, or
are the solutions for one domain directly applicable to the other?
Currently, the research literature focuses on developing query
translation algorithms for the XML Publishing domain, and the working
assumption seems to be that the same algorithms apply directly to the
XML Storage domain as well. This is possible because any instance of
the XML Storage problem can be reduced to an instance of the XML
Publishing problem through the notion of reconstruction XML views
[SSK+01] or default XML views. Once we do this, the algorithms for the
XML Publishing domain apply to the XML Storage domain as well. The main
question, therefore, is whether there is something we can do much more
simply in the XML Storage scenario than in the XML Publishing context.
If so, XML-to-SQL query translation needs to be studied separately for
the two problems.
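One way to picture this reduction is that a default XML view simply publishes each relational table as XML. The sketch below assumes a hypothetical layout (one element per table, one <row> child per tuple, one child element per non-null column) and uses Python's standard sqlite3 and xml.etree.ElementTree modules; the exact shape of the default or reconstruction views in [SSK+01] may differ.

```python
import sqlite3
import xml.etree.ElementTree as ET

def default_xml_view(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Publish every row of `table` as a <row> element with one child per column.

    Hypothetical layout for illustration only. The table name is interpolated
    into the SQL text, so it is assumed to come from a trusted source.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]  # column names of the result
    root = ET.Element(table)
    for row in cur:
        row_elem = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            if value is not None:  # omit NULL columns instead of emitting empty elements
                ET.SubElement(row_elem, col).text = str(value)
    return root

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Book (bookid INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO Book VALUES (1, 'Sample Book')")
    print(ET.tostring(default_xml_view(conn, "Book"), encoding="unicode"))
```

Omitting NULL columns is a design choice of this sketch; a view could just as well emit empty or nil-marked elements.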
We show that the Schema-Based XML Storage scenario differs from the
XML Publishing scenario in the following manner. Previously, we
developed mapping-aware translation algorithms for path expression
queries in the Schema-Based XML Storage scenario. We show that
designing equivalent algorithms for the XML Publishing domain is
difficult: it requires using integrity constraints on the underlying
relational data in a fairly complex fashion. We develop algorithms for
translating path expression queries into SQL in the XML Publishing
scenario over a non-recursive XML schema. We hope this difference
between the two domains makes it clear that the XML-to-SQL query
translation problem in the XML Storage domain is far simpler than the
equivalent problem in the XML Publishing domain, and so needs to be
investigated separately.
The main difference between the two cases is as follows. In the XML
Storage scenario, the XML-to-Relational mapping completely defines the
contents of the relational database. In other words, the relational
database contains exactly the same data as the input set of XML
documents. In the XML Publishing scenario, on the other hand, it is
possible that only parts of the relational data are exported in the XML
view, and that some parts of the relational data are exported several
times. Hence, the XML-to-Relational mapping does not completely
describe the underlying relational data, and mapping-aware query
translation must consult another source of information, namely the
integrity constraints on the underlying relational data, before it can
decide which parts of the SQL query are implied by the mapping and the
constraints. We next revisit the example we used to motivate
mapping-aware translation for the XML Storage scenario and see what
happens in the XML Publishing case.
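In the XML Storage scenario the relational rows come only from shredding XML, which is why the mapping describes the data completely. The minimal sketch below assumes a hypothetical document in which <book> elements contain possibly nested <section> elements, each with a <title>, and generates ids during shredding. It fills Section rows with the columns (sectionid, bookid, sectionparentid, title) used later in this section, so by construction every row gets exactly one non-null parent field.

```python
import xml.etree.ElementTree as ET
from itertools import count

def shred_sections(doc: ET.Element) -> list[tuple]:
    """Shred <section> elements into rows (sectionid, bookid, sectionparentid, title).

    A section directly under a <book> records that book's id and a NULL parent
    section; a nested section records its parent's sectionid and a NULL bookid.
    Book and section ids are generated here (hypothetical numbering from 1).
    """
    rows = []
    section_ids = count(1)

    def visit(section: ET.Element, bookid, parentid) -> None:
        sid = next(section_ids)
        title = section.findtext("title")  # None if the section has no <title>
        rows.append((sid, bookid, parentid, title))
        for child in section.findall("section"):
            visit(child, None, sid)

    for bookid, book in enumerate(doc.iter("book"), start=1):
        for section in book.findall("section"):
            visit(section, bookid, None)
    return rows

if __name__ == "__main__":
    doc = ET.fromstring(
        "<books><book><title>Sample Book</title>"
        "<section><title>Intro</title>"
        "<section><title>Background</title></section></section>"
        "</book></books>"
    )
    for row in shred_sections(doc):
        print(row)
    # (1, 1, None, 'Intro')
    # (2, None, 1, 'Background')
```

No relational data exists outside what the shredder produces, which is exactly the guarantee the XML Publishing scenario lacks.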
Suppose we had the following relational schema for the pre-existing
relational data:

[Figure: relational schema with relations Book, Author, and Section; the
Section relation has columns sectionid, bookid, sectionparentid, and title]

Suppose that we export this relational data as an XML document according
to the XML schema shown in the figure to the left. This relational schema
and XML schema pair is very similar to the corresponding pair in the XML
Storage scenario. Let us revisit the same query Q1, which retrieves all
the section titles.
Q1:
    for $title in document(*)//section/title
    return $title
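To make the semantics of Q1 concrete, the sketch below approximates //section/title with the limited XPath subset supported by Python's xml.etree.ElementTree. The sample document (books containing possibly nested sections, each with a title) is hypothetical. Note that the book's own title is not returned, because its parent is a book rather than a section.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document shaped like the exported view: books contain
# sections, and sections may nest further sections.
DOC = """
<books>
  <book>
    <title>Sample Book</title>
    <section>
      <title>Intro</title>
      <section><title>Background</title></section>
    </section>
    <section><title>Conclusion</title></section>
  </book>
</books>
"""

def section_titles(xml_text: str) -> list[str]:
    """Approximate Q1: text of every <title> whose parent is a <section>."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//section/title")]

print(section_titles(DOC))  # ['Intro', 'Background', 'Conclusion']
```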
An SQL query equivalent to Q1 according to the above view definition is:
SQ1:
    with Temp(id, title) as (
        select S.sectionid, S.title
        from Book B, Section S
        where B.bookid = S.bookid
      union all
        select S.sectionid, S.title
        from Temp T, Section S
        where T.id = S.sectionparentid
    )
    select title
    from Temp
Recall that in the XML Storage scenario, we were able to design a
mapping-aware algorithm that simplified this query to the following
SQL query:
MASQ1:
    select title
    from Section
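SQ1 and MASQ1 can be compared on a small instance. The minimal sketch below uses Python's sqlite3 module and assumes Book and Section tables with the columns used in SQ1, filled with hypothetical sample rows. SQ1 is spelled with WITH RECURSIVE, the SQL:1999 form that SQLite accepts. On an instance where every section hangs off a book, either directly or through other sections, the two queries return the same multiset of titles.

```python
import sqlite3

# SQ1 from the text, written with the SQL:1999 WITH RECURSIVE keyword.
SQ1 = """
WITH RECURSIVE Temp(id, title) AS (
    SELECT S.sectionid, S.title
    FROM Book B, Section S
    WHERE B.bookid = S.bookid
    UNION ALL
    SELECT S.sectionid, S.title
    FROM Temp T, Section S
    WHERE T.id = S.sectionparentid
)
SELECT title FROM Temp
"""
MASQ1 = "SELECT title FROM Section"

def make_db(sections):
    """Build an in-memory database with the Book/Section columns used by SQ1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE Book (bookid INTEGER PRIMARY KEY, title TEXT);"
        "CREATE TABLE Section (sectionid INTEGER PRIMARY KEY,"
        " bookid INTEGER REFERENCES Book(bookid),"
        " sectionparentid INTEGER REFERENCES Section(sectionid),"
        " title TEXT);"
    )
    conn.execute("INSERT INTO Book VALUES (1, 'Sample Book')")
    conn.executemany("INSERT INTO Section VALUES (?, ?, ?, ?)", sections)
    return conn

def titles(conn, sql):
    # Compare as sorted lists: SQL gives no row order without ORDER BY,
    # and sorting keeps duplicates, so this is a multiset comparison.
    return sorted(row[0] for row in conn.execute(sql))

if __name__ == "__main__":
    conn = make_db([
        (1, 1, None, "Intro"),       # child of book 1
        (2, None, 1, "Background"),  # child of section 1
        (3, 1, None, "Conclusion"),  # child of book 1
    ])
    print(titles(conn, SQ1))    # ['Background', 'Conclusion', 'Intro']
    print(titles(conn, MASQ1))  # ['Background', 'Conclusion', 'Intro']
```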
Can we do the same thing in the XML Publishing domain? If so, what
assumptions are we making? In this particular case, it can be shown
that we can simplify the query to MASQ1 if the following conditions hold:
- In every Section tuple, exactly one of the two fields bookid and
  sectionparentid is non-null.
- Book.bookid is a key for the Book relation.
- Section.sectionid is a key for the Section relation.
- Section.bookid is a foreign key to the Book relation.
- Section.sectionparentid is a foreign key to the Section relation.
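The listed conditions can be tested against a particular database instance. The sketch below assumes SQLite and the Book/Section columns above, and runs one violation-counting query per condition; the condition names are hypothetical labels. Checking one instance is weaker than what translation needs, since the rewrite to MASQ1 relies on constraints declared on the schema so that they hold for every instance. Like the list, these checks say nothing about cycles among sectionparentid links: a cycle of sections with no book ancestor would pass every check yet be missed by SQ1, so acyclic parent links are an extra assumption here.

```python
import sqlite3

# Each query counts violations of one listed condition; 0 means this database
# instance satisfies it. COUNT(*) - COUNT(DISTINCT c) is 0 exactly when c is
# non-null and unique, i.e. when c behaves as a key.
CONDITION_CHECKS = {
    "exactly one of bookid, sectionparentid is non-null":
        "SELECT COUNT(*) FROM Section "
        "WHERE (bookid IS NULL) = (sectionparentid IS NULL)",
    "Book.bookid is a key":
        "SELECT COUNT(*) - COUNT(DISTINCT bookid) FROM Book",
    "Section.sectionid is a key":
        "SELECT COUNT(*) - COUNT(DISTINCT sectionid) FROM Section",
    "Section.bookid references Book":
        "SELECT COUNT(*) FROM Section S WHERE S.bookid IS NOT NULL "
        "AND NOT EXISTS (SELECT 1 FROM Book B WHERE B.bookid = S.bookid)",
    "Section.sectionparentid references Section":
        "SELECT COUNT(*) FROM Section S WHERE S.sectionparentid IS NOT NULL "
        "AND NOT EXISTS (SELECT 1 FROM Section P "
        "WHERE P.sectionid = S.sectionparentid)",
}

def violated_conditions(conn: sqlite3.Connection) -> list[str]:
    """Return the names of the listed conditions this instance violates."""
    return [name for name, sql in CONDITION_CHECKS.items()
            if conn.execute(sql).fetchone()[0] != 0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # No declared constraints, so the checks have something to find.
    conn.executescript(
        "CREATE TABLE Book (bookid INTEGER, title TEXT);"
        "CREATE TABLE Section (sectionid INTEGER, bookid INTEGER,"
        " sectionparentid INTEGER, title TEXT);"
    )
    conn.execute("INSERT INTO Book VALUES (1, 'Sample Book')")
    conn.executemany("INSERT INTO Section VALUES (?, ?, ?, ?)", [
        (1, 1, None, "Intro"),
        (2, None, 1, "Background"),
        (3, 2, None, "Orphan"),  # bookid 2 has no Book row
    ])
    print(violated_conditions(conn))  # ['Section.bookid references Book']
```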
If these conditions hold on the relational schema, then we can
translate the XML path expression query Q1 into the SQL query MASQ1.
Otherwise, we will have to be satisfied with the SQL query SQ1. Notice
that the fact that every section has exactly one parent, either a book
or a section, is implicit in the XML schema in the XML Storage
scenario, whereas in the XML Publishing scenario it is present only in
the constraint information on the underlying relational data. So, in
order to get efficient SQL queries for a given XML query, we need to
reason with the constraints on the underlying relational data. We have
developed a constraint-aware XML-to-SQL translation algorithm for path
expression queries in the XML Publishing scenario, when the XML schema
is non-recursive. Details of this algorithm can be found here.
Extending this to recursive XML schemas is future work.
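When the constraints are not available, the translator has to keep SQ1. The toy sketch below is not the constraint-aware algorithm referred to above: it only hard-codes the decision for Q1, using hypothetical labels for the five conditions. It then shows, on a SQLite instance with no declared constraints, why the fallback matters: a section with neither a book nor a section parent is not part of the XML view, so SQ1 omits it while MASQ1 would return it.

```python
import sqlite3

# Hypothetical labels for the five conditions listed above.
REQUIRED_CONSTRAINTS = frozenset({
    "section_exactly_one_parent",
    "book_bookid_key",
    "section_sectionid_key",
    "section_bookid_fk_book",
    "section_sectionparentid_fk_section",
})

SQ1 = """
WITH RECURSIVE Temp(id, title) AS (
    SELECT S.sectionid, S.title FROM Book B, Section S
    WHERE B.bookid = S.bookid
    UNION ALL
    SELECT S.sectionid, S.title FROM Temp T, Section S
    WHERE T.id = S.sectionparentid
)
SELECT title FROM Temp
"""
MASQ1 = "SELECT title FROM Section"

def translate_q1(declared: set[str]) -> str:
    """Toy constraint-aware choice for Q1 only (not the general algorithm):
    use the simplified MASQ1 only if every required constraint is declared."""
    return MASQ1 if REQUIRED_CONSTRAINTS <= declared else SQ1

def demo_db() -> sqlite3.Connection:
    """A database with no declared constraints and one section outside the view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE Book (bookid INTEGER, title TEXT);"
        "CREATE TABLE Section (sectionid INTEGER, bookid INTEGER,"
        " sectionparentid INTEGER, title TEXT);"
    )
    conn.execute("INSERT INTO Book VALUES (1, 'Sample Book')")
    conn.executemany("INSERT INTO Section VALUES (?, ?, ?, ?)", [
        (1, 1, None, "Intro"),
        (2, None, None, "Unpublished"),  # no parent, so not in the XML view
    ])
    return conn

if __name__ == "__main__":
    conn = demo_db()
    chosen = translate_q1(set())  # nothing declared, so fall back to SQ1
    print(sorted(r[0] for r in conn.execute(chosen)))  # ['Intro']
    print(sorted(r[0] for r in conn.execute(MASQ1)))   # ['Intro', 'Unpublished']
```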
- [SSK+01] Jayavel Shanmugasundaram, Eugene J. Shekita, Jerry Kiernan,
Rajasekar Krishnamurthy, Stratis Viglas, Jeffrey F. Naughton, Igor
Tatarinov: A General Technique for Querying XML Documents using a
Relational Database System. SIGMOD Record 30(3): 20-26 (2001)