Query Engine Streams
Current Implementation
The current version of Niagara uses streams to link the
operators in a query. The individual tuples on a stream
consist of a number of fields, the context of which is
determined by the query optimizer and operator behavior.
The fields of the tuple contain a XNode, which is essen-
tially a document-id,element-id pair that is used to address
a element of an XML document. Operators use the XNode to
retrieve properties or values of that node. Those values
are then used for query processing.
In a typical query a SEQL query run in the select oper-
ator will generate a set of postings. The select operator
then converts those postings into XNodes. The operator then
outputs a stream of those canidate XNodes.
When the stream arrives at an unnest operator, it will
unnest one of the N columns of each tuple and generate a new
stream with tuples that have N+1 fields. The additional
field will be an XNode which corresponds to whatever sub-
element was being unnested. Unnest may have more output
tuples than input tuples, since its exhaustive search may
locate more than one candidate XNode.
When operators evaluate predicates, the XNodes in the
expression are used to access the underlying document.
These accesses consist of extracting random values or prop-
erties of the node in question such that the predicate can
be evaluated.
Problems
One problem with this strategy is that accesses to a
document are essentially random as query processing runs.
As such, it is difficult to generate good disk access pat-
terns, and I/O goes slowly, which also slows query execu-
tion. In addition, the same property or value may need to
be evaluated many times during execution of a query. Each
time the value is accessed another disk access needs to be
made.
The unnest operator must exhaustively search subtrees
of a candidate XNode to unnest it. This can be an expensive
operation, even if the subtree is shallow, as a directly
contains unnest may encounter a large number of children.
-2-
Inlining Values in Tuples
In the new Niagara, Query Engine Tuples will consist of
both XNodes and of inline values. When a document is exam-
ined by unnest, unnest will be able to project out the vari-
ous properties or other values of the XNode into the tuple.
This is similar to who how stream-based relational systems
evaluate queries, and we will achieve the same kind of bene-
fit as those systems do. This improves disk I/O patterns,
as the same node will be examined only once during a query.
There may be another access during result construction as
well. Note, of course, the optimizer may decide to generate
more accesses depending upon it's statistics for reducing
the workload as much as possible. This is similar to the
earlier trade-off mentioned for result construction.
When an operator evaluates a predicate, it will use the
value that is inlined in the tuple. Of course, the opti-
mizer will adjust it's projection lists for each operator to
get rid of values as they are no longer needed.
Note that an unnest may refer to an entire document
subtree, which can be quite large, instead of a value which
is typically small. Or, for example, the value itself may
be quite large, for example an entire textual document. In
these cases the value is not inlined. Instead a flag or
other indicator will be left that tells the expression eval-
uator that this value is not a value. When an operator
needs to access a value, it will either use the inlined
value, or find that the value is absent. When the value is
absente, the evaluator will need to goto disk to retrieve as
much of that value as is needed to determine the result of
the predicate. If the value is an XML subtree, it will need
to be read and converted to text to be compared against.
However, that conversion need only continue until the predi-
cate can be satisfied. If a value was expected, this means
that only one node will need to be accessed before a mis-
match can be determined. However, if the user issues
queries that will require subtree comparison, query execu-
tion will slow down as that occurs. Such is life.
Unnest
With the redesign of the SEQL portion of the query
engine, the role of unnest in the new system will change
quite a bit. As mentioned above, unnest must currently make
an exhaustive sub-document search to generate candidate
unnested nodes. With the new SEQL engine, the query
returned by SEQL portion of query execution may be exact,
providing all the info needed by further Query Engine pro-
cessing, instead of just a root that needs to be, perhaps
exhaustively, searched. In that case, unnest will only need
to convert the search engine stream information to XNodes,
and also retrieve an inline any values which are needed.
-3-
However, the current unnest functionality will also be
required. Exhaustive searches will still be required for
queries which can not be executed in the index manager. For
example the path a/*, which returns all direct children of a
node, can not be executed as a SEQL query, as the IM will
not know the names of the child nodes.