Operators for Niagara
Introduction
With the new execution engine in Niagara the line
between Query Engine and Search Engine is blurred consider-
ably, perhaps to the point of nonexistence. Streams can
contain QE information such as XNodes, SE information such
as Postings, as well as inline values. Many of the opera-
tions which can be performed upon the data contained are
generic database operations which are independent of the
type of data being processed. Note, though, that there are
operations which make sense only in the context of one type
of data; for example, contains only works with two postings.
Random Notes
Outer Variants
We have discovered that it is important to be able to
perform outer joins (and other outer-operators too) on
streams of XML data. This is required both in the
query engine and in search engine processing. The need
for this arises from the semantics of XQuery and XML,
where results may be required from many stages of query
processing -- even if no match occurs at that process-
ing stage. So far we have identified such things as
outer contains and outer join as useful idioms for
query processing. We should be able to support outer
joins on any specified number of inputs to an operator.
Note that we may want to restrict which operators are
capable of doing outer processing to keep implementa-
tion and execution costs reasonable.
Stream Order
Most, if not all, data produced by the Index Manager
and the Data Manager are presented as sorted in docu-
ment order. When multiple documents are retrieved from
the IM, the documents are presented in document id
order. These well-known properties can be used to
advantage during query processing to allow simpler, more
efficient algorithms to be used in a great number of
cases. These algorithms also produce results in the
same order as the inputs, and maintain that property
for future operators. However, the most efficient
algorithms for some cases may be ones that produce
unordered or partially ordered results. When an execution
plan decides to use such an operator, the output
stream must be marked as unordered. The plan generator
will then need to use other algorithms which can
operate correctly on unordered streams, or will need
to insert a sort operation to reorder the tuples prop-
erly to allow the use of order-assuming operators. Of
course some operator algorithms may work correctly with
either type of stream, and may even preserve the order
which was present on the inputs.
Another thing to note about order is that it may only apply
to some fields of a tuple. For example, dereferencing an
IDREF and extracting information from it may add non docu-
ment-order fields to a tuple, since the order in which IDs
are referenced relative to IDREFs is indeterminate.
Group and Aggregate
It is useful to perform both grouping and aggregate
operations on SE as well as QE data. Note that the
Search Engine requirements for grouping are for simple
numerical grouping and aggregation as in a relational
system, not the more complex structural grouping that
is possible at the XML or QE level.
Operators in General
Select
Project
Join
Order
Group
Select
Select evaluates predicates against the values or sub-
trees which are present in the input tuple. The com-
plexity of the select operator has been greatly reduced
from the current Niagara as it no longer unnests, it
just does predicate evaluation. This reduction in com-
plexity allows select to be piggy-backed on top of
other operators if desired. The predicate that select
evaluates may contain various non-predicate functions
to perform data conversion as needed for the query
evaluation.
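As a sketch of this reduced role, select can be modeled as a
plain predicate filter over a tuple stream. The names and tuple
layout below are illustrative, not Niagara's actual interfaces:

```python
def select(stream, predicate):
    """Select: evaluate a predicate against each input tuple and
    pass through only the tuples which satisfy it.  No unnesting
    happens here; tuples flow through unchanged."""
    for tup in stream:
        if predicate(tup):
            yield tup

# Hypothetical tuples with an already-extracted 'price' value.
tuples = [{"title": "a", "price": 10}, {"title": "b", "price": 30}]
cheap = list(select(iter(tuples), lambda t: t["price"] < 20))
```

Because select is now just this filter loop, the same loop can be
folded into another operator's output path instead of running as
a separate stage.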
The above being said, there is one unnest-like thing which
must be done. That task is comparing the textual representa-
tion of document subtrees for predicate evaluation. This is
something which XML and XQuery rely on from time to time,
and must be executed. This task will be performed by a spe-
cial incremental comparison operator which will realize only
as much of a subtree as is necessary to evaluate the predi-
cate in question. The constructed result from comparison to
comparison may be cached and reused (or evaluated to a
greater length) to perform additional comparisons with. If
the subtree is large enough, it will be necessary to
read/write it from disk to allow processing of this query to
control memory use of the operator.
Adding subtree hash values to the DM will allow filtering
equality comparisons of subtrees. In other words, only
potential matches must be instantiated for actual compari-
son, instead of needing to start guaranteed-fail instantia-
tions for all possible matches.
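The filtering idea can be sketched as follows, assuming the DM
stores a precomputed hash with each subtree; the node layout and
hash field are assumptions for illustration:

```python
calls = []

def materialize(node):
    """Stand-in for realizing a subtree's textual representation
    from the Data Manager (an expensive operation)."""
    calls.append(node["id"])
    return node["text"]

def equal_subtrees(left, right):
    """Equality comparison filtered by stored subtree hashes: a
    hash mismatch guarantees inequality with no instantiation;
    only potential matches are actually materialized."""
    if left["hash"] != right["hash"]:
        return False
    return materialize(left) == materialize(right)

a = {"id": 1, "hash": 7, "text": "<x/>"}
b = {"id": 2, "hash": 7, "text": "<x/>"}
c = {"id": 3, "hash": 9, "text": "<y/>"}
same = equal_subtrees(a, b)   # hashes match: both materialized
diff = equal_subtrees(a, c)   # hash mismatch: rejected for free
```

Only the a/b comparison pays the instantiation cost; the a/c pair
is rejected on hashes alone.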
Project
As is expected, project projects values from the input
tuple to the output tuple. One interesting need we
have discovered is the need for a predicate-based
select. In other words project may evaluate a predi-
cate and decide what fields from the input tuple to
project to the output tuple based upon the results.
Conceptually project can also be used to execute functions
on values from input tuples.
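A minimal sketch of this predicate-based projection, with
hypothetical tuples and field lists (not the actual Niagara
interfaces):

```python
def project(stream, field_lists, choose):
    """Project: copy a subset of fields from each input tuple to
    the output tuple.  The choose predicate is evaluated per
    tuple and decides which field list applies."""
    for tup in stream:
        fields = field_lists[choose(tup)]
        yield {f: tup[f] for f in fields}

tuples = [{"kind": "book", "title": "t1", "isbn": "i1", "url": None},
          {"kind": "web", "title": "t2", "isbn": None, "url": "u2"}]
out = list(project(iter(tuples),
                   [["title", "isbn"], ["title", "url"]],
                   lambda t: 0 if t["kind"] == "book" else 1))
```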
Join
Conceptually join takes a join predicate and an arbi-
trary number of inputs and performs the appropriate
cartesian product. There can be many join algorithms
in Niagara, such as hash join, symmetric hash join,
nested loops, merge, sort-merge, and grace; those
which are needed can be chosen.
Operations
Operations are actions which are performed on individ-
ual tuples or data items. They are often the building block
used to construct an operator. Operations know nothing
about streams, they just know about data, and how to manipu-
late it.
Search Engine
Here is a brief list and explanation of the Search
Engine operators which generating execution plans from our
query corpus has discovered. These operations can be per-
formed on any appropriate Posting field in a tuple.
All of these SE operators are expressed as full-blown opera-
tors. In execution it may often be possible to plumb the
functionality onto an existing operator. There may be limi-
tations on how that plumbing may occur, however, so there
still may be a need for a full-fledged version of the opera-
tors.
Contains
This has been there all along. The L input is an Ele-
ment or an Attribute, something which can contain something
else. The R input for Elements can be Elements, Pay-
load Words, Attributes, or Attribute Words. The R
input for Attributes can be Attribute Words. It finds
all occurrences of an R input which is nested within the
structure of the L input.
It is possible to negate the condition of a contains opera-
tor, which converts it to not-contains. For both normal and
not contains, the semantics of containment of null values
must be considered; given the variability in null handling
between queries, it seems best to leave that decision to the
optimizer and allow the operator to produce all variant
results.
There is an orthogonal component to contains operators: a
contains may be executed by any join. Choice of join algo-
rithm is possible based upon input streams, expected data,
statistics, and other such criteria. An appropriate con-
tains expression, in the guise of a predicate, can be sup-
plied and performed. With this methodology, contains is
nothing but an ordinary join, and not a dedicated operator.
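To make the point concrete, here is containment expressed as an
ordinary join predicate. Postings are modeled as (document id,
start, end) triples, which is an assumption for illustration;
any join algorithm that evaluates the predicate will do, nested
loops being the simplest:

```python
def contains(l_posting, r_posting):
    """Containment as a join predicate: R is nested within L when
    both lie in the same document and R's start/end interval
    falls strictly inside L's."""
    l_doc, l_start, l_end = l_posting
    r_doc, r_start, r_end = r_posting
    return l_doc == r_doc and l_start < r_start and r_end < l_end

def contains_join(l_stream, r_stream):
    """Contains as nothing but a join: a nested-loops join using
    the predicate above.  Any other join algorithm could be
    substituted based on statistics and stream properties."""
    rs = list(r_stream)
    for l in l_stream:
        for r in rs:
            if contains(l, r):
                yield (l, r)

ls = [(1, 0, 10), (1, 20, 30)]
rs = [(1, 2, 4), (1, 25, 40), (2, 3, 5)]
pairs = list(contains_join(iter(ls), iter(rs)))
```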
Direct-Contains
Again, this operator is already present. The inputs
are the same as the contains operator. It finds all
occurrences of an R input which is directly nested within
the structure of the L input.
Only-Contains
This is a variant of Direct-Contains. I believe its
current name in the system is is; however, it really is
a form of contains. Inputs are the same as contains.
It returns all occurrences of an R input being the only
thing directly nested within the L input.
Exists-Contains
This is another variant of Contains, though it can be a
variant on any contains variant. The above contains
operators return a complete list of matching pairs.
However, there can be cases where that information is
not needed by the rest of the query; it qualifies the
node for other purposes. One example of this is unnest
for other values. This singleton variant provides the
L input if the qualifying condition is true. As in all
the other operators, the R input may be projected if
needed. Note that Exists-contains is the behavior of
the current contains code, as it qualifies a node for
just the first instance of a match, allowing unnest to
do the bulk of the work. In the new IM and SE, con-
tains does much of the work which unnest previously
performed.
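Exists-Contains is then just the semi-join form of contains:
emit the L input once, as soon as any R qualifies it. A sketch,
again on hypothetical (document id, start, end) postings:

```python
def contains(l_posting, r_posting):
    """R is nested within L: same document, R's interval strictly
    inside L's."""
    l_doc, l_start, l_end = l_posting
    r_doc, r_start, r_end = r_posting
    return l_doc == r_doc and l_start < r_start and r_end < l_end

def exists_contains(l_stream, r_list, predicate=contains):
    """Exists-Contains as a semi-join: emit each L input at most
    once, as soon as some R input qualifies it; the full list of
    matching pairs is never built."""
    for l in l_stream:
        if any(predicate(l, r) for r in r_list):
            yield l

qualified = list(exists_contains(
    iter([(1, 0, 10), (1, 20, 30)]),
    [(1, 2, 4), (2, 3, 5)]))
```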
Range
It is often necessary to extract an arbitrary range of
matches on an input stream. Ordering of the stream is
not the responsibility of the range operator; it is
responsible for selecting a subset of whatever ordering
the stream has. For example, the first match, the first
several matches, a particular match, a range of matches,
or all matches after N are all possibilities. Rank, Top and Range
can all be represented by a two-argument range expres-
sion which takes a begin and end specification of a
range. Begin values include start-of-document (other-
wise known as rank 1), and end values include end-of-
document. When end-of-document is specified, the start
argument may be negative to specify distance from the
end. Translations of common requests are shown here:
+-----------------+-----------------+
| Top(5)          | Range(Start, 5) |
| Rank(3)         | Range(3, 3)     |
| Between(10,15)  | Range(10, 15)   |
| Between(9, End) | Range(9, End)   |
| Last(5)         | Range(-5, End)  |
+-----------------+-----------------+
The last form need not be implemented; I mention it for
completeness. It is the only form which requires the operator
to store an arbitrary, but known in advance, number of
tuples before it generates its result. As such it is also,
of necessity, a blocking operator.
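A sketch of the two-argument form, with End as a sentinel; only
the Last(n) case needs buffering, and everything else is a
single pass (names here are illustrative):

```python
from collections import deque

END = object()  # sentinel for the end-of-document bound

def range_op(stream, begin, end=END):
    """Range: select matches begin..end (rank 1 = start-of-
    document, inclusive) from the stream, in whatever order the
    stream already has.  A negative begin with end=END selects
    the last -begin matches, which forces buffering that many
    tuples -- the one blocking form."""
    if begin < 0 and end is END:
        yield from deque(stream, maxlen=-begin)   # Last(n)
        return
    for rank, item in enumerate(stream, start=1):
        if end is not END and rank > end:
            return                                # past the range
        if rank >= begin:
            yield item

matches = ["a", "b", "c", "d", "e", "f", "g"]
top5 = list(range_op(iter(matches), 1, 5))    # Top(5)
rank3 = list(range_op(iter(matches), 3, 3))   # Rank(3)
last2 = list(range_op(iter(matches), -2))     # Last(2)
```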
Range is trivial to implement since it has nothing to do
with ordering. When performing multi-document queries it is
necessary to combine range with grouping or aggregation
operators to provide per-document results in the typical
case. As long as the input stream obeys the assumed docu-
ment_id-order model this is trivial to implement in range
itself. If a non-document_id-order stream is used, it is
probably best to use the grouping operator and provide rank
as an aggregation predicate to it. Note that we may need to
upgrade the contents of the IDREF table to provide type
information of destination nodes to allow for proper indi-
rection based queries, as the destination may be an arbi-
trary element.
Deref
The deref operator is used to provide access to the
refid-to-id table. In other words, it is a probe-join
used to provide dereference information for IDREF infor-
mation. It is inappropriate to use a normal join in
this case as IDREFs do not have any order in the
requesting document. Information returned by deref is
not in sorted order. IDREFs do not reference IDs in
document order. If an order is desired on that field
it must be applied via an operator.
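Deref as a probe-join can be sketched as below; the in-memory
dict stands in for the refid-to-id table, and unmatched IDREFs
are simply dropped in this sketch (whether they should instead
produce nulls is an optimizer decision):

```python
def deref(stream, idref_field, id_table):
    """Deref: probe the refid-to-id table for each tuple's IDREF
    and attach the target.  Output follows input-tuple order, not
    document order of the targets, since IDREFs reference IDs in
    arbitrary order."""
    for tup in stream:
        target = id_table.get(tup[idref_field])
        if target is not None:
            yield {**tup, "deref": target}

id_table = {"id7": "node7", "id2": "node2"}
refs = [{"ref": "id7"}, {"ref": "id9"}, {"ref": "id2"}]
out = list(deref(iter(refs), "ref", id_table))
```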
Ref-By
The ref-by operator is something we are considering for
inclusion to aid queries which require such informa-
tion. It finds IDREFs which correspond to an ID. One
implementation of this operator is to perform a full
join, instead of a probe join, of the input against the
IDREF index. Another possible implementation is to
provide both forward (IDREFs->ID) and backward
(ID->IDREFs) indices on the IDREF table. Most notes
about deref apply equally to ref-by.
Is it possible to use a multi-dimensional index for such
mapping?
Reinject
Experimental. The intent of the reinject operator is to
allow a query stream to re-enter the SE after leaving
it. I thought I had good reason for its existence, but
perhaps it is too complex to implement. It seems that
it is necessary for executing correlated sub-queries.
For example, the reinject operator contains a simple SE
expression, into which it can insert values or Postings
from the input tuple. The simple SE query may then be
executed with those parameters, the results output, and
the next tuple processed. If the expression is complex
it is better to try to integrate it into the overall
query plan. However some plans, with the right statis-
tics, could use this to great advantage to reduce the
work needed for some queries. We need to find a good
example of why this is important, to decide if it should
be tossed or not.
Merge
The merge operator takes two streams of postings and
merges them. If the streams are in document and docu-
ment_id order the output will have the same order. For
example, using an earlier query:
(A contains B) or (A contains C)
One plan is to query
A contains (B merge C)
It may be desirable for merge to project an additional
field into the output which reflects which input stream
the merged value originated from. This would be used
for later select predicates to discriminate between
different matches.
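A sketch of merge over two posting streams already in (document
id, start) order, with the optional origin field; postings are
hypothetical (docid, start, end) triples:

```python
import heapq

def merge(left, right, tag_origin=False):
    """Merge two posting streams that are each already sorted;
    the output preserves that order.  With tag_origin, an extra
    field records which input each posting came from, for later
    select predicates to discriminate between matches."""
    tagged = heapq.merge(((p, "L") for p in left),
                         ((p, "R") for p in right),
                         key=lambda pair: pair[0])
    for posting, origin in tagged:
        yield (posting, origin) if tag_origin else posting

b = [(1, 2, 3), (1, 9, 10)]
c = [(1, 5, 6), (2, 1, 2)]
merged = list(merge(iter(b), iter(c), tag_origin=True))
```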
Except
The except operator outputs all Postings from the L
input, except those which exist in the R input. I
believe this operator may also be known as difference.
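A sketch of except; with both inputs in document order a
merge-style scan would avoid building the set, but the set-based
version below is the simplest correct statement of the
semantics:

```python
def except_op(left, right):
    """Except (difference): output every posting from the L input
    that does not also appear in the R input."""
    r_postings = set(right)
    for posting in left:
        if posting not in r_postings:
            yield posting

remaining = list(except_op(iter([(1, 1, 2), (1, 3, 4), (2, 1, 2)]),
                           iter([(1, 3, 4)])))
```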
Project
For all intents and purposes, we can consider all the
SE operators to always add more fields to the tuples.
However, some of those values will no longer be needed
for further query processing and should be eliminated.
The project operator takes a single input stream and
projects desired fields from that stream to the output
stream. It is interesting to observe that project can
be considered a special case of a generic sorted merge
operator with project capabilities -- a single input
merge.
Query Engine
The following are the Query Engine operators we have
found a need for while creating execution plans for our
query corpus. As noted in the Search Engine section, it may
be possible to plumb together operators directly for execu-
tion, so that individual operators are not required at run-
time. The purpose of each operator is mentioned by itself,
as it stands on its own and can be optimized the same way.
For example, an operator which adds fields to a tuple has an
implicit project functionality built into it.
Simple-Unnest
From one point of view, Unnest is the translator between
the SE stream and the QE stream. From another point of
view, unnest just adds values to a tuple. In the cur-
rent Niagara, unnest always performs an exhaustive
search, using the DM, to unnest the path expression in
question. This can be an expensive operation. How-
ever, the new Niagara can reduce the amount of work
required by unnest, as it provides the exact ele-
ment_ids of elements to be unnested. With this infor-
mation unnest need only extract the required values
from the document and add them to the tuple. If an
expression to be unnested is found to be large, or to
be an XML subtree, the value is not added to the tuple.
Instead a flag is placed in the extracted value field
of the tuple which indicates that the value is large
and that an expensive evaluation must be performed upon
demand. Note: This implies that the XNode correspond-
ing to that value is always kept in the tuple by the
execution plan generator. If the expected atomic value
is a small subtree, it can be converted to a textual
representation and used for further queries, if it
falls below some size threshold. Note that the con-
trolling XNode will always be used for node identity,
as unnested values are just that -- anonymous values.
If something to be unnested is not found, there may be
either an error or the value treated as a null value.
Such a decision should be made by the plan generator, not
the runtime. The runtime will either generate an
error, or put a flag in the value slot indicating that
the value is null.
Unnest
This version of unnest is the full-blown expensive
unnest that is currently available in the system.
There are some cases where results cannot be supplied
from the SE to allow use of Simple-unnest; in that case
the exhaustive search of the document in the DM must be
performed. Other than that detail, Unnest does every-
thing which is mentioned for simple-unnest.
Merge
The merge operator takes multiple streams of tuples and
merges them together. It should be possible to specify
predicates for how the tuples should be merged, though I
don't know how useful that may ultimately be.
Construct
Construct is a family of operators which are directly
related to the algebra. Given input tuples new docu-
ments or document fragments are constructed from the
values available in the tuple.
Filter-Construct
The filter-construct operator is a special construction
operator which implements the filter semantics of
XQuery. The input to filter-construct is a stream of
candidate Postings or XNodes. A state stack is main-
tained to connect the XNodes together, using their
start and end numbers to determine the closest match
document structure as mentioned in the XQuery paper.
The resulting document may be a single cohesive docu-
ment or an ordered forest.
Rank
See the Range operator in the Search Engine section for
details.
Sort
Yup ... sorting ... we all know what it is. 'Nuff
said.
Group-By
Strictly speaking, group-by in XQuery and XML means
structural aggregation. Due to that necessary evil,
the grouping subsystem must be capable of grouping both
a large number of groups as well as a large amount of
data in individual groups. This is considerably dif-
ferent from standard relational grouping which only
requires a large group count and storage for the aggre-
gate result and temporary space used to develop that
result.
Note this only discusses grouping and aggregation at a con-
ceptual level; see the vignette on grouping and aggregation
for the details. Conceptually the group-by operator takes a
stream as an input and has a known or potentially unknown
number of streams as an output. Each of the output streams
corresponds to a unique grouping value. Given this, group-
by is a simple switch which routes packets to an appropriate
output stream. It has nothing to do with aggregation, stor-
age, or any other issues.
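The conceptual switch can be sketched as below; Python lists
stand in for the per-group output streams, and no aggregation or
storage policy is implied:

```python
def group_by(stream, key):
    """Conceptual group-by: route each tuple to the output stream
    for its grouping value, creating a new stream the first time
    a value is seen.  Aggregation and storage are someone else's
    problem."""
    outputs = {}
    for tup in stream:
        outputs.setdefault(key(tup), []).append(tup)
    return outputs

groups = group_by(iter([{"doc": 1, "v": 2},
                        {"doc": 2, "v": 3},
                        {"doc": 1, "v": 4}]),
                  key=lambda t: t["doc"])
```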
To extract the output of this conceptual
Aggregation
Aggregate operations can be used both with the group-by
operator and as a standalone operator. With the above
mentioned group-by operator, aggregate handling is,
conceptually, performed by simple, stream-based aggre-
gate operators for either case. Almost anything can be
implemented as an aggregate function, and it will be
possible to perform multiple aggregate operations as
well as to filter (via the select operator) aggregates as
they are processed. I'll mention some aggregate operators
here as an example, but again, the framework allows
most anything to be implemented. See the aggregate
vignette for details.
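As a sketch of the stream-based shape such aggregate operators
could take (an init/step/final fold, not Niagara's actual
framework), count and average both fit the same mold:

```python
def aggregate(stream, init, step, final=lambda acc: acc):
    """A stream-based aggregate: fold each tuple into an
    accumulator and emit the finished value.  Almost any function
    fits this init/step/final shape."""
    acc = init
    for tup in stream:
        acc = step(acc, tup)
    return final(acc)

values = [3, 1, 4]
count = aggregate(iter(values), 0, lambda a, _: a + 1)
avg = aggregate(iter(values), (0, 0),
                lambda a, v: (a[0] + v, a[1] + 1),
                lambda a: a[0] / a[1])
```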