Query Engine Streams


Current Implementation

     The current version of Niagara uses streams to link the
operators in a query.  The individual  tuples  on  a  stream
consist  of  a  number  of  fields,  the context of which is
determined by the query  optimizer  and  operator  behavior.
The  fields  of  the  tuple contain a XNode, which is essen-
tially a document-id,element-id pair that is used to address
a  element  of  an XML document.  Operators use the XNode to
retrieve properties or values of that  node.   Those  values
are then used for query processing.

     In a typical query a SEQL query run in the select oper-
ator will generate a set of postings.  The  select  operator
then converts those postings into XNodes.  The operator then
outputs a stream of those canidate XNodes.

     When the stream arrives at an unnest operator, it  will
unnest one of the N columns of each tuple and generate a new
stream with tuples that have  N+1  fields.   The  additional
field  will  be  an XNode which corresponds to whatever sub-
element was being unnested.  Unnest  may  have  more  output
tuples  than  input  tuples, since its exhaustive search may
locate more than one candidate XNode.

     When operators evaluate predicates, the XNodes  in  the
expression  are  used  to  access  the  underlying document.
These accesses consist of extracting random values or  prop-
erties  of  the node in question such that the predicate can
be evaluated.

Problems

     One problem with this strategy is that  accesses  to  a
document  are  essentially  random as query processing runs.
As such, it is difficult to generate good disk  access  pat-
terns,  and  I/O  goes slowly, which also slows query execu-
tion.  In addition, the same property or value may  need  to
be  evaluated  many times during execution of a query.  Each
time the value is accessed another disk access needs  to  be
made.

     The  unnest  operator must exhaustively search subtrees
of a candidate XNode to unnest it.  This can be an expensive
operation,  even  if  the  subtree is shallow, as a directly
contains unnest may encounter a large number of children.











                             -2-


Inlining Values in Tuples

     In the new Niagara, Query Engine Tuples will consist of
both  XNodes and of inline values.  When a document is exam-
ined by unnest, unnest will be able to project out the vari-
ous  properties or other values of the XNode into the tuple.
This is similar to who how stream-based  relational  systems
evaluate queries, and we will achieve the same kind of bene-
fit as those systems do.  This improves disk  I/O  patterns,
as  the same node will be examined only once during a query.
There may be another access during  result  construction  as
well.  Note, of course, the optimizer may decide to generate
more accesses depending upon it's  statistics  for  reducing
the  workload  as  much as possible.  This is similar to the
earlier trade-off mentioned for result construction.

     When an operator evaluates a predicate, it will use the
value  that  is  inlined in the tuple.  Of course, the opti-
mizer will adjust it's projection lists for each operator to
get rid of values as they are no longer needed.

     Note  that  an  unnest  may refer to an entire document
subtree, which can be quite large, instead of a value  which
is  typically  small.  Or, for example, the value itself may
be quite large, for example an entire textual document.   In
these  cases  the  value is not inlined.  Instead a flag  or
other indicator will be left that tells the expression eval-
uator  that  this  value  is  not a value.  When an operator
needs to access a value, it  will  either  use  the  inlined
value,  or find that the value is absent.  When the value is
absente, the evaluator will need to goto disk to retrieve as
much  of  that value as is needed to determine the result of
the predicate.  If the value is an XML subtree, it will need
to  be  read  and  converted to text to be compared against.
However, that conversion need only continue until the predi-
cate  can be satisfied.  If a value was expected, this means
that only one node will need to be accessed  before  a  mis-
match  can  be  determined.   However,  if  the  user issues
queries that will require subtree comparison,  query  execu-
tion will slow down as that occurs.  Such is life.

Unnest

     With  the  redesign  of  the  SEQL portion of the query
engine, the role of unnest in the  new  system  will  change
quite a bit.  As mentioned above, unnest must currently make
an exhaustive  sub-document  search  to  generate  candidate
unnested  nodes.   With  the  new  SEQL  engine,  the  query
returned by SEQL portion of query execution  may  be  exact,
providing  all  the info needed by further Query Engine pro-
cessing, instead of just a root that needs  to  be,  perhaps
exhaustively, searched.  In that case, unnest will only need
to convert the search engine stream information  to  XNodes,
and also retrieve an inline any values which are needed.









                             -3-


     However,  the current unnest functionality will also be
required.  Exhaustive searches will still  be  required  for
queries which can not be executed in the index manager.  For
example the path a/*, which returns all direct children of a
node,  can  not  be executed as a SEQL query, as the IM will
not know the names of the child nodes.