Query Engine Streams Current Implementation The current version of Niagara uses streams to link the operators in a query. The individual tuples on a stream consist of a number of fields, the context of which is determined by the query optimizer and operator behavior. The fields of the tuple contain a XNode, which is essen- tially a document-id,element-id pair that is used to address a element of an XML document. Operators use the XNode to retrieve properties or values of that node. Those values are then used for query processing. In a typical query a SEQL query run in the select oper- ator will generate a set of postings. The select operator then converts those postings into XNodes. The operator then outputs a stream of those canidate XNodes. When the stream arrives at an unnest operator, it will unnest one of the N columns of each tuple and generate a new stream with tuples that have N+1 fields. The additional field will be an XNode which corresponds to whatever sub- element was being unnested. Unnest may have more output tuples than input tuples, since its exhaustive search may locate more than one candidate XNode. When operators evaluate predicates, the XNodes in the expression are used to access the underlying document. These accesses consist of extracting random values or prop- erties of the node in question such that the predicate can be evaluated. Problems One problem with this strategy is that accesses to a document are essentially random as query processing runs. As such, it is difficult to generate good disk access pat- terns, and I/O goes slowly, which also slows query execu- tion. In addition, the same property or value may need to be evaluated many times during execution of a query. Each time the value is accessed another disk access needs to be made. The unnest operator must exhaustively search subtrees of a candidate XNode to unnest it. This can be an expensive operation, even if the subtree is shallow, as a directly contains unnest may encounter a large number of children. -2- Inlining Values in Tuples In the new Niagara, Query Engine Tuples will consist of both XNodes and of inline values. When a document is exam- ined by unnest, unnest will be able to project out the vari- ous properties or other values of the XNode into the tuple. This is similar to who how stream-based relational systems evaluate queries, and we will achieve the same kind of bene- fit as those systems do. This improves disk I/O patterns, as the same node will be examined only once during a query. There may be another access during result construction as well. Note, of course, the optimizer may decide to generate more accesses depending upon it's statistics for reducing the workload as much as possible. This is similar to the earlier trade-off mentioned for result construction. When an operator evaluates a predicate, it will use the value that is inlined in the tuple. Of course, the opti- mizer will adjust it's projection lists for each operator to get rid of values as they are no longer needed. Note that an unnest may refer to an entire document subtree, which can be quite large, instead of a value which is typically small. Or, for example, the value itself may be quite large, for example an entire textual document. In these cases the value is not inlined. Instead a flag or other indicator will be left that tells the expression eval- uator that this value is not a value. When an operator needs to access a value, it will either use the inlined value, or find that the value is absent. When the value is absente, the evaluator will need to goto disk to retrieve as much of that value as is needed to determine the result of the predicate. If the value is an XML subtree, it will need to be read and converted to text to be compared against. However, that conversion need only continue until the predi- cate can be satisfied. If a value was expected, this means that only one node will need to be accessed before a mis- match can be determined. However, if the user issues queries that will require subtree comparison, query execu- tion will slow down as that occurs. Such is life. Unnest With the redesign of the SEQL portion of the query engine, the role of unnest in the new system will change quite a bit. As mentioned above, unnest must currently make an exhaustive sub-document search to generate candidate unnested nodes. With the new SEQL engine, the query returned by SEQL portion of query execution may be exact, providing all the info needed by further Query Engine pro- cessing, instead of just a root that needs to be, perhaps exhaustively, searched. In that case, unnest will only need to convert the search engine stream information to XNodes, and also retrieve an inline any values which are needed. -3- However, the current unnest functionality will also be required. Exhaustive searches will still be required for queries which can not be executed in the index manager. For example the path a/*, which returns all direct children of a node, can not be executed as a SEQL query, as the IM will not know the names of the child nodes.