Operators for Niagara

Introduction

With the new execution engine in Niagara, the line between the Query Engine and the Search Engine is blurred considerably, perhaps to the point of nonexistence. Streams can contain QE information such as XNodes, SE information such as Postings, as well as inline values. Many of the operations which can be performed on this data are generic database operations, independent of the type of data being processed. Note, though, that some operations make sense only in the context of one type of data; for example, contains only works on two postings.

Random Notes

Outer Variants

We have discovered that it is important to be able to perform outer joins (and other outer operators) on streams of XML data. This is required both in query engine and in search engine processing. The need arises from the semantics of XQuery and XML, where results may be required from many stages of query processing, even if no match occurs at a given stage. So far we have identified outer contains and outer join as useful idioms for query processing. We should be able to support outer joins on any specified number of inputs to an operator. Note that we may want to restrict which operators are capable of outer processing to keep implementation and execution costs reasonable.

Stream Order

Most if not all data produced by the Index Manager and the Data Manager is presented sorted in document order. When multiple documents are retrieved from the IM, the documents are presented in document id order. These well-known properties can be used to advantage during query processing, allowing simpler, more efficient algorithms in a great number of cases. These algorithms also produce results in the same order as their inputs, and maintain that property for downstream operators.
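As a minimal sketch of an outer variant on ordered streams: assuming each tuple is a (doc_id, value) pair sorted by document id (the function and field layout here are illustrative, not Niagara's actual API), a merge-style left outer join can exploit the document_id order and still emit left tuples that find no match:

```python
# Sketch: left-outer merge join on two streams sorted by doc_id.
# Tuple layout (doc_id, value) is an illustrative assumption.

def outer_merge_join(left, right):
    """Yield (doc_id, lval, rval) for matches on doc_id; unmatched
    left tuples are still emitted with rval = None (the outer part)."""
    right = list(right)
    j = 0
    for doc_id, lval in left:
        # advance past right tuples with smaller doc_id
        while j < len(right) and right[j][0] < doc_id:
            j += 1
        k = j
        matched = False
        while k < len(right) and right[k][0] == doc_id:
            yield (doc_id, lval, right[k][1])
            matched = True
            k += 1
        if not matched:
            yield (doc_id, lval, None)  # outer: keep left even with no match

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (3, "y")]
print(list(outer_merge_join(left, right)))
# [(1, 'a', 'x'), (2, 'b', None), (3, 'c', 'y')]
```

Because both inputs are consumed in doc_id order, the output preserves that order, which is exactly the property the next paragraph discusses losing when other algorithms are chosen.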
However, the most efficient algorithms for some cases may be ones that produce unordered or partially ordered results. When an execution plan decides to use such an operator, the output stream must be marked as unordered. The plan generator will then need to use other algorithms which can operate correctly on unordered streams, or will need to insert a sort operation to reorder the tuples so that order-assuming operators can be used. Of course, some operator algorithms work correctly with either type of stream, and may even preserve whatever order was present on their inputs. Another thing to note about order is that it may apply only to some fields of a tuple. For example, dereferencing an IDREF and extracting information from it may add non-document-order fields to a tuple, since the order in which IDs are referenced relative to IDREFs is indeterminate.

Group and Aggregate

It is useful to perform both grouping and aggregate operations on SE as well as QE data. Note that the Search Engine requirements for grouping are for simple numerical grouping and aggregation, as in a relational system, not the more complex structural grouping that is possible at the XML or QE level.

Operators in General

Select, Project, Join, Order, Group

Select

Select evaluates predicates against the values or subtrees present in the input tuple. The complexity of the select operator has been greatly reduced from the current Niagara: it no longer unnests, it just does predicate evaluation. This reduction in complexity allows select to be piggy-backed on top of other operators if desired. The predicate that select evaluates may contain various non-predicate functions to perform data conversion as needed for query evaluation.

The above being said, there is one unnest-like task which must be done: comparing the textual representation of document subtrees for predicate evaluation.
This is something which XML and XQuery rely on from time to time, and it must be supported. The task will be performed by a special incremental comparison operator which realizes only as much of a subtree as is necessary to evaluate the predicate in question. The result constructed for one comparison may be cached and reused (or evaluated to a greater length) for additional comparisons. If the subtree is large enough, it will be necessary to read and write it from disk so that the operator's memory use stays under control.

Adding subtree hash values to the DM will allow filtering equality comparisons of subtrees. In other words, only potential matches must be instantiated for actual comparison, instead of starting guaranteed-to-fail instantiations for all possible matches.

Project

As expected, project projects values from the input tuple to the output tuple. One interesting need we have discovered is for a predicate-based project: project may evaluate a predicate and decide which fields from the input tuple to project to the output tuple based upon the result. Conceptually, project can also be used to execute functions on values from input tuples.

Join

Conceptually, join takes a join predicate and an arbitrary number of inputs and performs the appropriate cartesian product. There can be many join algorithms in Niagara (hash join, symmetric hash join, nested loops, merge, sort-merge, grace); choose those which are needed.

Operations

Operations are actions performed on individual tuples or data items. They are often the building blocks used to construct an operator. Operations know nothing about streams; they just know about data and how to manipulate it.

Search Engine

Here is a brief list and explanation of the Search Engine operators we have discovered while generating execution plans from our query corpus.
These operations can be performed on any appropriate Posting field in a tuple. All of these SE operators are expressed as full-blown operators. In execution it may often be possible to plumb the functionality onto an existing operator. There may be limitations on how that plumbing can occur, however, so there may still be a need for a full-fledged version of each operator.

Contains

This has been there all along. The L input is an Element or an Attribute: something which can contain something else. The R input for Elements can be Elements, Payload Words, Attributes, or Attribute Words. The R input for Attributes can be Attribute Words. It finds all occurrences of an R input which is nested within the structure of the L input.

It is possible to negate the condition of a contains operator, which converts it to not-contains. For both normal and not-contains, the semantics of containment of null values must be considered; given the variability in null handling between queries, it seems best to leave that decision to the optimizer and allow the operator to produce all variant results.

There is an orthogonal component to contains operators: a contains may be executed by any join. The choice of join algorithm can be made based upon input streams, expected data, statistics, and other such criteria. An appropriate contains expression, in the guise of a predicate, can be supplied and evaluated. With this methodology, contains is nothing but an ordinary join, not a dedicated operator.

Direct-Contains

Again, this operator is already present. The inputs are the same as for the contains operator. It finds all occurrences of an R input which is directly nested within the structure of the L input.

Only-Contains

This is a variant of Direct-Contains. I believe its current name in the system is "is"; however, it really is a form of contains. Inputs are the same as for contains. It returns all occurrences of an R input being the only thing directly nested within the L input.
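To make the containment test concrete: assuming each posting carries (start, end) word positions within a document (an assumption about the posting layout, not a confirmed detail of the IM), contains reduces to an interval-nesting predicate, and, as noted above, it can be driven by any join algorithm; a naive nested-loops form is sketched here:

```python
# Sketch: contains as a join with an interval-nesting predicate.
# Posting layout (start, end) word positions is an illustrative assumption.

def contains(left, right):
    """Yield (l, r) pairs where posting r is strictly nested inside l."""
    for l in left:
        for r in right:
            if l[0] < r[0] and r[1] < l[1]:  # r's extent lies inside l's
                yield (l, r)

elements = [(1, 10), (3, 6)]        # L input: element extents
words = [(4, 4), (8, 8), (12, 12)]  # R input: word positions
print(list(contains(elements, words)))
# [((1, 10), (4, 4)), ((1, 10), (8, 8)), ((3, 6), (4, 4))]
```

Direct-contains and only-contains would tighten the predicate (parent-child nesting, sole child) but keep the same join shape.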
Exists-Contains

This is another variant of Contains, though it can be applied to any of the contains variants. The contains operators above return a complete list of matching pairs. However, there are cases where that information is not needed by the rest of the query; the match merely qualifies the node for other purposes, for example unnesting other values. This singleton variant produces the L input if the qualifying condition is true. As with all the other operators, the R input may be projected if needed. Note that exists-contains is the behavior of the current contains code, which qualifies a node on just the first instance of a match, leaving unnest to do the bulk of the work. In the new IM and SE, contains does much of the work which unnest previously performed.

Range

It is often necessary to extract an arbitrary range of matches from an input stream. Ordering of the stream is not the responsibility of the range operator; it is responsible for selecting a subset of whatever ordering the stream has. Examples of range use include the first match, the first several matches, a particular match, a range of matches, or all matches after N. Rank, Top, and Range can all be represented by a two-argument range expression which takes a begin and an end specification of a range. Begin values include start-of-document (otherwise known as rank 1), and end values include end-of-document. When end-of-document is specified, the start argument may be negative to specify distance from the end. Translations of common requests are shown here:

+------------------------------------+
| Top(5)           Range(Start, 5)   |
| Rank(3)          Range(3, 3)       |
| Between(10,15)   Range(10, 15)     |
| Between(9, End)  Range(9, End)     |
| Last(5)          Range(-5, End)    |
+------------------------------------+

The last form need not be implemented; I mention it for completeness.
It is the only form which requires the operator to store an arbitrary, but known in advance, number of tuples before it generates its result. As such it is also, of necessity, a blocking operator. The other forms of range are trivial to implement, since range has nothing to do with ordering.

When performing multi-document queries, it is necessary in the typical case to combine range with grouping or aggregation operators to provide per-document results. As long as the input stream obeys the assumed document_id-order model, this is trivial to implement in range itself. If a non-document_id-order stream is used, it is probably best to use the grouping operator and provide rank as an aggregation predicate to it. Note that we may need to upgrade the contents of the IDREF table to provide type information about destination nodes to allow for proper indirection-based queries, as the destination may be an arbitrary element.

Deref

The deref operator is used to provide access to the refid-to-id table. In other words, it is a probe join used to provide dereference information for IDREFs. It is inappropriate to use a normal join in this case, as IDREFs do not have any order in the requesting document. Information returned by deref is not in sorted order; IDREFs do not reference IDs in document order. If an order is desired on that field, it must be applied via an operator.

Ref-By

The ref-by operator is something we are considering for inclusion to aid queries which require such information. It finds the IDREFs which correspond to an ID. One implementation of this operator is to perform a full join, instead of a probe join, of the input against the IDREF index. Another possible implementation is to provide both forward (IDREF->ID) and backward (ID->IDREFs) indices on the IDREF table. Most notes about deref apply equally to ref-by. Is it possible to use a multi-dimensional index for such a mapping?
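A minimal sketch of deref as a probe join, assuming the refid-to-id table behaves like a dictionary from ID string to node (the table contents and field positions are illustrative):

```python
# Sketch: deref as a probe join against a refid-to-id mapping.
# id_table contents and tuple layout are illustrative assumptions.

def deref(tuples, id_table, ref_field):
    """For each input tuple, probe the id table with its IDREF value and
    append the referenced node (None when the ID is absent)."""
    for t in tuples:
        yield t + (id_table.get(t[ref_field]),)

id_table = {"sec1": "<section id='sec1'>", "fig2": "<figure id='fig2'>"}
rows = [("doc1", "sec1"), ("doc1", "fig2"), ("doc2", "sec9")]
print(list(deref(rows, id_table, ref_field=1)))
# [('doc1', 'sec1', "<section id='sec1'>"),
#  ('doc1', 'fig2', "<figure id='fig2'>"),
#  ('doc2', 'sec9', None)]
```

As the text notes, the appended field carries no document-order guarantee; a downstream sort would be needed if order on it matters. Ref-by would invert the probe direction (ID in, matching IDREFs out).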
Reinject (Experimental)

The intent of the reinject operator is to allow a query stream to re-enter the SE after leaving it. I thought I had good reason for its existence, but perhaps it is too complex to implement. It seems to be necessary for executing correlated sub-queries. For example, the reinject operator contains a simple SE expression, into which it can insert values or Postings from the input tuple. The simple SE query may then be executed with those parameters, the results output, and the next tuple processed. If the expression is complex, it is better to try to integrate it into the overall query plan. However, some plans, with the right statistics, could use this to great advantage to reduce the work needed for some queries. We need to find a good example of why this is important, to decide whether it should be kept or tossed.

Merge

The merge operator takes two streams of postings and merges them. If the streams are in document and document_id order, the output will have the same order. For example, using an earlier query:

(A contains B) or (A contains C)

one plan is to query:

A contains (B merge C)

It may be desirable for merge to project an additional field into the output which reflects which input stream each merged value originated from. This would be used by later select predicates to discriminate between different matches.

Except

The except operator outputs all Postings from the L input except those which exist in the R input. I believe this operator may also be known as difference.

Project

For all intents and purposes, we can consider the SE operators to always add more fields to the tuples. However, some of those values will no longer be needed for further query processing and should be eliminated. The project operator takes a single input stream and projects desired fields from that stream to the output stream.
It is interesting to observe that project can be considered a special case of a generic sorted merge operator with project capabilities: a single-input merge.

Query Engine

The following are the Query Engine operators we have found a need for while creating execution plans for our query corpus. As noted in the Search Engine section, it may be possible to plumb operators together directly for execution, so that individual operators are not required at runtime. Each operator's purpose is described on its own, as each stands alone and can be optimized in the same way. For example, an operator which adds fields to a tuple has implicit project functionality built into it.

Simple-Unnest

From one point of view, unnest is the translator between the SE stream and the QE stream. From another point of view, unnest just adds values to a tuple. In the current Niagara, unnest always performs an exhaustive search, using the DM, to unnest the path expression in question. This can be an expensive operation. However, the new Niagara can reduce the amount of work required by unnest, as it provides the exact element_ids of the elements to be unnested. With this information, unnest need only extract the required values from the document and add them to the tuple. If an expression to be unnested is found to be large, or to be an XML subtree, the value is not added to the tuple. Instead, a flag is placed in the extracted-value field of the tuple indicating that the value is large and that an expensive evaluation must be performed on demand. Note: this implies that the XNode corresponding to that value is always kept in the tuple by the execution plan generator. If the expected atomic value is a small subtree, it can be converted to a textual representation and used for further queries, provided it falls below some size threshold. Note that the controlling XNode will always be used for node identity, as unnested values are just that: anonymous values.
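The small-value/large-value behavior of simple-unnest can be sketched as follows, assuming the document store is addressable by element_id and using a made-up LARGE marker and size threshold (all names here are illustrative, not the system's actual interfaces):

```python
# Sketch: simple-unnest with exact element ids supplied by the SE.
# doc_store, SIZE_THRESHOLD, and the LARGE marker are illustrative assumptions.

LARGE = object()       # flag: value too big to inline; realize on demand
SIZE_THRESHOLD = 32

def simple_unnest(tuples, doc_store, elem_field):
    """Append the value for each tuple's element id; large values are
    replaced by a flag so the XNode can be evaluated lazily later."""
    for t in tuples:
        value = doc_store[t[elem_field]]
        if len(value) > SIZE_THRESHOLD:
            yield t + (LARGE,)   # keep the XNode in the tuple for later
        else:
            yield t + (value,)

doc_store = {"e1": "short text", "e2": "x" * 100}
rows = [("doc1", "e1"), ("doc1", "e2")]
out = list(simple_unnest(rows, doc_store, elem_field=1))
print(out[0])              # ('doc1', 'e1', 'short text')
print(out[1][2] is LARGE)  # True
```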
If something to be unnested is not found, either an error may be raised or the value may be treated as a null value. That decision should be made by the plan generator, not the runtime. The runtime will either generate an error, or put a flag in the value slot indicating that the value is null.

Unnest

This version of unnest is the full-blown, expensive unnest currently available in the system. There are some cases where results cannot be supplied by the SE to allow use of Simple-Unnest; in those cases the exhaustive search of the document in the DM must be performed. Other than that detail, Unnest does everything mentioned for Simple-Unnest.

Merge

The merge operator takes multiple streams of tuples and merges them together. It is possible to specify predicates controlling how the tuples should be merged, though I don't know how useful that may ultimately be.

Construct

Construct is a family of operators which are directly related to the algebra. Given input tuples, new documents or document fragments are constructed from the values available in the tuple.

Filter-Construct

The filter-construct operator is a special construction operator which implements the filter semantics of XQuery. The input to filter-construct is a stream of candidate Postings or XNodes. A state stack is maintained to connect the XNodes together, using their start and end numbers to determine the closest matching document structure, as mentioned in the XQuery paper. The resulting document may be a single cohesive document or an ordered forest.

Rank

See the discussion of rank under the Range operator in the Search Engine section for details.

Sort

Yup ... sorting ... we all know what it is. 'Nuff said.

Group-By

Strictly speaking, group-by in XQuery and XML means structural aggregation. Due to that necessary evil, the grouping subsystem must be capable of handling both a large number of groups and a large amount of data in individual groups.
This is considerably different from standard relational grouping, which only requires a large group count plus storage for the aggregate result and temporary space used to develop that result.

Note that this only discusses grouping and aggregation at a conceptual level; see the vignette on grouping and aggregation for the details. Conceptually, the group-by operator takes a stream as input and has a known, or potentially unknown, number of streams as output. Each of the output streams corresponds to a unique grouping value. Given this, group-by is a simple switch which routes packets to the appropriate output stream. It has nothing to do with aggregation, storage, or any other issues. Extracting output from these conceptual streams is the job of aggregation.

Aggregation

Aggregate operations can be used both with the group-by operator and as a standalone operator. With the group-by operator described above, aggregate handling is, conceptually, performed by simple, stream-based aggregate operators in either case. Almost anything can be implemented as an aggregate function, and it will be possible to perform multiple aggregate operations, as well as to filter aggregates (akin to the select operator) as they are processed. I'll mention some aggregate operators here as an example, but again, the framework allows most anything to be implemented. See the aggregate vignette for details.
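The switch-plus-aggregate model above can be sketched as follows, with group-by doing nothing but routing tuples to per-group streams and a separate stream-based aggregate consuming each one (the routing-to-lists representation and the count aggregate are illustrative choices):

```python
# Sketch: group-by as a pure routing switch, aggregation as a separate
# stream operator. Representation of streams as lists is an assumption.

from collections import defaultdict

def group_by(tuples, key_field):
    """Route each tuple to the output stream for its grouping value;
    no aggregation or storage policy lives here."""
    streams = defaultdict(list)
    for t in tuples:
        streams[t[key_field]].append(t)
    return streams

def agg_count(stream):
    """A trivial stream-based aggregate: count the tuples."""
    return sum(1 for _ in stream)

rows = [("a", 1), ("b", 2), ("a", 3)]
groups = group_by(rows, key_field=0)
print({k: agg_count(v) for k, v in groups.items()})
# {'a': 2, 'b': 1}
```

Swapping agg_count for any other stream consumer (sum, min, or a structural constructor) leaves the group-by switch untouched, which is the point of keeping the two concerns separate.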