Multi Attribute Indices Current Niagara also has the concept of a multi-attribute index, A multi-attribute index is an index which has a com- plex key, such as a path expression. A path expression such as a/b has two components, a and b. The multi aspect of this index comes from the existence of multiple postings, one corresponding to each element of the path expression. However, they can be used as general purpose join indices as long as a mapping between a join condition and the multi- attribute result may be generated. Instead of a term being a key to the index, the path expres- sion, essentially a cached query result, is the key to the index. These indices are derived indices, which can be rebuilt from the document, or from the normally indexed por- tion of a document. Currently there are two types of multi- attribute indices in Niagara. The first type are the implicit MA indices used for the cur- rent idref and element+attribute indices. These indices are built in-memory when a document is indexed, and then written to disk. As such, they can only be built when the document is indexed, and the information to build them is lost after- ward. The second type of multi-attribute indices are those built explicitly to record path expressions. For example a/b/c is a multi-attribute path consisting of 3 path elements, and there would be a resulting multi-attribute posting consist- ing of the locations of matching 'a/b/c' postings. Goals and Problems Multi-attribute indices are a powerful tool in Niagara. They allow the computation of expensive results to occur once and then to be stored -- but the result can be used again and again. However, their management is currently rather ad-hoc. As mentioned earlier, some multi-attribute indices are implicit -- they are only built at document index time, as a side-effect of indexing. Note that it is currently not possible to generate these implicit indices after the fact, through a query, since the information needed to generate the index is lost after the document is indexed. Multiple attribute indices have a key which conceptu- ally consists of a number of independent fields, from which -2- the results are derived. The result of the multi-attribute index can also have multiple fields corresponding to those elements in the key. Note, however, that it may not be desirable or possible to save all of the results generated by the query. For example, the element+attribute index cur- rently in use has two key values, but only one result value -- the element the attribute is stored in. As another exam- ple, say a index is generated for the path expression a/b/c. It may only be desirable to save 1 "column" of results, c from this query -- if that is all that is used in actual queries. With other path indices it may be desireable to save only some other subset of the results .. maybe a, c because B is never used, it is just cared about in the structure that generated the result. However, some queries will undoubtedly want to keep all the information and the full a, b, c path. There is also a problem of interpreting the multi- attribute results, as the type of the multi-attribute result is not known. In addition to path elements it could consist of a combination of other elements. Solutions To generalize the use of multi-attribute indices in Niagara, we'll create catalog information for each multi- attribute index. At a high level, the catalog will contain information about what data is available in the index, which fields from the path are stored. The other info it will contain is the type of each posting stored. For example, say an index was on the path a/b/c. If all the information was stored the catalog would say it each posting consists of three sub-postings, and would have the size of the posting. The catalog catalog will also note that each of those sub-postings contained information about an element, aka type information. If we only wanted to know about the c paths in the above, the catalog would say that there is only one piece of information in the posting. That info corre- sponds to the last element of the path statement. It's type is reference-to-element. As arbitrary path expressions or query expressions may be used as the key of a multiple-attribute index, it is possible that the index may contain different types of postings in its multiple attributes. For example, we could generate an index on the expression as a/b/c con- tains "xyzzy". The posting could consist of 4 fields, consisting of 3 elements postinsg and one payload text posting. As another example, try a/b[@="xyz"]/c. This could contain two elements and an attribute, or three elements and an attribute, or some other combination of -3- result postings. The catalog eliminates implicit knowledge of what multi-attribute indices contain. It replaces the implicit knowledge with explicit information about what is contained. Explicit knowledge is flexible and changeable and it elimi- nates lots of problems in the process. The system is no longer inhibited by the implicit knowledge, as anything can be done by putting info about it in the catalog. Implicit multi-attribute indices will go away, to be replaced by generic multi-attribute indices for the same data. Implementation At this point in time we do not have implementation details for this portion of the system. Providing this gen- eral concept early in the game will guide the develeopment of the system enough to allow us to handle those problems when we get to them. In the interim we can move the implicit knowledge of multi-attribute queries outside of the IM layer, and let that knowledge reside in the SEQL opera- tors. Or, we could provide an interface that makes every- thing look right, but not allow for anything but path matches result queries until later. There are also questions about having the optimizer generate hints about what multi-attribute indices would be useful. Note that the catalog info about the multi-attribute indices is similar to the catalog info needed for the multi- attribute SEQL streams which will replace the current sin- gle-attribute SEQL streams.