New Index Manager Structure

Background

The current Index Manager consists of a number of Lexicons.  A Lexicon
exists for each type of Posting, or index, that the IM maintains.  The
types of content indexed by individual Lexicons are Text, Elements,
Element+Attribute, IDrefs, DTDs, and multi-attribute results.

Each Lexicon logically contains a set of LexiconEntries.  Each
LexiconEntry corresponds to a term (word, element, etc.) stored in that
particular Lexicon, and is identified by the term which that entry
represents.  The LexiconEntry contains meta-information about the
postings for its term, and the postings themselves.  The postings
contained in the LexiconEntries can conceptually vary, based on the
information which needs to be stored in them.  A superset of all the
information a posting can contain is:

    +----------------------------------------------+
    | Posting                                      |
    |   document_id  element_id  start  end  nest  |
    +----------------------------------------------+

Words, for example, don't use the end information, and don't store it.

The Lexicons and their entries are stored on disk in BTree files.
There is a table which contains IM meta-information, including
meta-information for each Lexicon; it is read at system startup and
written at shutdown.  Each Lexicon is stored on disk as a BTree file.
The key of the BTree is the term corresponding to a LexiconEntry of
that Lexicon.  The value portion of the BTree entry contains some
LexiconEntry meta-information, followed by the list of postings that
record the occurrences of that term.

In addition to the simple indexes, such as word or element, the IM is
also capable of storing indices with multi-attribute (MA) results.
These multi-attribute indices are used for path indices.  The posting
for a multi-attribute index contains multiple postings, hence the name.
For example, given a path of a/b, the posting for the MA result would
consist of two element postings: the first posting of the set holds the
a locations, and the second posting holds the b locations.

In the current system the Lexicons are large lookup tables whose key
and meta-information for each term are memory resident for the duration
of the session.  The per-term information is the LexiconEntry.  The
postings associated with each term, however, are not memory resident;
they are rolled in from disk when they are accessed and referenced by
the individual LexiconEntries.  This caching requires that postings be
read out of the buffer pool and unmarshalled into an in-memory format.
That in-memory format is then copied to a user buffer and used for
query processing.  The postings in the LexiconEntry need some sort of
replacement policy (none exists yet, which is part of the problem) to
recover the space they consume when more space is required.

In addition, this in-memory buffer is used to buffer postings written
to the IM that are scheduled to be written to the underlying disk.
When enough postings are accumulated, they are marshalled to the
on-disk format and written out.  Note that sorting is needed somewhere
in this path to ensure the correct order of the on-disk data.  The two
forms of the data (on disk and in memory) require expensive copying and
data conversion operations.
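To make that copy-and-convert cost concrete, the following is a minimal
C++ sketch of the kind of unmarshalling step described above.  The
names (Posting, unmarshalElementPostings) and the fixed-width field
layout are illustrative assumptions, not the actual IM code.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Illustrative in-memory superset posting; word postings never use
    // 'end' and their on-disk format omits it entirely.
    struct Posting {
        uint32_t document_id;
        uint32_t element_id;
        uint32_t start;
        uint32_t end;
        uint32_t nest;
    };

    // Unmarshal one LexiconEntry's BLOB of element postings (all five
    // fields stored) into the in-memory format.  Every access pays this
    // copy-and-convert step, and the result is copied once more into a
    // user buffer before query processing can use it.
    std::vector<Posting> unmarshalElementPostings(const uint8_t *blob,
                                                  size_t blobLen)
    {
        std::vector<Posting> postings;
        const size_t recLen = 5 * sizeof(uint32_t);
        for (size_t off = 0; off + recLen <= blobLen; off += recLen) {
            uint32_t f[5];
            std::memcpy(f, blob + off, recLen);
            postings.push_back(Posting{f[0], f[1], f[2], f[3], f[4]});
        }
        return postings;
    }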
All postings for all documents of a given term are stored in one large
BLOB.  Because of this, it is necessary to read all of the preceding
data in the BLOB before the data actually wanted, such as a document or
a subtree of a document, is located.  These BLOBs can also grow quite
large, and manipulating them is expensive in both disk I/O and CPU
time.

New System

The new system's intent is to simplify how IM data is handled, allow
quick access to specific data, eliminate in-memory copies of the data,
and make the IM scalable with respect to the amount of memory it uses.

Data conversion will be eliminated by supporting different posting
formats which are known by the system, instead of using a uniform
posting format and shoe-horning all the postings into it.
Multi-attribute postings already violate the uniform posting format,
and they require expensive data conversion to use.

The complex (and not implemented) cache management will be eliminated
by removing the general-purpose cache and replacing it with something
else.  The replacement cache is a performance enhancement only; it will
buffer written data to minimize format changes and disk I/Os, but the
system would work without it.  It is not used in the context of normal
query processing.

Indexing the data via (term-id, doc-id) allows immediate access to the
data in a particular document for in-document queries.  This storage
change also alleviates the large-BLOB problem mentioned earlier, by
breaking up the existing BLOBs into smaller ones.  The ability to use
multiple BTrees to store the postings also allows the data to be split
among devices, which allows better disk access and storage utilization.
Ultimately we may partition the data into multiple BTrees in a
different fashion, rather than keeping all the entries for a term in a
single BTree as assumed here.  That work, though, requires that even
simple queries be capable of parallel execution.  Though that is
intended as the system grows, it is not something we will require
immediately.
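As an illustration of the (term-id, doc-id) keying described above, the
following sketch shows one possible composite-key encoding.  The exact
key format is not specified here; the PostingKey struct and the
big-endian byte order are assumptions chosen so that the BTree's
lexicographic comparison keeps all documents of a term adjacent, in
doc-id order.

    #include <cstdint>
    #include <vector>

    // Hypothetical composite key: one BTree record per (term, document)
    // pair, holding only that document's postings for the term, so an
    // in-document query can seek directly to its data instead of
    // scanning a term-wide BLOB.
    struct PostingKey {
        uint32_t term_id;
        uint32_t doc_id;
    };

    // Marshal the key big-endian so that byte-wise comparison sorts
    // first by term, then by document.
    std::vector<uint8_t> marshalPostingKey(const PostingKey &key)
    {
        std::vector<uint8_t> bytes(8);
        for (int i = 0; i < 4; ++i) {
            bytes[i]     = static_cast<uint8_t>(key.term_id >> (24 - 8 * i));
            bytes[4 + i] = static_cast<uint8_t>(key.doc_id  >> (24 - 8 * i));
        }
        return bytes;
    }

With such a layout, a range scan over the term-id prefix returns every
document containing the term, while a point lookup on a full
(term-id, doc-id) key reads only the postings for that one document.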