New Index Manager Structure


Background

     The current Index Manager (IM) consists of a number of
Lexicons.  A Lexicon exists for each type of posting, or index,
that the IM maintains.  The types of content indexed by
individual Lexicons are Text, Elements, Element+Attribute,
IDrefs, DTDs, and multi-attribute results.  Each Lexicon
logically contains a set of LexiconEntries, one per term (word,
element, etc.) stored in that particular Lexicon, and each
LexiconEntry is identified by the term it represents.  A
LexiconEntry contains meta-information about the postings for
its term, and the postings themselves.  The postings contained
in the LexiconEntries can conceptually vary, based on the
information which needs to be stored in them.  A superset of
all the information stored in any posting is:

      +----------------------------------------------+
      |Posting                                       |
      |document_id   element_id   start   end   nest |
      +----------------------------------------------+
Word postings, for example, don't use the end field, and so
don't store it.
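
     For concreteness, the superset above might be declared as
follows.  This is an illustrative sketch only; the field widths
are assumptions, not the actual on-disk layout.

      // Sketch of the posting superset; widths are assumptions.
      #include <cstdint>

      struct Posting {
          uint32_t document_id;  // document containing the occurrence
          uint32_t element_id;   // element containing the occurrence
          uint32_t start;        // start position of the term
          uint32_t end;          // end position; unused by words
          uint16_t nest;         // nesting depth; unused by words
      };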

     The Lexicons and their entries are stored on disk in BTree
files, one BTree file per Lexicon.  A separate table contains IM
meta-information, including meta-information for each Lexicon;
it is read at system startup and written at shutdown.  The key
of a Lexicon's BTree is the term that corresponds to a
LexiconEntry of that Lexicon.  The value portion of the BTree
entry contains some LexiconEntry meta-information, followed by
the list of postings that record the occurrences of that term.
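
     The layout of a single BTree entry can be pictured as
follows.  The exact meta-fields are an assumption made for
illustration; the document does not enumerate them.

      // One Lexicon BTree entry, under assumed meta-fields.
      // Key:   the term itself (the indexed word, element, etc.).
      // Value: a small meta header, then the packed posting list.
      #include <cstdint>

      struct LexiconEntryMeta {
          uint32_t posting_count;  // number of postings that follow
          uint32_t posting_bytes;  // total size of the posting list
      };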

     In addition to the simple indices, such as word or element,
the IM is also capable of storing indices with multi-attribute
(MA) results.  These multi-attribute indices are used for path
indices.  The posting for a multi-attribute index contains
multiple postings, hence the name.  For example, given a path of
a/b, the posting for the MA result would consist of two element
postings: the first posting of the set gives the a locations,
and the second gives the corresponding b locations.
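
     An MA posting for the path a/b might then look like the
following sketch; the type names are hypothetical.

      #include <cstdint>
      #include <vector>

      struct ElementPosting {
          uint32_t document_id, element_id, start, end;
      };

      // Hypothetical MA posting for a/b: postings[0] locates the
      // "a" element, postings[1] the "b" element nested within it.
      struct MAPosting {
          std::vector<ElementPosting> postings;  // one per path step
      };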

     In the current system there is a large lookup table per
index, the Lexicon, in which the key and meta-information for
each term are memory resident for the duration of the session.
The per-term information is the LexiconEntry.  The postings
associated with each term, however, are not memory resident:
they are rolled in from disk when they are accessed, and are
referenced by the individual LexiconEntries.  This caching
requires that postings be read out of the buffer pool and
unmarshalled into an in-memory format.  This in-memory format is
then copied to a user buffer and used for query processing.  The
postings in the LexiconEntry need some sort of replacement
policy (they don't have one yet, which is part of the problem)
to recover the space they consume when more space is required.
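
     The copy chain looks roughly like this.  unmarshal and
fetch_postings are hypothetical names standing in for the real
routines.

      // Sketch of the two copies a posting makes before query
      // processing ever sees it.
      #include <cstddef>
      #include <cstring>
      #include <vector>

      struct Posting { unsigned doc_id, elem_id, start; };

      // Copy 1: decode buffer-pool page bytes into in-memory form.
      std::vector<Posting> unmarshal(const char* page, std::size_t n) {
          std::vector<Posting> out(n);
          std::memcpy(out.data(), page, n * sizeof(Posting));
          return out;
      }

      // Copy 2: hand the postings to the caller's own buffer.
      void fetch_postings(const char* page, std::size_t n,
                          Posting* user_buf) {
          std::vector<Posting> in_mem = unmarshal(page, n);
          std::memcpy(user_buf, in_mem.data(), n * sizeof(Posting));
      }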

     In addition, this in-memory buffer is used to buffer
postings written to the IM that are scheduled to be written to
the underlying disk.  When enough postings have accumulated,
they are marshalled into the on-disk format and written to disk.
Note that sorting is needed somewhere along this path to ensure
the correct order of the on-disk data.  The two forms of data
(on disk and in memory) require expensive copying and
data-conversion operations.
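
     One place the sorting can live is in the flush path.  The
following is a sketch with an assumed ordering rule, not the
actual code.

      #include <algorithm>
      #include <vector>

      struct Posting { unsigned doc_id, start; };

      // Flush accumulated postings: sort into the on-disk order
      // (ordering rule assumed), then marshal and write -- the
      // marshalling and write steps are omitted here.
      void flush(std::vector<Posting>& pending) {
          std::sort(pending.begin(), pending.end(),
                    [](const Posting& a, const Posting& b) {
                        return a.doc_id != b.doc_id
                                   ? a.doc_id < b.doc_id
                                   : a.start  < b.start;
                    });
          pending.clear();  // buffered postings now on disk
      }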

     All postings for all documents of a given term are stored
in a single large BLOB.  As a result, all preceding data in the
BLOB must be read before the actual data of interest, such as a
document or a subtree of a document, is located.  These BLOBs
can also grow quite large, and manipulating them is expensive in
both disk I/O and CPU time.
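
     The cost is easy to see in a sketch: finding one document's
postings in the per-term BLOB is a linear scan past everything
stored before it.  The names here are hypothetical.

      #include <cstddef>
      #include <vector>

      struct Posting { unsigned doc_id, start; };

      // Locate the first posting of `doc` in a term's BLOB: every
      // earlier document's postings are read just to be skipped.
      std::size_t find_doc_offset(const std::vector<Posting>& blob,
                                  unsigned doc) {
          std::size_t i = 0;
          while (i < blob.size() && blob[i].doc_id != doc)
              ++i;
          return i;  // == blob.size() if doc has no postings
      }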

New System

     The new system's intent is to simplify how IM data is
handled, allow quick access to specific data, eliminate
in-memory copies of the data, and make the IM scalable with
respect to the amount of memory it uses.

     Data conversion will be eliminated by supporting different
posting formats which are known to the system -- instead of
using a uniform posting format and shoe-horning all the postings
into it.  Multi-attribute postings already violate the uniform
posting format, and they require expensive data conversion to
use.
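
     One way to realize this -- an illustration, not a committed
design -- is to tag each Lexicon with its posting format and
declare one concrete layout per format:

      #include <cstdint>

      // Assumed format tags; the real set would mirror the Lexicons.
      enum PostingFormat {
          PF_WORD,       // document_id, element_id, start
          PF_ELEMENT,    // document_id, element_id, start, end, nest
          PF_MULTI_ATTR  // a run of element postings, one per step
      };

      struct WordPosting    { uint32_t document_id, element_id, start; };
      struct ElementPosting { uint32_t document_id, element_id,
                              start, end;
                              uint16_t nest; };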

     The complex (and unimplemented) cache management will be
eliminated by removing the general-purpose cache and replacing
it with something simpler.  The replacement cache is a
performance enhancement only, which buffers written data to
minimize format changes and disk I/Os -- the system would work
without it.  It is not used in the context of normal query
processing.
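
     Because it is purely a performance device, the replacement
cache can degenerate to write-through.  A sketch with a
hypothetical interface:

      #include <cstddef>
      #include <vector>

      struct Posting { unsigned doc_id, start; };

      // Write-behind buffer.  A capacity of 0 degenerates to
      // write-through, so correctness never depends on buffering.
      class WriteBuffer {
          std::vector<Posting> pending_;
          std::size_t capacity_;
      public:
          explicit WriteBuffer(std::size_t cap) : capacity_(cap) {}
          void add(const Posting& p) {
              pending_.push_back(p);
              if (pending_.size() >= capacity_)
                  flush();
          }
          void flush() {
              // sort, marshal, and BTree insert go here; omitted.
              pending_.clear();
          }
      };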

     Indexing the data via (term-id, doc-id) allows immediate
access to the data in a particular document for in-document
queries.  This storage change also alleviates the large-BLOB
problem mentioned earlier, by breaking existing BLOBs into
smaller ones.  The ability to use multiple BTrees to store the
postings also allows the data to be split among devices, which
allows better disk access and storage utilization.  Ultimately
we may partition data into multiple BTrees in a different
fashion, rather than keeping all entries for a term in one BTree
as assumed here.  That work, though, requires that even simple
queries be capable of parallel execution.  Though intended as
the system grows, that is not something we will require
immediately.
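
     A sketch of the proposed composite key follows; 32-bit ids
are an assumption.

      #include <cstdint>

      // Composite key for the posting BTrees.  The value stored
      // under a key holds only term_id's postings within doc_id,
      // so each stored unit stays small and an in-document query
      // is a single BTree probe.
      struct PostingKey {
          uint32_t term_id;
          uint32_t doc_id;
      };

      // Ordering: term_id major, doc_id minor, so all documents
      // for one term remain adjacent for whole-term scans.
      inline bool operator<(const PostingKey& a, const PostingKey& b) {
          return a.term_id != b.term_id ? a.term_id < b.term_id
                                        : a.doc_id  < b.doc_id;
      }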