Temporary XML representation and storage in the new Niagara system

Problems

Currently there is one non-scalable behavior in the Niagara Data Manager: it requires a document to be completely parsed into memory before it is stored on disk. The problem, of course, is that the document may not fit in the memory resources available.

Another issue is that Niagara should be able to issue queries on documents that are not completely parsed. This has two implications: the first is that it will reduce the latency of query execution and allow for partial results. The flip side is that the entire query execution may be thrown away if the document does not parse correctly. That, however, is the nature of the beast with XML; you cannot determine that a query is correct until the document parses successfully.

Niagara also has no good way of producing query results. Currently results are created as an in-memory DOM tree. However, there is no way to store the resulting query data to disk in the IM and the DM, so query results are limited by available memory.

Yet another issue is that Niagara may want to parse and manipulate documents that are not stored in the Data Manager. Indeed, we may want to run Niagara without any Data Manager at all! This means that we will need some mechanism for referring to documents that exist only as parsed data in memory. An example of this case is running Niagara as an XML grep tool on random XML files. However, if we do have a DM, perhaps we do want to store the document for future access so that the parse cost need be paid only once.

Finally, Niagara should be capable of querying streams of XML data. A stream has no end at which parsing stops and the data becomes usable. The stream may need to be stored, or thrown away after the query is executed. I mention this last because, once the above concerns are stated, streams fall out as a particular instance of the above problems.
Current Implementation

The Data Manager is given the URL of a document to load. It locks the URL so that only one thread will parse the document, and it interacts with the Index Manager to determine the document ID of the document. The DM then starts its own parser and parses the document into memory, into a set of per-document tables called maps. One set of maps exists per parsed document; they are not shared, as the on-disk representation is. Once parsing is complete, the contents of the in-memory maps are appended to the on-disk BTree which holds the Data Manager database. Because the keys are (doc-id, element-id) pairs, there is no interaction between multiple documents in the on-disk BTree map, other than increasing the size of the database. The in-memory maps for that document are then deleted, and the document is unlocked. At this point the thread that loaded the document, as well as any other threads requesting it, can access the document.

An XNode is the in-memory class that contains the handle to an element of a document. Conceptually it is the (doc-id, element-id) pair mentioned above. When an element of an XML document is accessed, the referenced node is found by examining the on-disk BTree maps, and the desired properties are extracted for in-memory use.

Note that if a document is required by two queries, the second query to request a particular document will pause until the document is loaded by the first query.

Solution

The proposed solution to these problems is to introduce the concept of temporary XNodes into the Data Manager. Users of XNodes will not see any difference between these temporary XNodes and normal on-disk XNodes; everything will be an XNode. When a document is parsed to be stored in the DM, it will be parsed into temporary XNodes, which are then flushed to disk. If a result is generated, it will be generated as temporary XNodes, which may then be discarded or flushed to disk as desired.
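As a sketch of what "everything is an XNode" means for a caller, the unified lookup could be structured as below. All class, member, and map names here are hypothetical stand-ins for the real DM interfaces, not the actual implementation:

```python
class XNode:
    """Handle to one element of a document: the (doc-id, element-id) pair."""
    def __init__(self, doc_id, element_id):
        self.doc_id = doc_id
        self.element_id = element_id

class DataManager:
    def __init__(self):
        self.temp_cache = {}  # (doc_id, element_id) -> properties of temporary XNodes
        self.btree = {}       # stand-in for the on-disk BTree keyed the same way

    def properties(self, xnode):
        key = (xnode.doc_id, xnode.element_id)
        # Temporary XNodes live in the in-memory cache until flushed to disk.
        if key in self.temp_cache:
            return self.temp_cache[key]
        # Otherwise fall back to the persistent on-disk representation.
        return self.btree[key]
```

The point of the sketch is that the caller never learns which branch was taken; a temporary node and a flushed node answer the same request the same way.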
If a document is parsed only for the duration of a query, it will be parsed into temporary XNodes which are removed when query execution is complete. Note, however, that temporary XNodes may be flushed to disk as needed to control the memory resources that Niagara uses. Since every node is an XNode, there will be no user-visible change from this transition; things will just work, and performance will degrade gracefully instead of hitting a stone wall.

Proposed Implementation

Temporary XNodes will be implemented as an in-memory Object Cache addressed by the (document-id, element-id) pair. Whenever an XNode is accessed, a lookup will be done in the Object Cache to determine whether the XNode in question is stored there or on disk.

As a performance step, pre-filtering on document-id may be done to reduce the overhead of determining whether an XNode is in the cache. In that case, a document would be specified as being either all on disk, all in memory, or both in memory and on disk. As everything is done on a per-document basis, this seems a natural way of handling the issue.

When a document is parsed there is, essentially, a stack of elements which are incomplete and will remain so until their end tags are parsed successfully. Fortunately, this limits the in-cache footprint of a parsed document. Flushing incomplete nodes will cause a performance degradation, as the on-disk node will need to be updated. When an end tag is parsed, that element is completed, and I/O can be scheduled for the completed element and all elements nested within it. Note that even if elements of a document are forced to disk, performance will still be acceptable; large documents will force everything to become I/O bound at some point. Smaller documents will not have that problem.

Other Implementation Details

One thing the above section says nothing about is the problem of querying partially parsed documents.
To a certain extent, that is an orthogonal problem to some of those above. Implementing the above allows parsing of large results, storing results, and handling ad-hoc documents. Indeed, querying partially parsed documents may not be strictly necessary, as the tuple-based execution scheme may meet most needs of query execution, relying upon the complete document only to produce results.

To handle querying of, and references to, incomplete documents, a flag needs to be added to the XNode, or to some lookup table for incomplete documents. The flag would indicate whether an XNode is complete, and perhaps some state on how complete it is. The states of completion are: the element itself is still being parsed; the element's start tag is complete and some subset of its descendants has been parsed; and the element's end tag has been found.

A simple implementation would block execution of a thread trying to use an incomplete XNode. Once the XNode is complete, the blocked thread could continue and would be allowed to access the content of that node. However, this simple case may not be useful enough, since access to the root itself would be blocked until the entire document is complete. Still, as mentioned above, it may work well with the tuple-based execution scheme, which will only refer to the content of a node if it is requested. This mode is what would be used for payload-only elements, unless some sort of locking is done as the payload is streamed into the individual node, which requires far more complexity.

With additional sophistication, one can refer to an incomplete node. Once a descendant node has been parsed, the scanner thread can descend into that node and examine it. In that case, as a thread tries to access subsequent descendants of a node, it keeps doing so and descends the document tree. The blocking action would snake along, resembling the order in which the document is parsed.

We discussed system-wide versus per-session object caches.
The advantage of the per-session object cache is that there is no need for locking; the advantage of the system-wide cache is that it allows sharing. Nothing in this scheme prevents either, or both, kinds of caches from existing at the same time. The issue with the system-wide cache is that database locking would need to take effect and we would have to be concerned with multiple accessors. However, even within a single query we may have some of those problems, for example multiple SEQL operators generating references to parts of the same document in different parts of the execution tree.

Note that references to documents and sub-documents are tracked on a per-document basis, and this scheme does not try to address garbage collection of unused subtrees of documents. Once a temporary document is no longer referenced by a query, it may be deleted from the object cache. A different strategy could be used, but this is good enough for now, and the users of the system will not see any difference in use, just in performance, although something will need to create the appropriate references for garbage collection.

In combination with the above information on accessing partially parsed documents comes the capability for multiple queries to share the common parse of a document -- through the exact same mechanism. However, until further work is done, this will only work while the head of the conceptual document stream is located entirely in the object cache. So, for example, two queries accessing the same document that are started close together in time may both be able to use the single parse of the document in low-latency mode to process their queries. Indeed, the cache manager could be notified of this situation and do its best to keep those resources available. However, once data leaves the cache and is written to disk, other queries wanting to access that document must block until the transaction containing the load has completed.
This is due to database locks, which will not allow access to the on-disk data. This is not seen as a real problem at present, since the current scheme fixes many existing problems and provides some possibility of simultaneous query execution on a document, which the current architecture does not. Work can be done to investigate isolating document loads into their own transactions, as we have talked about for the existing DM. It may be possible to access such a document in the context of the load transaction, which can access the on-disk data, to keep things flowing.

One element that will be needed is an efficient way of determining whether a node has been parsed. This method must avoid race conditions, have minimal storage impact, and be space and time efficient. Fortunately, such a problem can be solved in the context of Niagara. It is based upon element IDs in a document which is being parsed. As a document is parsed from a linear form into a tree-like representation, nodes are parsed to completion from the bottom up. As such, there will always be a range, or ranges, of element IDs which have been completely parsed, starting at the shallowest leftmost leaf. As parsing continues, any time the head of the parse completes an element, an additional range may be added to the list of parsed ranges. At the same time, this range can be trivially coalesced with adjacent ranges.

When a thread wants to access a node of an incomplete document, the list of completed ranges must be checked for a match. If no match is found, the thread blocks, awaiting completion of that element. Whenever a completion occurs, the list of waiters can be examined and any satisfied waiters released. There will typically be few waiters for the incomplete nodes of a document, on the order of the number of threads trying to access that document.
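The range bookkeeping described above can be sketched as follows. The class and method names are illustrative only, not actual Niagara interfaces, and waiter release is shown with plain callbacks rather than real thread wakeups:

```python
class ParsedRanges:
    """Completed-element-ID ranges for one document being parsed.
    Ranges are inclusive (lo, hi) pairs kept sorted and disjoint."""

    def __init__(self):
        self.ranges = []    # sorted, disjoint, inclusive ranges
        self.waiters = {}   # element_id -> callbacks to release

    def contains(self, eid):
        return any(lo <= eid <= hi for lo, hi in self.ranges)

    def complete(self, lo, hi):
        # Insert the newly completed range and coalesce it with any
        # adjacent or overlapping ranges in a single pass.
        merged = []
        for rlo, rhi in sorted(self.ranges + [(lo, hi)]):
            if merged and rlo <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], rhi))
            else:
                merged.append((rlo, rhi))
        self.ranges = merged
        # Release every waiter whose element ID is now covered.
        for eid in list(self.waiters):
            if self.contains(eid):
                for wake in self.waiters.pop(eid):
                    wake()
```

Because a waiter is checked against whole ranges rather than a single expected ID, a waiter registered just before a completion is still found on the next `complete` call, which is the property the race-condition argument below relies on.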
Race conditions will not be a problem, due to the range-based nature of this locking: any missed waiters can be trivially observed courtesy of the range completion. With exact matches, by contrast, there is a chance of a missed rendezvous. Other methods of synchronization, such as an incomplete bit and a wait list, may also be used; the wait list need only be examined if it is non-empty. Due to the frequency of parse completions and the low number of waiters, an exact solution is not necessary. Atomic memory updates could be used to advantage to minimize the amount of locking and work required to check the waiters list through other range-based techniques. You can spend more resources for a more exact result (the node is available just after it is parsed), or fewer for a correct but more slowly notified result.

As mentioned in the implementation section, cache management is an important part of this proposal. It seems that individual simplistic strategies can do a good job, better than what we are doing now. A mixture of simple strategies may not take a lot of time to implement, yet would improve performance and still allow flexibility.

Requests on Partial XNodes

As mentioned earlier, the simplest strategy to use when requesting information from incomplete XNodes is to block until the XNode is complete. If the requested information is not yet complete, that wait must occur. However, there are certain types of requests which can be satisfied correctly with an incomplete XNode; this requires more complexity.

An element consists of a start tag with attributes, some inline payload, a set of children (which can also include payload), and an end tag. As one example of a partial query, once the start tag is completely parsed, any request for attributes of that node may be completely satisfied, even though the node itself is not complete until the end tag is parsed.
Instead of having a single incomplete flag for an XNode, we can introduce two incomplete flags to record that state: one flag indicates that parsing of the start tag of the XNode is incomplete; the other indicates that parsing of the element as a whole (through its end tag) is incomplete. If either flag is set, the node as a whole is incomplete, but once both are cleared the node is complete and can satisfy any request.

It is possible to allow finer-grained access to partial XNodes. However, such access comes at a cost -- the need for more complex synchronization on the XNode. This is not a big problem; I just mention it up front to make certain that it is understood. You gain the capability to query incomplete nodes, but it has a cost associated with it. Note that I believe the best way to handle this synchronization is to ensure that temporary XNodes which are incomplete must remain resident in memory while a request is pending. This allows the synchronization mechanism to be part of the object cache, reduces the overhead of stored data, and is at least a starting point. Note that database locks are too coarse and are not applicable to the kind of locking we need to implement this.

The essence of further access to partial XNodes is that we must classify each information request as to its nature. Some requests can be satisfied from an incomplete element without violating document properties: for access to named attributes or children which are themselves complete, the information is either present and immediately returnable, or, if the name is not present (or is itself incomplete), the query must block until it is. For example, a request to fetch a named attribute, such as @id, could fetch the id attribute if it exists and is complete. Similarly, a request for a named element, such as /foo, can be satisfied if the element is found and is complete.
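A minimal sketch of the two-flag scheme, with hypothetical names (the real XNode would block rather than return None, and would pack the flags more compactly):

```python
class PartialXNode:
    """Temporary XNode with two completeness flags: one for the
    start tag, one for the whole element (end tag parsed)."""

    def __init__(self):
        self.start_tag_incomplete = True
        self.element_incomplete = True
        self.attributes = {}

    @property
    def complete(self):
        # The node can satisfy any request only when both flags clear.
        return not (self.start_tag_incomplete or self.element_incomplete)

    def get_attribute(self, name):
        # Attribute requests need only the start tag, so an @id-style
        # lookup can run while the element itself is still incomplete.
        if self.start_tag_incomplete:
            return None  # a real implementation would block here
        return self.attributes.get(name)
```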
Other examples of this case are elements referenced by position: either they are there and can be returned, or they are not and you must wait until the node is complete (or complete enough) to return that information.

Other requests -- those that request arbitrary information, or information which cannot be known until the element is complete -- must block until it is. A few examples: the number of attributes is not known until the element's start tag is parsed; the number of children is not known until the end tag is parsed; and requests to iterate through attributes or children cannot complete until all of them are complete. With these types of requests, the request must block until the element, or the relevant portion of the element, is complete.

Other Solution(s)

One technique discussed was to use a DOM parser and representation for ad-hoc parsing. This would work only for streams and ad-hoc document queries; it does nothing for the parsing problem or for generating query results. The base problem with this technique is that we would need two in-memory representations of XML data: the XNodes and the DOM nodes. There would be more components to maintain -- multiple parsers, multiple in-memory representations, and other duplicated effort. In addition, DOM parsers parse to completion before returning the resulting document, which precludes any processing while parsing. Another shortcoming is that the data set must fit into memory, and there is no way to "spill" it to disk if the footprint becomes unexpectedly large.
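Returning to the request classification for partial XNodes above, the blocking rules can be sketched as a small table-driven check. The request names and grouping are hypothetical, chosen only to mirror the examples in the text:

```python
# Requests whose answers depend only on the start tag being parsed.
NEEDS_START_TAG = {"attribute_count", "iterate_attributes"}
# Requests whose answers are unknown until the end tag is parsed.
NEEDS_END_TAG = {"child_count", "iterate_children"}
# Named or positional lookups: returnable early if the target is
# present and complete; only then does presence need to be checked.
NAMED_LOOKUPS = {"named_attribute", "named_child", "child_by_position"}

def must_block(request, start_tag_parsed, end_tag_parsed):
    """Return True if this request must wait on the incomplete node."""
    if request in NEEDS_END_TAG:
        return not end_tag_parsed
    if request in NEEDS_START_TAG:
        return not start_tag_parsed
    # Named lookups do not block up front; a real implementation
    # would then check whether the named target exists and is
    # complete, blocking only if it is absent or still incomplete.
    return False
```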