Niagara_3 Bugs, Problems, and Misfeatures
- Zigzag join can hang if no matching subtrees?
- Using the Shakespeare data as a reference,
a caesar query such as
./caesar -G -B .//PERSONAE -t '.[PERSONA]' ...
works as expected.
However, if the zigzag has an R (right) side that isn't contained in the left,
the system hangs and just sits there:
./caesar -G -B .//PERSONAE -t '.[SPEECH]' ...
I haven't debugged this yet; I noticed it while testing the
new query fragment stuff.
I think this is probably similar to some other problems:
the operators need to keep sinking input data until it
is consumed, rather than declaring themselves "done" and leaving
an output operator stuck in flow control.
This really needs bolo-style flow control, with back pressure on
streams, to handle shutdown etc.
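A minimal C++17 sketch of the "sink input until it is consumed" idea, assuming a simple bounded queue between two operators. BoundedStream, drainRemaining, and demoEarlyFinish are hypothetical names, not Niagara's actual stream classes. An operator that has already decided its answer (for example, a zigzag join with no matching subtrees) keeps draining its input until the producer closes the stream, so the producer is never left blocked in push().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical bounded stream between two operators (not Niagara's real class).
// push() blocks while the stream is full: this is the flow-control back pressure.
template <typename T>
class BoundedStream {
public:
    explicit BoundedStream(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0 && "a zero-capacity stream would block every push");
    }

    // Producer side: blocks until there is room in the stream.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mu_);
        notFull_.wait(lock, [&] { return items_.size() < capacity_; });
        items_.push_back(std::move(item));
        notEmpty_.notify_one();
    }

    // Producer side: signals that no more items will arrive.
    void close() {
        std::lock_guard<std::mutex> lock(mu_);
        closed_ = true;
        notEmpty_.notify_all();
    }

    // Consumer side: next item, or nullopt once the stream is closed and empty.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mu_);
        notEmpty_.wait(lock, [&] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop_front();
        notFull_.notify_one();
        return item;
    }

private:
    std::size_t capacity_;
    std::deque<T> items_;
    bool closed_ = false;
    std::mutex mu_;
    std::condition_variable notFull_, notEmpty_;
};

// An operator that is logically "done" keeps sinking its input until the
// producer closes the stream; returns how many items were discarded.
template <typename T>
std::size_t drainRemaining(BoundedStream<T>& in) {
    std::size_t discarded = 0;
    while (in.pop()) ++discarded;
    return discarded;
}

// Demo: the consumer only needs the first item, then drains. Without the
// drain, the producer blocks once the stream is full and join() never returns.
std::size_t demoEarlyFinish(int n) {
    BoundedStream<int> stream(2);
    std::thread producer([&] {
        for (int i = 0; i < n; ++i) stream.push(i);
        stream.close();
    });
    stream.pop();  // consumer decides it is "done" after one item
    std::size_t discarded = drainRemaining(stream);
    producer.join();
    return discarded;
}
```

Draining is the simplest fix; a cancel signal travelling back up the stream (real back pressure) would let the producer stop early instead of generating data that is thrown away.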
- All term IDs in the same space
- Term IDs for all term ID types are being allocated
from the same number space.
This is different from the separate space for each type that we discussed.
Not a bug, but rather something to watch out for in case the other
behavior is desired.
This is actually better most of the time, since redundant mappings
for the same term in the different dictionaries are not needed.
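A small C++17 sketch contrasting the two allocation schemes. TermKind, SharedTermIds, and PerKindTermIds are hypothetical names, and the assumption that a shared space lets one term string carry a single ID across kinds is one reading of the note above, not a description of the actual dictionaries.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical term kinds; the real Niagara type names are not given in these notes.
enum class TermKind { Element, Attribute, Word };

// Current behavior (as read from the note): one number space for every kind,
// so a given term string gets a single ID no matter which kind asks for it.
class SharedTermIds {
public:
    std::uint32_t idFor(TermKind /*kind*/, const std::string& term) {
        auto [it, inserted] = ids_.try_emplace(term, next_);
        if (inserted) ++next_;
        return it->second;
    }
private:
    std::map<std::string, std::uint32_t> ids_;
    std::uint32_t next_ = 0;
};

// The alternative that was discussed: a separate space per kind. IDs from
// different kinds can collide, and the same string gets one entry per kind.
class PerKindTermIds {
public:
    std::uint32_t idFor(TermKind kind, const std::string& term) {
        std::uint32_t& next = next_[kind];  // starts at 0 for a new kind
        auto [it, inserted] = ids_.try_emplace(std::make_pair(kind, term), next);
        if (inserted) ++next;
        return it->second;
    }
private:
    std::map<std::pair<TermKind, std::string>, std::uint32_t> ids_;
    std::map<TermKind, std::uint32_t> next_;
};
```

With per-kind spaces, any code that sees a bare ID must also know its kind to interpret it; the shared space avoids that, at the cost of a larger single ID range.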
- Parse Events messes up first payload word when spaces are insignificant
- When spaces are not significant, and there are spaces between
the enclosing element's start tag and the element's first payload
word, that first word is not marked
as Adjacent to the enclosing element.
This leads to the DM inserting an extra "adjacency space" before the
first payload word in that element.
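A sketch of the adjacency rule the fix would need, assuming a simplified event stream. The Event and EventKind types and markAdjacency are hypothetical; the assumption is that whitespace right after a start tag is dropped when spaces are insignificant, while whitespace between payload words always breaks adjacency.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical parse-event model; Niagara's real event classes are not described here.
enum class EventKind { StartElement, EndElement, Whitespace, Word };

struct Event {
    EventKind kind;
    std::string text;
    bool adjacent = false;  // word directly follows the previous event
};

// Marks each Word as adjacent (or not) to the preceding non-whitespace event.
void markAdjacency(std::vector<Event>& events, bool spacesSignificant) {
    const Event* prev = nullptr;  // previous non-whitespace event
    bool sawSpace = false;
    for (Event& e : events) {
        if (e.kind == EventKind::Whitespace) {
            sawSpace = true;
            continue;
        }
        if (e.kind == EventKind::Word) {
            bool afterStartTag = prev && prev->kind == EventKind::StartElement;
            // Insignificant whitespace after a start tag must not break
            // adjacency, or the DM later inserts a spurious space.
            e.adjacent = !sawSpace || (afterStartTag && !spacesSignificant);
        }
        prev = &e;
        sawSpace = false;
    }
}
```

For `<LINE> To be</LINE>` with insignificant spaces, "To" is marked adjacent to LINE, while "be" stays non-adjacent to "To".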
- Dependencies broken?
- Automake offers no way to rebuild dependencies once they are built?!
This means that if someone adds an include file, or an include file
moves, there is no way to force an update of the dependencies.
- DM spends lots of time in mutexes
- Many of the in-memory data structures in the DM are
protected by a mutex.
This protection is at a very low level, and essentially requires
locking a table even to do a lookup on it.
The locking has considerable overhead, on the order of
performing the table lookups themselves!
A better solution would be to control access at
a higher layer in the DM.
Another possibility would be to add some sort of latch-like
mechanism so repeated access isn't as expensive.
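One way to move the locking up a layer, sketched in C++17 with a hypothetical NameTable: lookup() shows the per-call locking described above, while lookupAll() takes the lock once for a whole batch of lookups. The reader/writer std::shared_mutex is a swapped-in technique, not necessarily the latch mechanism the note has in mind; it only lets concurrent readers avoid excluding each other.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical DM name table, for illustration only.
class NameTable {
public:
    // Per-call locking: every single lookup pays for a lock round trip.
    std::int64_t lookup(const std::string& name) const {
        std::shared_lock<std::shared_mutex> lock(mu_);
        return lookupLocked(name);
    }

    // Higher-layer access: lock once, then perform a whole batch of lookups.
    std::vector<std::int64_t> lookupAll(const std::vector<std::string>& names) const {
        std::shared_lock<std::shared_mutex> lock(mu_);
        std::vector<std::int64_t> out;
        out.reserve(names.size());
        for (const auto& n : names) out.push_back(lookupLocked(n));
        return out;
    }

    void insert(const std::string& name, std::int64_t id) {
        std::unique_lock<std::shared_mutex> lock(mu_);  // writers are exclusive
        map_[name] = id;
    }

private:
    // Caller must already hold mu_; returns -1 when the name is unknown.
    std::int64_t lookupLocked(const std::string& name) const {
        auto it = map_.find(name);
        return it == map_.end() ? -1 : it->second;
    }

    mutable std::shared_mutex mu_;
    std::unordered_map<std::string, std::int64_t> map_;
};
```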
- Need to track term IDs in streams
- I discovered this bug in a thought experiment; here goes.
If you are running a simple query, such as
//x
or //@x,
you always know that the things being returned are x's.
However, if you run an OR-based query, such
as //(x|y)
or //(@x|@y),
then suddenly you DO NOT KNOW
the type and ID of the postings being returned.
Why is this a problem?
Well, say that you then want to look up those attributes
in the DM.
You need to know the names of those fields, most importantly
the attribute name, to be able to find that data in the DM.
To solve this, the Posting
info has to be upgraded with the element or attribute
type/ID of the posting so this indeterminate thing can be
located properly.
The same problem may also occur if you are searching for
payload words or attribute words, since once
you create a catenated list, you lose their type identity.
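A minimal sketch of the upgraded Posting, assuming a layout of document number plus start position. The kind and termId fields, TermKind, and unionPostings are hypothetical. The point is that an OR such as //(@x|@y) merges two sorted lists without discarding which attribute each posting came from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical posting layout; the real Niagara Posting fields are not given here.
enum class TermKind : std::uint8_t { Element, Attribute, Word };

struct Posting {
    std::uint32_t doc;
    std::uint32_t start;   // position in document order
    TermKind kind;         // added: what sort of term produced this posting
    std::uint32_t termId;  // added: which element/attribute/word it was
};

// Document order: by document, then by position within it.
inline bool byPosition(const Posting& a, const Posting& b) {
    return a.doc != b.doc ? a.doc < b.doc : a.start < b.start;
}

// OR of two posting lists that are each in document order. Because every
// posting carries its own kind and termId, a later DM lookup can still find
// the right attribute name.
std::vector<Posting> unionPostings(const std::vector<Posting>& a,
                                   const std::vector<Posting>& b) {
    assert(std::is_sorted(a.begin(), a.end(), byPosition));
    assert(std::is_sorted(b.begin(), b.end(), byPosition));
    std::vector<Posting> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(out), byPosition);
    return out;
}
```

std::merge keeps duplicates, which is fine when the inputs come from different terms; a catenated list of words would carry the same two fields.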
- Missing documents fail in execution
- If you specify a non-existent document in a query,
detection of the problem is delayed until query execution.
That is because there is currently no front end.
However, perhaps some sort of "sanity checker" could be
added to the non-execution "query setup" mechanism.
That would allow these checks to be done prior to executing
the graph.
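A sketch of such a pre-execution sanity checker, assuming the query graph is a tree of nodes that may each name a document, and that the catalog is a set of known document names. PlanNode, collectMissing, and checkDocuments are hypothetical (C++17, since PlanNode holds a vector of itself).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical query-setup structure; Niagara's actual plan classes differ.
struct PlanNode {
    std::string document;  // empty if this node reads no document
    std::vector<PlanNode> children;
};

// Walks the graph before execution, collecting every unknown document.
void collectMissing(const PlanNode& node, const std::set<std::string>& catalog,
                    std::vector<std::string>& missing) {
    if (!node.document.empty() && catalog.count(node.document) == 0)
        missing.push_back(node.document);
    for (const PlanNode& child : node.children)
        collectMissing(child, catalog, missing);
}

// Returns the missing documents; an empty result means it is safe to execute.
std::vector<std::string> checkDocuments(const PlanNode& root,
                                        const std::set<std::string>& catalog) {
    std::vector<std::string> missing;
    collectMissing(root, catalog, missing);
    return missing;
}
```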
- We do not have a simple, exhaustive test suite
- It is very easy to create a bug in Niagara which isn't detected
for some time.
This is because, unlike SQL databases, our operators are
highly dependent upon the order and spacing of content
in a document.
This drives the number of edge cases where errors are
possible far beyond that seen in relational systems.
We have discussed this quite a bit, but it is something that keeps
coming back to bite us.
I wonder how many XML processors out there actually execute
queries correctly?
Heck, I wonder if Niagara executes queries correctly!
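Since the trouble spots are ordering and spacing, one cheap step toward exhaustiveness is to generate every spacing variant of a small test document and run the same query over all of them. This C++ sketch is an assumption about how such a suite could start, not an existing Niagara tool; spacingVariants and the '|' gap marker are hypothetical. Each gap becomes empty, a space, or a newline, so a template with k gaps yields 3^k variants.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Expands every '|' in the template into "", " ", or "\n", producing all
// combinations in order (the first variant has every gap empty).
std::vector<std::string> spacingVariants(const std::string& tmpl) {
    static const char* const fills[] = {"", " ", "\n"};
    std::vector<std::string> out{""};
    for (char c : tmpl) {
        if (c != '|') {
            for (std::string& s : out) s += c;
            continue;
        }
        std::vector<std::string> next;
        next.reserve(out.size() * 3);
        for (const std::string& s : out)
            for (const char* f : fills) next.push_back(s + f);
        out.swap(next);
    }
    return out;
}
```

Each variant would then be loaded and queried, and the results compared against a trusted answer (or against each other, where the spacing is insignificant).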
- There is no independent catalog layer.
- This is one of the biggest mistakes we made in the redesign
of the system.
The old "v2" system had a document catalog in the IM,
and the DM used that ... except when you were running DM-only,
in which case the DM simulated it.
The real problem here is that things are tied too tightly to the IM.
Also, the IM and the DM maintain independent mappings of
ELEMENT and ATTRIBUTE names to IDs.
And, for example, the IM never needs to map IDs to names, so it
doesn't have a mapping for that.
However (see the earlier squawk), you must be able to do that
to unnest attribute postings back to DM XKeys.
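A sketch of what an independent catalog layer could provide, assuming a single in-memory table shared by the IM and the DM. NameCatalog and its methods are hypothetical; the key property is that it maps ELEMENT and ATTRIBUTE names to IDs and IDs back to names, the direction the IM currently lacks. It is not thread-safe as written.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical shared catalog: one name <-> ID mapping used by both IM and DM.
class NameCatalog {
public:
    // Returns the existing ID for a name, or assigns the next free one.
    std::uint32_t intern(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        auto id = static_cast<std::uint32_t>(names_.size());
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }

    std::optional<std::uint32_t> idOf(const std::string& name) const {
        auto it = ids_.find(name);
        if (it == ids_.end()) return std::nullopt;
        return it->second;
    }

    // Reverse mapping, needed to turn an attribute posting back into a name.
    std::optional<std::string> nameOf(std::uint32_t id) const {
        if (id >= names_.size()) return std::nullopt;
        return names_[id];
    }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> names_;  // index == ID
};
```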
Last Modified: Mon Jul 26 10:56:58 CDT 2004
Bolo (Josef Burger) <bolo@cs.wisc.edu>