Niagara_3 Bugs, Problems, and Misfeatures

Zigzag join can hang if no matching subtrees?: Using the shakespeare data as a reference, a caesar query such as ./caesar -G -B .//PERSONAE -t '.[PERSONA]' ... works as expected. However, if the zigzag has an R side that isn't contained in the left side, the system hangs and just sits there: ./caesar -G -B .//PERSONAE -t '.[SPEECH]' ... I haven't debugged this yet; I noticed it while testing the new query fragment stuff. I think this is probably similar to some other problems: the operators need to sink input data until it is consumed, rather than being "done" and leaving an output operator stuck in flow control. This really needs bolo-style flow control back-pressure streams to do shutdown etc.
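The suspected failure mode in the zigzag entry can be illustrated outside Niagara. The following is only a sketch in Python, not Niagara code: the bounded queue stands in for a flow-controlled stream, and the names END, producer, consume and run are all hypothetical. It shows a consumer that declares itself "done" early leaving its upstream producer blocked, and the same consumer letting the producer finish once it sinks the remaining input.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def producer(out, items):
    """Upstream operator: emits every item, then the end marker."""
    for item in items:
        out.put(item)  # blocks while the bounded queue is full
    out.put(END)


def consume(inp, wanted, drain=True):
    """Take up to `wanted` items, then optionally sink the rest of the stream."""
    taken = []
    while len(taken) < wanted:
        item = inp.get()
        if item is END:
            return taken
        taken.append(item)
    if drain:
        # Keep reading until end-of-stream so the producer is never left blocked.
        while inp.get() is not END:
            pass
    return taken


def run(n_items, wanted, drain=True):
    """Return (items taken, whether the producer managed to finish)."""
    q = queue.Queue(maxsize=2)  # small bound to force flow control
    t = threading.Thread(target=producer, args=(q, range(n_items)), daemon=True)
    t.start()
    taken = consume(q, wanted, drain)
    t.join(timeout=1.0)
    return taken, not t.is_alive()
```

With drain=False the producer thread is still blocked in put() when the join times out, which is the analogue of an operator stuck in flow control.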
All term IDs in same space: Term IDs for all term ID types are being allocated from the same number space. This is different from the space-for-each that we discussed. Not a bug, but rather something to watch out for in case the other behavior is desired. This is actually better most of the time, since redundant mappings for the same term in the different dictionaries are not needed.
Parse Events messes up first payload word when spaces insignificant: When spaces are not significant, if there are spaces between the end element tag on the enclosing element and the first payload word of the element, the first word will not be marked as Adjacent to the enclosing element. This leads to the DM inserting an extra "adjacency space" before the first payload word in that element.
Dependencies broken?: Automake offers no way to rebuild dependencies once they are built?! This means that if someone adds an include file, or an include file moves, there is no way to force an update of the dependencies.
DM spends lots of time in mutexes: Many of the in-memory data structures in the DM are protected by a mutex. This protection is at a very low level, and essentially requires locking the table even to do a lookup on it. This actually has considerable overhead, on the order of performing the table lookups themselves! A better solution for this would be to control access at a higher layer in the DM. Another possibility would be to add some sort of latch-like mechanism so repeated access isn't as expensive.
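The difference between locking on every lookup and controlling access at a higher layer can be sketched as follows. This is an illustration only, written in Python rather than taken from the DM; CountingLock, FineGrainedTable, CoarseGrainedTable and the helper functions are hypothetical names, and the acquisition counter exists purely to make the cost visible.

```python
import threading


class CountingLock:
    """A lock wrapper that counts acquisitions (for illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


class FineGrainedTable:
    """Takes the lock on every single lookup, as the entry describes."""

    def __init__(self, data):
        self._data = dict(data)
        self.lock = CountingLock()

    def lookup(self, key):
        with self.lock:
            return self._data.get(key)


class CoarseGrainedTable:
    """Leaves locking to a higher layer that holds the lock across many lookups."""

    def __init__(self, data):
        self._data = dict(data)
        self.lock = CountingLock()

    def lookup_unlocked(self, key):
        # The caller must already hold self.lock.
        return self._data.get(key)


def lookup_all_fine(table, keys):
    # One lock acquisition per key.
    return [table.lookup(k) for k in keys]


def lookup_all_coarse(table, keys):
    # One lock acquisition for the whole batch.
    with table.lock:
        return [table.lookup_unlocked(k) for k in keys]
```

Both helpers return the same results; the fine-grained version pays one lock acquisition per key, while the coarse-grained version pays one per batch.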
Need to track term IDs in streams: I discovered this bug in a thought experiment .. here goes. If you are running a simple query, such as //x or //@x, you always know that the things being returned are x's. However, if you run an OR-based query, such as //(x|y) or //(@x|@y), then suddenly you DO NOT KNOW the type and ID of the postings being returned. Why is this a problem? Well, say that you then want to look up those attributes in the DM. You need to know the name of those fields -- most importantly the attribute name -- to be able to find that data in the DM. To solve this, the Posting info has to be upgraded with the element or attribute type/ID of the posting so this indeterminate thing can be located properly. The same problem may also occur if you are searching for payload words or attribute words, since once you create a catenated list, you lose their type identity.
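The proposed fix, carrying the term type and ID along with each posting, might look roughly like the sketch below. It is written in Python for illustration and does not reflect Niagara's actual Posting layout; the field names (doc_id, position, term_kind, term_id) and the or_merge helper are assumptions.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass(frozen=True, order=True)
class Posting:
    doc_id: int
    position: int
    term_kind: str  # hypothetical tag: "element" or "attribute"
    term_id: int    # hypothetical numeric ID of the element/attribute name


def or_merge(*posting_lists):
    """Merge sorted posting lists into one sorted stream.

    Because every posting carries its own term_kind and term_id, the
    merged stream still says which term each posting came from.
    """
    return list(merge(*posting_lists))
```

For a query like //(@x|@y), a consumer of the merged stream can then read term_kind and term_id off each posting to decide which attribute to look up in the DM.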
Missing documents fail in execution: If you specify a non-existent document in a query, detection of the problem is delayed until query execution. That is because there is currently no front end. However, perhaps some sort of "sanity checker" could be added for the non-execution "query setup" mechanism. That would allow these checks to be done prior to executing the graph.
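A "sanity checker" of the kind suggested in the missing-documents entry could be as small as the following sketch. It is a Python illustration under stated assumptions, not Niagara's query setup code: MissingDocumentError, check_documents and setup_query are hypothetical names, and the list of known documents stands in for whatever catalog the real system would consult.

```python
class MissingDocumentError(Exception):
    """Raised at setup time when a query names an unknown document."""


def check_documents(query_documents, known_documents):
    """Return the documents named by the query that are not known."""
    known = set(known_documents)
    return [d for d in query_documents if d not in known]


def setup_query(query_documents, known_documents):
    """Validate document names before any execution graph is run."""
    missing = check_documents(query_documents, known_documents)
    if missing:
        raise MissingDocumentError("unknown document(s): " + ", ".join(missing))
    return list(query_documents)
```

The point of the design is only that the failure surfaces during setup, with the offending names in the message, instead of part-way through execution.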
We do not have a simple, exhaustive test suite: It is very easy to create a bug in Niagara which isn't detected for some time. This is because, unlike SQL databases, our operators are highly dependent upon the order and spacing of content in a document. This drives up the number of edge cases where errors may be possible way beyond that seen in relational systems. We have discussed this quite a bit, but it is something that keeps on coming back to bite us. I wonder how many XML Processors out there actually execute queries correctly? Heck, I wonder if Niagara executes queries correctly!
There is no independent catalog layer.: This is one of the biggest mistakes we made in the redesign of the system. The old "v2" system had a document catalog in the IM, and then the DM used that ... except when you were running DM only and it simulated it. The real problem here is that things are tied too tightly to the IM. Also, the IM and the DM maintain independent mappings of ELEMENT and ATTRIBUTE names to IDs. And, for example, the IM never needs to map IDs to names, so it doesn't have a mapping for that. However, see the earlier squawk: you must be able to do that to simply unnest attribute postings back to DM XKeys.
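One shape an independent catalog layer could take is a single name table, shared by the IM and the DM, that maps in both directions. The sketch below is a Python illustration only; NameCatalog, intern and name_of are hypothetical names, and using one number space for all kinds is an assumption carried over from the term ID entry above.

```python
class NameCatalog:
    """Bidirectional mapping between (kind, name) pairs and numeric IDs."""

    def __init__(self):
        self._id_by_name = {}   # (kind, name) -> id
        self._name_by_id = []   # id -> (kind, name)

    def intern(self, kind, name):
        """Return the ID for (kind, name), allocating one on first use."""
        key = (kind, name)
        if key not in self._id_by_name:
            self._id_by_name[key] = len(self._name_by_id)
            self._name_by_id.append(key)
        return self._id_by_name[key]

    def name_of(self, term_id):
        """Reverse lookup: recover (kind, name) from an ID."""
        return self._name_by_id[term_id]
```

With both components consulting the same table, an ID found in an attribute posting can be turned back into the attribute name needed to locate the data in the DM.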

Last Modified: Mon Jul 26 10:56:58 CDT 2004

Bolo (Josef Burger) <bolo@cs.wisc.edu>