Niagara Bugs, Problems, and Misfeatures

I want to start maintaining a list of bugs as they are found. Many bugs can be fixed easily, and usually are as soon as they are found. Other bugs need to be dug into and looked at. If you find any bugs like that, the non-obvious ones, please send me email so I can add them to this list.

The other thing is, please don't just go off and fix the bugs that show up here. Instead, let's meet about them and set a priority and importance for each. Some problems will require laying groundwork on which any fixes can be implemented. Some will need real design work in which a number of people should participate. Others will need to be steered a bit by someone with more experience.


N3: Parse Events messes up first payload word when spaces insignificant
When spaces are not significant, if there are spaces between the end of the enclosing element's tag and the first payload word of the element, the first word will not be marked as Adjacent to the enclosing element. This leads to the DM inserting an extra "adjacency space" before the first payload word in that element.
The testing environment is rather ad-hoc.
While having testing at all is a wonderful thing, the test harness is ad-hoc and not very well built. Many of these ad-hoc problems are systemic issues which need to be addressed. Once they are fixed properly, some of the ad-hoc-ness will go away. In other ways the system architecture may need to change a bit so that the test setup better reflects actual system usage.
The parser and query engine should have their own idea of a DATA directory where data can be found.
This is important so that tests can be checked out anywhere and run. The hard-wired /p/niagara/repository/data paths cause problems there, especially once the data is committed to CVS and is checked out in people's workspaces, located in arbitrary common locations, or checked out at another site.
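One way to avoid the hard-wired paths is to resolve the data directory at run time. The following is only a minimal sketch: the NIAGARA_DATA environment variable and both function names are hypothetical and do not exist in the Niagara code.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical: NIAGARA_DATA is an illustrative variable name, not an
// existing Niagara setting. If it is unset or empty, use the fallback.
std::string dataDirectory(const std::string& fallback) {
    const char* env = std::getenv("NIAGARA_DATA");
    if (env != nullptr && *env != '\0') {
        return env;
    }
    return fallback;
}

// Join a directory and a relative file name with exactly one '/'.
std::string dataPath(const std::string& dir, const std::string& name) {
    if (dir.empty()) {
        return name;
    }
    if (dir.back() == '/') {
        return dir + name;
    }
    return dir + "/" + name;
}
```

A test script could then set the variable to wherever the data was checked out, and the parser and query engine would build every data path through dataPath() instead of embedding /p/niagara/repository/data.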
qry14.sel.star
Fails on orca. The query apparently runs OK, and the process exits OK. However, no result is produced, and there is neither a niagara.results1 nor a niagara.time1 file with results.
IndexFile is broken
If you run IndexFile twice, it re-indexes all the documents instead of noticing that some are already loaded. The only reason it worked before was that it always blew away both the IM and DM lib-db databases, and then rebuilt them!
The IM can't index anything already in the DM?
Currently the IM indexes documents. Then it throws the document away. Unfortunately, this means that the system has to read a document off the disk or the network twice when actually accessing the document. What we really should look at doing is loading and indexing the document all at once, so we only have to read and parse the entire thing one time. Done correctly, it would also allow us to re-index existing documents in the database.
dependencies broken?
Automake offers no way to rebuild dependencies once they are built?! This means that if someone adds an include file, or an include file moves, there is no way to force an update of the dependencies.
DM dependent upon IM
The DM needs the IM, because the IM has the document ID generator. This is a historical oddity from when the databases were different and separate and no central "control" database was possible. The real solution for this is to create a layer that generates document IDs, that both
There needs to be a non-IM and non-DM database section
This DB portion should contain meta-information about documents. This would allow, for example, a timestamp on index-only documents so we don't re-index documents that are not out of date, and that have already been indexed.
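The timestamp check could look roughly like the sketch below. It keeps the meta-information in memory only to stay self-contained; MetaStore, DocMeta, and their members are hypothetical names, and a real version would store the records in the database.

```cpp
#include <ctime>
#include <map>
#include <string>

// Hypothetical per-document meta-information record.
struct DocMeta {
    std::time_t indexedAt;  // when the document was last indexed
};

// Hypothetical meta-information section, keyed by document name or URL.
class MetaStore {
public:
    // True if the document was never indexed, or was modified after
    // the time it was last indexed.
    bool needsIndexing(const std::string& doc, std::time_t modifiedAt) const {
        std::map<std::string, DocMeta>::const_iterator it = docs_.find(doc);
        return it == docs_.end() || it->second.indexedAt < modifiedAt;
    }

    // Remember when the document was indexed.
    void recordIndexed(const std::string& doc, std::time_t when) {
        DocMeta meta;
        meta.indexedAt = when;
        docs_[doc] = meta;
    }

private:
    std::map<std::string, DocMeta> docs_;
};
```

With something like this in place, running the indexer a second time would skip every document for which needsIndexing() returns false.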
The IM should commit after indexing each document
That way the store should be in some semblance of order.
Missing file causes infinite DM looping
If a file to be loaded is missing, the various recursive calls in the DM will recurse without bounds. This needs to be looked at and fixed in a decent manner. There is no reason for possible recursion when loading a file, and if a file is not found, the system should say that and return an error!
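The behavior asked for here, "say that and return an error", can be sketched as follows. This is not the DM's actual loading code; LoadStatus and loadFile are hypothetical names used only to show the shape of the fix.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical status codes for a load attempt.
enum class LoadStatus { Ok, NotFound };

// Read the whole file into 'contents'. A file that cannot be opened is
// reported to the caller once; there is no retry and no recursion.
LoadStatus loadFile(const std::string& path, std::string& contents) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) {
        return LoadStatus::NotFound;
    }
    std::ostringstream buf;
    buf << in.rdbuf();
    contents = buf.str();
    return LoadStatus::Ok;
}
```

The caller checks the status and prints a "file not found" message instead of calling back into the loader.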

Fixed Bugs

Running QETest with the incorrect exit(1); replaced with return 1; at the end of main hangs the QETest program.
When something like this happens, it is generally an indication that object destructors are not working correctly. Why is it bad to use exit()? Calling exit(int) skips the destructors of the automatic objects in main() and in every function still on the call stack, so they never get a chance to clean up. With return 1; those destructors do run. Evidently they don't work correctly when they run, and the system hangs.
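The difference for automatic objects can be shown with a small sketch. Resource, cleanups, and both functions are made up for illustration and have nothing to do with the QETest sources.

```cpp
#include <cstdlib>

int cleanups = 0;  // counts destructor calls, for illustration only

struct Resource {
    ~Resource() { ++cleanups; }
};

// Leaving by return unwinds the scope, so r's destructor runs.
int leaveWithReturn() {
    Resource r;
    return 1;
}

// Leaving by std::exit() ends the process without unwinding the stack,
// so r's destructor is never called.
[[noreturn]] void leaveWithExit() {
    Resource r;
    std::exit(1);
}
```

A destructor that hangs or crashes therefore stays hidden as long as the program leaves through exit(), and shows up as soon as the exit() call is replaced with a return.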
IndexMgr meta-info
When dealing with marshalling and unmarshalling meta-information to disk, the IndexMgr doesn't keep track of actual memory use. It gets an estimate of how large a write buffer it will need, but then it doesn't track how much data is actually placed in the buffer. This causes UMCs when the garbage portion of the buffer is written. Actual length should be tracked instead of approximate length.
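Tracking the actual length separately from the estimate might look like the sketch below. MarshalBuffer is a hypothetical class, not the IndexMgr's real buffer code.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical write buffer: sized from an estimate, but it records how
// many bytes were really placed in it.
class MarshalBuffer {
public:
    explicit MarshalBuffer(std::size_t estimate) : data_(estimate), used_(0) {}

    // Copy n bytes in, growing the buffer if the estimate was too small.
    void append(const void* src, std::size_t n) {
        if (used_ + n > data_.size()) {
            data_.resize(used_ + n);
        }
        if (n > 0) {
            std::memcpy(&data_[used_], src, n);
        }
        used_ += n;
    }

    const char* bytes() const { return data_.data(); }
    std::size_t length() const { return used_; }         // bytes actually used
    std::size_t capacity() const { return data_.size(); } // estimate or larger

private:
    std::vector<char> data_;
    std::size_t used_;
};
```

The disk write then uses length() rather than capacity(), so the unused tail of the buffer is never written.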
IndexMgr meta-info
Is currently marshalled to ascii and read back in from ascii. This takes a lot of CPU time and is complex; however, it does allow for changing data structures without affecting the on-disk representation too much. If it is going to be kept as text, a self-describing format should be used, so that it can survive and adapt with changes. Using a binary format would considerably decrease startup/shutdown times as well as get rid of errors caused by the iostream library.
8-bit chars
The Niagara code is using 8 bit chars. However, a lot of the c-library is only seven bit clean. This affects things such as strcasecmp(), the isXXXX() ctype facilities, and other similar things. On the other hand, all the data should be plain ascii, and I wonder where in the world the hi-bits are getting set in plain text. That may be the real problem.
IndexMgr data reload has no error checking
The iostream code used to read the index manager's meta data back in from the ascii on-disk version has no error checking at all. If something goes wrong, variables are not initialized correctly, random memory is read, and it is just a big mess.
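The ctype functions require an argument that is representable as an unsigned char (or EOF), so a char with the high bit set must be converted before it is passed in. The helpers below are a sketch with hypothetical names; hasHighBit() is the kind of check that could help find where the hi-bits come from.

```cpp
#include <cctype>
#include <string>

// Case-insensitive comparison that is safe for 8-bit chars: every char
// is converted to unsigned char before it reaches std::tolower().
bool equalIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::string::size_type i = 0; i < a.size(); ++i) {
        int ca = std::tolower(static_cast<unsigned char>(a[i]));
        int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) {
            return false;
        }
    }
    return true;
}

// True if any byte in the string is outside the 7-bit ascii range.
bool hasHighBit(const std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (static_cast<unsigned char>(s[i]) > 0x7F) {
            return true;
        }
    }
    return false;
}
```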
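Checking the stream state after every read keeps a bad file from leaking garbage into the data structures. The record layout below (a name followed by a count) is purely hypothetical; the point of the sketch is the pattern of reading into a temporary and testing the stream.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical meta-data record; the real on-disk layout is not shown here.
struct IndexMeta {
    std::string name;
    long docCount = 0;
};

// Read one record. On any failure 'out' is left untouched and false is
// returned, so the caller never sees half-initialized data.
bool readIndexMeta(std::istream& in, IndexMeta& out) {
    IndexMeta tmp;
    if (!(in >> tmp.name >> tmp.docCount)) {
        return false;  // missing or malformed field
    }
    if (tmp.docCount < 0) {
        return false;  // value parsed but makes no sense
    }
    out = tmp;
    return true;
}
```

For example, reading from std::istringstream("docs forty") fails at the count field and returns false instead of continuing with an unset value.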

Last Modified: Mon Feb 17 18:33:48 CST 2003
Bolo (Josef Burger) <bolo@cs.wisc.edu>