R-Tree Validation Example

Input Data

The input data:

consists of three separate data sets, one for each level of the tree.
was produced by dumping the R-Tree after it was bulk loaded (the data in the tree is Tiger geographic data from Orange County, California).
contains the attributes: page ID, level, x, y, width, height, and color. (Color is used to make it easier to distinguish the various nodes at levels 2 and 3.)

The level 1 input file looks like: 1234 1 484402.500000 763222.500000 968413.000000 284401.000000 35 1234 1 1671075.500000 682544.000000 789569.000000 484602.000000 35 ...

Problem

We want to determine whether the R-Tree bulk loading algorithm has produced a "good" R-Tree (not just one that is correct, but one that is well-structured and will result in efficient searches of the data).

Visualization

Click here to see our model of creating visualizations.

The visualization is created as follows:

Define the 'RTree Global Picture' window. This window simply displays the level 1 data.
Define the 'RTree Levels 1-2' window, consisting of two piled views, one of the level 1 data and one of the level 2 data. One of these views is the source for the cursor (the small yellow box) displayed in the 'RTree Global Picture' window. This means that the area displayed in the entire 'RTree Levels 1-2' window corresponds to the area within the cursor in the 'RTree Global Picture' window.
Define the 'RTree Levels 2-3' window. This again consists of two piled views, this time for levels 2 and 3 of the tree (level 3 is the actual objects inserted into the tree). The views in this window are linked to the 'RTree Levels 1-2' window, so that the region displayed in the two windows is the same.

Observations

This visualization was used to find some subtle bugs in the R-Tree bulk loading code. These bugs would have otherwise been difficult to find, because they did not produce incorrect results; rather, they represented a suboptimal R-Tree structure that would cause inefficient searches of the data.

The data as it is shown represent the R-Tree after the bugs were fixed. When the bugs still existed, it was possible to find "bad" areas of the R-Tree by examining the visualization.

For more information on R-Trees, see the paper Using Constraints to Query R*-Trees.

Back to DEVise home page.