DEMON--Data Evolution and Monitoring
Data mining algorithms have been the focus of much research recently.
The goal of data mining is to discover (predictive) models based on
the data maintained in the database. Several algorithms have been
proposed for computing novel models, for constructing models more
efficiently, and for dealing with new data types.
However, two important issues have not been considered
previously. First, the issue of measuring the difference, or
deviation, between the interesting characteristics of two
datasets has not been addressed. Second, the fact that most data
warehouses holding the data to be mined evolve "systematically"
has not been exploited. That is, blocks of tuples are added to the
database simultaneously; this systematic block evolution opens many
new mining scenarios which have not been considered until now.
The DEMON project addresses the above two issues. One component of the
project also addresses the development of fast, scalable data mining
algorithms which are then used to derive incremental mining algorithms
for block evolution.
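To make the two ideas above concrete, here is a minimal sketch in Python. It is not the DEMON or FOCUS algorithm: the class BlockCounts, the function deviation, and the choice of item frequencies as the "interesting characteristic" are all assumptions made purely for illustration. The sketch shows a summary being updated one block at a time without rescanning earlier blocks, and a simple number for how far apart two datasets are.

```python
from collections import Counter


class BlockCounts:
    """Hypothetical summary: item counts maintained as blocks of tuples arrive."""

    def __init__(self):
        self.counts = Counter()
        self.num_tuples = 0

    def add_block(self, block):
        # Only the newly added block is scanned; earlier blocks are not revisited.
        for tup in block:
            self.counts.update(set(tup))  # count each item at most once per tuple
        self.num_tuples += len(block)

    def frequencies(self):
        # Fraction of tuples seen so far that contain each item.
        if self.num_tuples == 0:
            return {}
        return {item: c / self.num_tuples for item, c in self.counts.items()}


def deviation(freq_a, freq_b):
    """Illustrative measure only: sum of absolute differences in item frequency."""
    items = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(i, 0.0) - freq_b.get(i, 0.0)) for i in items)


# Two small datasets, each loaded as a single block.
old = BlockCounts()
old.add_block([("a", "b"), ("a",)])
new = BlockCounts()
new.add_block([("a", "b"), ("b",)])
print(deviation(old.frequencies(), new.frequencies()))
```

Because add_block touches only the incoming tuples, the cost of an update depends on the size of the new block rather than on the size of the whole database, which is the property an incremental algorithm for block evolution aims for.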
RainForest--A framework for fast decision tree classification of large datasets (VLDB 98)
BOAT--Optimistic Decision Tree Construction (SIGMOD 99)
Bubble and Bubble-FM--Clustering large datasets in arbitrary metric spaces (ICDE 99)
CACTUS--Clustering Categorical Data Using Summaries (SIGKDD 99)
FOCUS--A Framework for Measuring Changes in Data Characteristics (PODS 99)
DEMON--Data Evolution and Monitoring (To appear in the Proceedings of ICDE 2000)
(Adjudged the best student paper)
Code for the DEMON project may be found here. (I still have to clean this up and document it. Please revisit
in a month to find an updated version.)