DEMON--Data Evolution and Monitoring

Overview

Data mining algorithms have been the focus of much recent research. The goal of data mining is to discover (predictive) models from the data maintained in a database. Several algorithms have been proposed for computing novel models, for constructing models more efficiently, and for dealing with new data types. However, two important issues have not been considered previously. First, the problem of measuring the difference, or deviation, between the interesting characteristics of two datasets has not been addressed. Second, the fact that most data warehouses holding data for mining evolve "systematically" has not been exploited: blocks of tuples are added to the database simultaneously, and this systematic block evolution opens up many new mining scenarios that have not been considered until now. The DEMON project addresses both issues. One component of the project also develops fast, scalable data mining algorithms, which are then used to derive incremental mining algorithms for block evolution.

Project Members

Raghu Ramakrishnan
Venkatesh Ganti
Johannes Gehrke

Publications

Classification

RainForest--A framework for fast decision tree classification of large datasets (VLDB 98)
BOAT--Optimistic Decision Tree Construction (SIGMOD 99)

Clustering

Bubble and Bubble-FM--Clustering large datasets in arbitrary metric spaces (ICDE 99)
CACTUS--Clustering Categorical Data Using Summaries (SIGKDD 99)

Deviations

FOCUS--A Framework for Measuring Changes in Data Characteristics (PODS 99)

Block Evolution

DEMON--Data Evolution and Monitoring (To appear in the Proceedings of ICDE 2000)
(Adjudged the best student paper)


Code for the DEMON project may be found here. (I still have to clean it up and document it. Please check back in a month for an updated version.)