This book constitutes the refereed proceedings of the 8th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2006, held in conjunction with DEXA 2006. The book presents 53 revised full papers, organized in topical sections on ETL processing, materialized view, multidimensional design, OLAP and multidimensional model, cubes processing, data warehouse applications, mining techniques, frequent itemsets, mining data streams, ontology-based mining, clustering, advanced mining techniques, association rules, miscellaneous applications, and classification.
Product dimensions: 1.25 (w) x 6.14 (h) x 9.21 (d)
Table of Contents
ETL Processing
ETLDiff: A Semi-automatic Framework for Regression Test of ETL Software Christian Thomsen Torben Bach Pedersen 1
Applying Transformations to Model Driven Data Warehouses Jose-Norberto Mazon Jesus Pardillo Juan Trujillo 13
Bulk Loading a Linear Hash File Davood Rafiei Cheng Hu 23
Materialized View
Dynamic View Selection for OLAP Michael Lawrence Andrew Rau-Chaplin 33
Preview: Optimizing View Materialization Cost in Spatial Data Warehouses Songmei Yu Vijayalakshmi Atluri Nabil Adam 45
Preprocessing for Fast Refreshing Materialized Views in DB2 Wugang Xu Calisto Zuzarte Dimitri Theodoratos Wenbin Ma 55
Multidimensional Design
A Multiversion-Based Multidimensional Model Franck Ravat Olivier Teste Gilles Zurfluh 65
Towards Multidimensional Requirement Design Estella Annoni Franck Ravat Olivier Teste Gilles Zurfluh 75
Multidimensional Design by Examples Oscar Romero Alberto Abello 85
OLAP and Multidimensional Model
Extending Visual OLAP for Handling Irregular Dimensional Hierarchies Svetlana Mansmann Marc H. Scholl 95
A Hierarchy-Driven Compression Technique for Advanced OLAP Visualization of Multidimensional Data Cubes Alfredo Cuzzocrea Domenico Sacca Paolo Serafino 106
Analysing Multi-dimensional Data Across Autonomous Data Warehouses Stefan Berger Michael Schrefl 120
What Time Is It in the Data Warehouse? Stefano Rizzi Matteo Golfarelli 134
Cubes Processing
Computing Iceberg Quotient Cubes with Bounding Xiuzhen Zhang Pauline Lienhua Chou Kotagiri Ramamohanarao 145
An Effective Algorithm to Extract Dense Sub-cubes from a Large Sparse Cube Seok-Lyong Lee 155
On the Computation of Maximal-Correlated Cuboids Cells Ronnie Alves Orlando Belo 165
Data Warehouse Applications
Warehousing Dynamic XML Documents Laura Irina Rusu Wenny Rahayu David Taniar 175
Integrating Different Grain Levels in a Medical Data Warehouse Federation Marko Banek A Min Tjoa Nevena Stolba 185
A Versioning Management Model for Ontology-Based Data Warehouses Dung Nguyen Xuan Ladjel Bellatreche Guy Pierra 195
Data Warehouses in Grids with High QoS Rogerio Luis de Carvalho Costa Pedro Furtado 207
Mining Techniques (1)
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy Blaszczynski Krzysztof Dembczynski Wojciech Kotlowski Mariusz Pawlowski 218
Efficient Mining of Dissociation Rules Mikolaj Morzy 228
Optimized Rule Mining Through a Unified Framework for Interestingness Measures Celine Hebert Bruno Cremilleux 238
An Information-Theoretic Framework for Process Structure and Data Mining Antonio D. Chiaravalloti Gianluigi Greco Antonella Guzzo Luigi Pontieri 248
Mining Techniques (2)
Mixed Decision Trees: An Evolutionary Approach Marek Kretowski Marek Grzes 260
ITER: An Algorithm for Predictive Regression Rule Extraction Johan Huysmans Bart Baesens Jan Vanthienen 270
Cobra: Closed Sequential Pattern Mining Using Bi-phase Reduction Approach Kuo-Yu Huang Chia-Hui Chang Jiun-Hung Tung Cheng-Tao Ho 280
Frequent Itemsets
A Greedy Approach to Concurrent Processing of Frequent Itemset Queries Pawel Boinski Marek Wojciechowski Maciej Zakrzewicz 292
Two New Techniques for Hiding Sensitive Itemsets and Their Empirical Evaluation Ahmed HajYasien Vladimir Estivill-Castro 302
EStream: Online Mining of Frequent Sets with Precise Error Guarantee Xuan Hong Dang Wee-Keong Ng Kok-Leong Ong 312
Mining Data Streams
Granularity Adaptive Density Estimation and on Demand Clustering of Concept-Drifting Data Streams Weiheng Zhu Jian Pei Jian Yin Yihuang Xie 322
Classification of Hidden Network Streams Matthew Gebski Alex Penev Raymond K. Wong 332
Adaptive Load Shedding for Mining Frequent Patterns from Data Streams Xuan Hong Dang Wee-Keong Ng Kok-Leong Ong 342
An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams Jia-Ling Koh Shu-Ning Shin 352
Ontology-Based Mining
Learning Classifiers from Distributed, Ontology-Extended Data Sources Doina Caragea Jun Zhang Jyotishman Pathak Vasant Honavar 363
A Coherent Biomedical Literature Clustering and Summarization Approach Through Ontology-Enriched Graphical Representations Illhoi Yoo Xiaohua Hu Il-Yeol Song 374
Automatic Extraction for Creating a Lexical Repository of Abbreviations in the Biomedical Literature Min Song Il-Yeol Song Ki Jung Lee 384
Clustering
Priority-Based k-Anonymity Accomplished by Weighted Generalisation Structures Konrad Stark Johann Eder Kurt Zatloukal 394
Achieving k-Anonymity by Clustering in Attribute Hierarchical Structures Jiuyong Li Raymond Chi-Wing Wong Ada Wai-Chee Fu Jian Pei 405
Calculation of Density-Based Clustering Parameters Supported with Distributed Processing Marcin Gorawski Rafal Malczok 417
Cluster-Based Sampling Approaches to Imbalanced Data Distributions Show-Jane Yen Yue-Shi Lee 427
Advanced Mining Techniques
Efficient Mining of Large Maximal Bicliques Guimei Liu Kelvin S.H. Sim Jinyan Li 437
Automatic Image Annotation by Mining the Web Zhiguo Gong Qian Liu Jingbai Zhang 449
Privacy Preserving Spatio-temporal Clustering on Horizontally Partitioned Data Ali Inan Yucel Saygin 459
Association Rules
Discovering Semantic Sibling Associations from Web Documents with XTREEM-SP Marko Brunzel Myra Spiliopoulou 469
Difference Detection Between Two Contrast Sets Hui-jing Huang Yongsong Qin Xiaofeng Zhu Jilian Zhang Shichao Zhang 481
EGEA: A New Hybrid Approach Towards Extracting Reduced Generic Association Rule Set (Application to AML Blood Cancer Therapy) M.A. Esseghir G. Gasmi Sadok Ben Yahia Y. Slimani 491
Miscellaneous Applications
AISS: An Index for Non-timestamped Set Subsequence Queries Witold Andrzejewski Tadeusz Morzy 503
A Method for Feature Selection on Microarray Data Using Support Vector Machine Xiao Bing Huang Jian Tang 513
Providing Persistence for Sensor Data Streams by Remote WAL Hideyuki Kawashima Michita Imai Yuichiro Anzai 524
Classification
Support Vector Machine Approach for Fast Classification Keivan Kianmehr Reda Alhajj 534
Document Representations for Classification of Short Web-Page Descriptions Milos Radovanovic Mirjana Ivanovic 544
GARC: A New Associative Classification Approach Ines Bouzouita Samir Elloumi Sadok Ben Yahia 554
Conceptual Modeling for Classification Mining in Data Warehouses Jose Zubcoff Juan Trujillo 566
Author Index 577