Knowledge Discovery and Measures of Interest / Edition 1

by Robert Hilderman, Howard J. Hamilton
     
 

Overview

Hilderman and Hamilton (both of the University of Regina, Canada) look at two closely related steps in knowledge discovery systems (also known as data mining systems): the generation of discovered knowledge, and the interpretation and evaluation of that knowledge. For the generation step, they present a method in which a single dataset can be generalized in many ways and to many levels of granularity using a domain generalization graph. For the interpretation step, they use diversity measures as heuristic measures of interestingness to rank the generated summaries. The diversity measures discussed operate on frequency and probability distributions. Annotation © Book News, Inc., Portland, OR (booknews.com)
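
As a rough illustration of ranking summaries with a diversity measure, the sketch below computes two textbook indices (Shannon entropy and the Simpson concentration index) over the tuple counts of three made-up summaries. The summary names, the counts, and the choice to list the most concentrated summary first are all assumptions for illustration; the book's own definitions and ranking conventions may differ.

```python
import math

def shannon_index(counts):
    """Shannon entropy (base 2) of the distribution implied by the counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def simpson_index(counts):
    """Simpson concentration index: sum of squared proportions."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

# Hypothetical summaries: each list holds tuple counts per summary row.
summaries = {
    "uniform": [25, 25, 25, 25],
    "skewed": [70, 10, 10, 10],
    "concentrated": [97, 1, 1, 1],
}

# Assumption: order from lowest to highest entropy (most concentrated first).
ranked = sorted(summaries, key=lambda name: shannon_index(summaries[name]))

for name in ranked:
    counts = summaries[name]
    print(name, round(shannon_index(counts), 3), round(simpson_index(counts), 3))
```

The two indices move in opposite directions here: as a distribution becomes more concentrated, its entropy falls while its Simpson value rises, so the direction of the ranking has to be chosen per measure.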

Product Details

ISBN-13: 9781441949134
Publisher: Springer US
Publication date: 12/08/2010
Series: Springer International Series in Engineering and Computer Science, #638
Edition description: Softcover reprint of hardcover 1st ed. 2001
Pages: 162
Product dimensions: 0.39(w) x 9.21(h) x 6.14(d)

Table of Contents

List of Figures
List of Tables
Preface
Acknowledgments
1. Introduction
1.1 KDD in a Nutshell
1.1.1 The Mining Step
1.1.2 The Interpretation and Evaluation Step
1.2 Objective of the Book
2. Background and Related Work
2.1 Data Mining Techniques
2.1.1 Classification
2.1.2 Association
2.1.3 Clustering
2.1.4 Correlation
2.1.5 Other Techniques
2.2 Interestingness Measures
2.2.1 Rule Interest Function
2.2.2 J-Measure
2.2.3 Itemset Measures
2.2.4 Rule Templates
2.2.5 Projected Savings
2.2.6 I-Measures
2.2.7 Silberschatz and Tuzhilin's Interestingness
2.2.8 Kamber and Shinghal's Interestingness
2.2.9 Credibility
2.2.10 General Impressions
2.2.11 Distance Metric
2.2.12 Surprisingness
2.2.13 Gray and Orlowska's Interestingness
2.2.14 Dong and Li's Interestingness
2.2.15 Reliable Exceptions
2.2.16 Peculiarity
3. A Data Mining Technique
3.1 Definitions
3.2 The Serial Algorithm
3.2.1 General Overview
3.2.2 Detailed Walkthrough
3.3 The Parallel Algorithm
3.3.1 General Overview
3.3.2 Detailed Walkthrough
3.4 Complexity Analysis
3.4.1 Attribute-Oriented Generalization
3.4.2 The All_Gen Algorithm
3.5 A Comparison with Commercial OLAP Systems
4. Heuristic Measures of Interestingness
4.1 Diversity
4.2 Notation
4.3 The Sixteen Diversity Measures
4.3.1 The I_Variance Measure
4.3.2 The I_Simpson Measure
4.3.3 The I_Shannon Measure
4.3.4 The I_Total Measure
4.3.5 The I_Max Measure
4.3.6 The I_McIntosh Measure
4.3.7 The I_Lorenz Measure
4.3.8 The I_Gini Measure
4.3.9 The I_Berger Measure
4.3.10 The I_Schutz Measure
4.3.11 The I_Bray Measure
4.3.12 The I_Whittaker Measure
4.3.13 The I_Kullback Measure
4.3.14 The I_MacArthur Measure
4.3.15 The I_Theil Measure
4.3.16 The I_Atkinson Measure
5. An Interestingness Framework
5.1 Interestingness Principles
5.2 Summary
5.3 Theorems and Proofs
5.3.1 Minimum Value Principle
5.3.2 Maximum Value Principle
5.3.3 Skewness Principle
5.3.4 Permutation Invariance Principle
5.3.5 Transfer Principle
6. Experimental Analyses
6.1 Evaluation of the All_Gen Algorithm
6.1.1 Serial vs Parallel Performance
6.1.2 Speedup and Efficiency Improvements
6.2 Evaluation of the Sixteen Diversity Measures
6.2.1 Comparison of Assigned Ranks
6.2.2 Analysis of Ranking Similarities
6.2.3 Analysis of Summary Complexity
6.2.4 Distribution of Index Values
7. Conclusion
7.1 Summary
7.2 Areas for Future Research
Appendices
Comparison of Assigned Ranks
Ranking Similarities
Summary Complexity
Index
