Knowledge Discovery and Measures of Interest / Edition 1
by Robert Hilderman, Howard J. Hamilton
Overview
Hilderman and Hamilton (both of the U. of Regina, Canada) look at two closely related steps in knowledge discovery systems (also known as data mining systems): the generation of discovered knowledge, and the interpretation and evaluation of the discovered knowledge. They present a method whereby a single dataset can be generalized in many ways and to many levels of granularity using a domain generalization graph, in order to generate the discovered knowledge. In the interpretation step, diversity measures are used as heuristic measures of interestingness for ranking the previously generated summaries. The diversity measures discussed operate on frequency and probability distributions and can be used to rank the generated summaries by interestingness. Annotation © Book News, Inc., Portland, OR (booknews.com)
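The generate-then-rank idea in the overview can be illustrated with a minimal sketch. It assumes a hypothetical toy relation of (city, count) tuples and a hypothetical one-step city-to-province concept hierarchy standing in for one edge of a domain generalization graph; the names `SALES`, `CITY_TO_PROVINCE`, `generalize`, `distribution` and `rank` are illustrative only. The three measures use the commonly cited forms of the variance, Simpson and Shannon indices; the book's exact definitions and normalizations may differ.

```python
import math
from collections import Counter

# Hypothetical toy relation of (value, count) tuples; not data from the book.
SALES = [("Regina", 40), ("Saskatoon", 35), ("Calgary", 15), ("Edmonton", 10)]

# Hypothetical one-step concept hierarchy (city -> province), standing in for
# one edge of a domain generalization graph.
CITY_TO_PROVINCE = {
    "Regina": "Saskatchewan",
    "Saskatoon": "Saskatchewan",
    "Calgary": "Alberta",
    "Edmonton": "Alberta",
}


def generalize(summary, mapping):
    """Replace each value by its ancestor and merge the counts of duplicates."""
    merged = Counter()
    for value, count in summary:
        merged[mapping[value]] += count
    return sorted(merged.items())


def distribution(summary):
    """Turn the tuple counts of a summary into probabilities p_1..p_m."""
    total = sum(count for _, count in summary)
    return [count / total for _, count in summary]


def i_variance(p):
    """Sample variance of p around the uniform value 1/m."""
    m = len(p)
    if m < 2:
        return 0.0
    q = 1.0 / m
    return sum((pi - q) ** 2 for pi in p) / (m - 1)


def i_simpson(p):
    """Simpson's index: sum of squared probabilities (1/m when uniform)."""
    return sum(pi * pi for pi in p)


def i_shannon(p):
    """Shannon entropy in bits (maximal, log2 m, when uniform)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def rank(summaries, measure, highest_first=True):
    """Score each named summary with `measure` and sort by the score."""
    scored = [(name, measure(distribution(s))) for name, s in summaries.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=highest_first)


if __name__ == "__main__":
    summaries = {
        "by city": SALES,
        "by province": generalize(SALES, CITY_TO_PROVINCE),
    }
    for name, score in rank(summaries, i_simpson):
        print(f"{name}: Simpson = {score:.3f}")
```

In this toy case the province-level summary ranks first under Simpson's index because its counts are more concentrated. Treating concentration as "more interesting" is an assumption of the sketch; for a measure such as Shannon entropy, which peaks at the uniform distribution, one would pass `highest_first=False`.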
Product Details
- ISBN-13: 9781441949134
- Publisher: Springer US
- Publication date: 12/08/2010
- Series: Springer International Series in Engineering and Computer Science, #638
- Edition description: Softcover reprint of hardcover 1st ed. 2001
- Pages: 162
- Product dimensions: 0.39 (w) x 9.21 (h) x 6.14 (d) in.
Table of Contents
| List of Figures | ix |
| List of Tables | xi |
| Preface | xv |
| Acknowledgments | xix |
1. | Introduction | 1 |
1.1 | KDD in a Nutshell | 1 |
1.1.1 | The Mining Step | 2 |
1.1.2 | The Interpretation and Evaluation Step | 7 |
1.2 | Objective of the Book | 9 |
2. | Background and Related Work | 11 |
2.1 | Data Mining Techniques | 11 |
2.1.1 | Classification | 11 |
2.1.2 | Association | 12 |
2.1.3 | Clustering | 13 |
2.1.4 | Correlation | 14 |
2.1.5 | Other Techniques | 15 |
2.2 | Interestingness Measures | 15 |
2.2.1 | Rule Interest Function | 15 |
2.2.2 | J-Measure | 16 |
2.2.3 | Itemset Measures | 16 |
2.2.4 | Rule Templates | 17 |
2.2.5 | Projected Savings | 17 |
2.2.6 | I-Measures | 18 |
2.2.7 | Silberschatz and Tuzhilin's Interestingness | 18 |
2.2.8 | Kamber and Shinghal's Interestingness | 19 |
2.2.9 | Credibility | 20 |
2.2.10 | General Impressions | 20 |
2.2.11 | Distance Metric | 21 |
2.2.12 | Surprisingness | 21 |
2.2.13 | Gray and Orlowska's Interestingness | 22 |
2.2.14 | Dong and Li's Interestingness | 22 |
2.2.15 | Reliable Exceptions | 23 |
2.2.16 | Peculiarity | 23 |
3. | A Data Mining Technique | 25 |
3.1 | Definitions | 25 |
3.2 | The Serial Algorithm | 26 |
3.2.1 | General Overview | 26 |
3.2.2 | Detailed Walkthrough | 28 |
3.3 | The Parallel Algorithm | 30 |
3.3.1 | General Overview | 31 |
3.3.2 | Detailed Walkthrough | 32 |
3.4 | Complexity Analysis | 33 |
3.4.1 | Attribute-Oriented Generalization | 33 |
3.4.2 | The All_Gen Algorithm | 33 |
3.5 | A Comparison with Commercial OLAP Systems | 34 |
4. | Heuristic Measures of Interestingness | 37 |
4.1 | Diversity | 37 |
4.2 | Notation | 39 |
4.3 | The Sixteen Diversity Measures | 39 |
4.3.1 | The I_Variance Measure | 39 |
4.3.2 | The I_Simpson Measure | 40 |
4.3.3 | The I_Shannon Measure | 40 |
4.3.4 | The I_Total Measure | 41 |
4.3.5 | The I_Max Measure | 41 |
4.3.6 | The I_McIntosh Measure | 42 |
4.3.7 | The I_Lorenz Measure | 42 |
4.3.8 | The I_Gini Measure | 43 |
4.3.9 | The I_Berger Measure | 44 |
4.3.10 | The I_Schutz Measure | 44 |
4.3.11 | The I_Bray Measure | 44 |
4.3.12 | The I_Whittaker Measure | 44 |
4.3.13 | The I_Kullback Measure | 45 |
4.3.14 | The I_MacArthur Measure | 45 |
4.3.15 | The I_Theil Measure | 46 |
4.3.16 | The I_Atkinson Measure | 46 |
5. | An Interestingness Framework | 47 |
5.1 | Interestingness Principles | 47 |
5.2 | Summary | 49 |
5.3 | Theorems and Proofs | 51 |
5.3.1 | Minimum Value Principle | 51 |
5.3.2 | Maximum Value Principle | 63 |
5.3.3 | Skewness Principle | 79 |
5.3.4 | Permutation Invariance Principle | 84 |
5.3.5 | Transfer Principle | 84 |
6. | Experimental Analyses | 99 |
6.1 | Evaluation of the All_Gen Algorithm | 99 |
6.1.1 | Serial vs Parallel Performance | 100 |
6.1.2 | Speedup and Efficiency Improvements | 103 |
6.2 | Evaluation of the Sixteen Diversity Measures | 104 |
6.2.1 | Comparison of Assigned Ranks | 105 |
6.2.2 | Analysis of Ranking Similarities | 107 |
6.2.3 | Analysis of Summary Complexity | 112 |
6.2.4 | Distribution of Index Values | 117 |
7. | Conclusion | 123 |
7.1 | Summary | 123 |
7.2 | Areas for Future Research | 125 |
| Appendices | 141 |
| Comparison of Assigned Ranks | 141 |
| Ranking Similarities | 149 |
| Summary Complexity | 155 |
| Index | 161 |