Knowledge Discovery and Measures of Interest / Edition 1

by Robert Hilderman, Howard J. Hamilton

Overview

Hilderman and Hamilton (both of the University of Regina, Canada) examine two closely related steps in knowledge discovery systems (also known as data mining systems): the generation of discovered knowledge, and the interpretation and evaluation of that knowledge. For the generation step, they present a method in which a single dataset can be generalized in many ways and to many levels of granularity using a domain generalization graph. For the interpretation step, they use diversity measures as heuristic measures of interestingness to rank the generated summaries. The diversity measures discussed operate on frequency and probability distributions. Annotation © Book News, Inc., Portland, OR (booknews.com)
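
As a rough illustration of ranking summaries with diversity measures, the sketch below scores a few frequency distributions with the textbook Simpson and Shannon indices. The summary names, the counts, and the choice to rank more concentrated distributions first are all assumptions made for this example; the book's own definitions and ranking conventions may differ.

```python
import math


def simpson_index(counts):
    """Textbook Simpson index: sum of squared proportions (higher = more concentrated)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def shannon_index(counts):
    """Textbook Shannon entropy in bits (higher = more evenly spread)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical summaries: each maps a name to the tuple counts of its rows.
summaries = {
    "by_province": [50, 30, 20],
    "by_city": [25, 25, 25, 25],
    "by_country": [97, 2, 1],
}

# Assumption: treat more concentrated (skewed) summaries as more interesting.
ranked = sorted(summaries, key=lambda name: simpson_index(summaries[name]), reverse=True)

for name in ranked:
    counts = summaries[name]
    print(name, round(simpson_index(counts), 4), round(shannon_index(counts), 4))
```

Both functions take raw counts and normalize them internally, so the same code can score a frequency distribution or, after scaling, a probability distribution.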

Product Details

ISBN-13: 9781441949134
Publisher: Springer US
Publication date: 12/08/2010
Series: Springer International Series in Engineering and Computer Science, #638
Edition description: Softcover reprint of hardcover 1st ed. 2001
Pages: 162
Product dimensions: 0.39(w) x 9.21(h) x 6.14(d)

Table of Contents

	List of Figures
	List of Tables
	Preface
	Acknowledgments
1.	Introduction
1.1	KDD in a Nutshell
1.1.1	The Mining Step
1.1.2	The Interpretation and Evaluation Step
1.2	Objective of the Book
2.	Background and Related Work
2.1	Data Mining Techniques
2.1.1	Classification
2.1.2	Association
2.1.3	Clustering
2.1.4	Correlation
2.1.5	Other Techniques
2.2	Interestingness Measures
2.2.1	Rule Interest Function
2.2.2	J-Measure
2.2.3	Itemset Measures
2.2.4	Rule Templates
2.2.5	Projected Savings
2.2.6	I-Measures
2.2.7	Silberschatz and Tuzhilin's Interestingness
2.2.8	Kamber and Shinghal's Interestingness
2.2.9	Credibility
2.2.10	General Impressions
2.2.11	Distance Metric
2.2.12	Surprisingness
2.2.13	Gray and Orlowska's Interestingness
2.2.14	Dong and Li's Interestingness
2.2.15	Reliable Exceptions
2.2.16	Peculiarity
3.	A Data Mining Technique
3.1	Definitions
3.2	The Serial Algorithm
3.2.1	General Overview
3.2.2	Detailed Walkthrough
3.3	The Parallel Algorithm
3.3.1	General Overview
3.3.2	Detailed Walkthrough
3.4	Complexity Analysis
3.4.1	Attribute-Oriented Generalization
3.4.2	The All_Gen Algorithm
3.5	A Comparison with Commercial OLAP Systems
4.	Heuristic Measures of Interestingness
4.1	Diversity
4.2	Notation
4.3	The Sixteen Diversity Measures
4.3.1	The I_Variance Measure
4.3.2	The I_Simpson Measure
4.3.3	The I_Shannon Measure
4.3.4	The I_Total Measure
4.3.5	The I_Max Measure
4.3.6	The I_McIntosh Measure
4.3.7	The I_Lorenz Measure
4.3.8	The I_Gini Measure
4.3.9	The I_Berger Measure
4.3.10	The I_Schutz Measure
4.3.11	The I_Bray Measure
4.3.12	The I_Whittaker Measure
4.3.13	The I_Kullback Measure
4.3.14	The I_MacArthur Measure
4.3.15	The I_Theil Measure
4.3.16	The I_Atkinson Measure
5.	An Interestingness Framework
5.1	Interestingness Principles
5.2	Summary
5.3	Theorems and Proofs
5.3.1	Minimum Value Principle
5.3.2	Maximum Value Principle
5.3.3	Skewness Principle
5.3.4	Permutation Invariance Principle
5.3.5	Transfer Principle
6.	Experimental Analyses
6.1	Evaluation of the All_Gen Algorithm
6.1.1	Serial vs Parallel Performance
6.1.2	Speedup and Efficiency Improvements
6.2	Evaluation of the Sixteen Diversity Measures
6.2.1	Comparison of Assigned Ranks
6.2.2	Analysis of Ranking Similarities
6.2.3	Analysis of Summary Complexity
6.2.4	Distribution of Index Values
7.	Conclusion
7.1	Summary
7.2	Areas for Future Research
	Appendices
	Comparison of Assigned Ranks
	Ranking Similarities
	Summary Complexity
	Index
