Lecture Notes in Data Mining
by Browne Murray
Overview
The continual explosion of information technology and the need for better data collection and management methods have made data mining an even more relevant topic of study. Books on data mining tend to be either broad and introductory or focused on some very specific technical aspect of the field. This book is a series of seventeen edited "student-authored lectures" that explore in depth the core of data mining (classification, clustering and association rules) by offering overviews that include both analysis and insight.
The initial chapters lay a framework of data mining techniques by explaining some of the basics such as applications of Bayes Theorem, similarity measures, and decision trees. Before focusing on the pillars of classification, clustering and association rules, the book also considers alternative candidates such as point estimation and genetic algorithms.
The book's discussion of classification includes an introduction to decision tree algorithms, rule-based algorithms (a popular alternative to decision trees) and distance-based algorithms. Five of the lecture-chapters are devoted to the concept of clustering, or unsupervised classification. The functionality of hierarchical and partitional clustering algorithms is covered, as are the efficient and scalable clustering algorithms used in large databases. Association rules are discussed in terms of basic algorithms, parallel and distributed algorithms, and advanced measures that help determine the value of association rules. The final chapter discusses algorithms for spatial data mining.
Product Details
- ISBN-13: 9789812568021
- Publisher: World Scientific Publishing Company, Incorporated
- Publication date: 09/28/2006
- Pages: 236
- Product dimensions: 6.10(w) x 9.10(h) x 0.60(d)
Table of Contents
Preface
Point Estimation Algorithms
Introduction
Motivation
Methods of Point Estimation
The Method of Moments
Maximum Likelihood Estimation
The Expectation-Maximization Algorithm
Measures of Performance
Bias
Mean Squared Error
Standard Error
Efficiency
Consistency
The Jackknife Method
Summary
Applications of Bayes Theorem
Introduction
Motivation
The Bayes Approach for Classification
Statistical Framework for Classification
Bayesian Methodology
Examples
Example 1: Numerical Methods
Example 2: Bayesian Networks
Summary
Similarity Measures
Introduction
Motivation
Classic Similarity Measures
Dice
Overlap
Jaccard
Asymmetric
Cosine
Other Measures
Dissimilarity
Example
Current Applications
Multi-Dimensional Modeling
Hierarchical Clustering
Bioinformatics
Summary
Decision Trees
Introduction
Motivation
Decision Tree Algorithms
ID3 Algorithm
Evaluating Tests
Selection of Splitting Variable
Stopping Criteria
Tree Pruning
Stability of Decision Trees
Example: Classification of University Students
Applications of Decision Tree Algorithms
Summary
Genetic Algorithms
Introduction
Motivation
Fundamentals
Encoding Schema and Initialization
Fitness Evaluation
Selection
Crossover
Mutation
Iterative Evolution
Example: The Traveling-Salesman
Current and Future Applications
Summary
Classification: Distance-based Algorithms
Introduction
Motivation
Distance Functions
City Block Distance
Euclidean Distance
Tangent Distance
Other Distances
Classification Algorithms
A Simple Approach Using Mean Vector
K-Nearest Neighbors
Current Applications
Summary
Decision Tree-based Algorithms
Introduction
Motivation
ID3
C4.5
C5.0
CART
Summary
Covering (Rule-based) Algorithms
Introduction
Motivation
Classification Rules
Covering (Rule-based) Algorithms
1R Algorithm
PRISM Algorithm
Other Algorithms
Applications of Covering Algorithms
Summary
Clustering: An Overview
Introduction
Motivation
The Clustering Process
Pattern Representation
Pattern Proximity Measures
Clustering Algorithms
Hierarchical Algorithms
Partitional Algorithms
Data Abstraction
Cluster Assessment
Current Applications
Summary
Clustering: Hierarchical Algorithms
Introduction
Motivation
Agglomerative Hierarchical Algorithms
The Single Linkage Method
The Complete Linkage Method
The Average Linkage Method
The Centroid Method
The Ward Method
Divisive Hierarchical Algorithms
Summary
Clustering: Partitional Algorithms
Introduction
Motivation
Partitional Clustering Algorithms
Squared Error Clustering
Nearest Neighbor Clustering
Partitioning Around Medoids
Self-Organizing Maps
Current Applications
Summary
Clustering: Large Databases
Introduction
Motivation
Requirements for Scalable Clustering
Major Approaches to Scalable Clustering
The Divide-and-Conquer Approach
Incremental Clustering Approach
Parallel Approach to Clustering
BIRCH
DBSCAN
CURE
Summary
Clustering: Categorical Attributes
Introduction
Motivation
ROCK Clustering Algorithm
Computation of Links
Goodness Measure
Miscellaneous Issues
Example
COOLCAT Clustering Algorithm
CACTUS Clustering Algorithm
Summary
Association Rules: An Overview
Introduction
Motivation
Association Rule Process
Terminology and Notation
From Data to Association Rules
Large Itemset Discovery Algorithms
Apriori
Sampling
Partitioning
Summary
Association Rules: Parallel and Distributed Algorithms
Introduction
Motivation
Parallel and Distributed Algorithms
Data Parallel Algorithms on Distributed Memory Systems
Count Distribution (CD)
Task Parallel Algorithms on Distributed Memory Systems
Data Distribution (DD)
Candidate Distribution (CaD)
Intelligent Data Distribution (IDD)
Data Parallel Algorithms on Shared Memory Systems
Common Candidate Partitioned Database (CCPD)
Task Parallel Algorithms on Shared Memory Systems
Asynchronous Parallel Mining (APM)
Discussion of Parallel Algorithms
Summary
Association Rules: Advanced Techniques and Measures
Introduction
Motivation
Incremental Rules
Generalized Association Rules
Quantitative Association Rules
Correlation Rules
Measuring the Quality of Association Rules
Lift
Conviction
Chi-Squared Test
Summary
Spatial Mining: Techniques and Algorithms
Introduction and Motivation
Concept Hierarchies and Generalization
Spatial Rules
STING
Spatial Classification
ID3 Extension
Two-Step Method
Spatial Clustering
CLARANS
GDBSCAN
DBCLASD
Summary
References
Index