Lecture Notes in Data Mining

by Murray Browne

Overview

The continual explosion of information technology and the need for better data collection and management methods have made data mining an increasingly relevant topic of study. Books on data mining tend either to be broad and introductory or to focus on a single, narrow technical aspect of the field. This book is a series of seventeen edited "student-authored lectures" that explore in depth the core of data mining (classification, clustering and association rules), offering overviews that include both analysis and insight.

The initial chapters lay a framework for data mining techniques by explaining basics such as applications of Bayes' theorem, similarity measures and decision trees. Before turning to the pillars of classification, clustering and association rules, the book also considers alternative techniques such as point estimation and genetic algorithms.

The book's discussion of classification introduces decision tree algorithms, rule-based algorithms (a popular alternative to decision trees) and distance-based algorithms. Five of the lecture-chapters are devoted to clustering, or unsupervised classification; they cover hierarchical and partitional clustering algorithms as well as the efficient, scalable algorithms used to cluster large databases. The chapters on association rules discuss basic algorithms, parallel and distributed algorithms, and advanced measures that help determine the value of an association rule. The final chapter discusses algorithms for spatial data mining.

Product Details

ISBN-13: 9789812568021
Publisher: World Scientific Publishing Company, Incorporated
Publication date: 09/28/2006
Pages: 236
Product dimensions: 6.10 (w) x 9.10 (h) x 0.60 (d)

Table of Contents


Preface     V
Point Estimation Algorithms     1
  Introduction     1
  Motivation     2
  Methods of Point Estimation     2
  The Method of Moments     2
  Maximum Likelihood Estimation     4
  The Expectation-Maximization Algorithm     6
  Measures of Performance     8
  Bias     9
  Mean Squared Error     9
  Standard Error     10
  Efficiency     10
  Consistency     11
  The Jackknife Method     11
  Summary     13
Applications of Bayes Theorem     15
  Introduction     15
  Motivation     16
  The Bayes Approach for Classification     17
  Statistical Framework for Classification     17
  Bayesian Methodology     20
  Examples     22
  Example 1: Numerical Methods     22
  Example 2: Bayesian Networks     24
  Summary     25
Similarity Measures     27
  Introduction     27
  Motivation     28
  Classic Similarity Measures     28
  Dice     30
  Overlap     30
  Jaccard     31
  Asymmetric     31
  Cosine     31
  Other Measures     32
  Dissimilarity     32
  Example     33
  Current Applications     35
  Multi-Dimensional Modeling     35
  Hierarchical Clustering     36
  Bioinformatics     37
  Summary     38
Decision Trees     39
  Introduction     39
  Motivation     41
  Decision Tree Algorithms     42
  ID3 Algorithm     43
  Evaluating Tests     43
  Selection of Splitting Variable     46
  Stopping Criteria     46
  Tree Pruning     47
  Stability of Decision Trees     47
  Example: Classification of University Students     48
  Applications of Decision Tree Algorithms     49
  Summary     50
Genetic Algorithms     53
  Introduction     53
  Motivation     54
  Fundamentals     55
  Encoding Schema and Initialization     56
  Fitness Evaluation     57
  Selection     58
  Crossover     59
  Mutation     61
  Iterative Evolution     62
  Example: The Traveling-Salesman     63
  Current and Future Applications     65
  Summary     66
Classification: Distance-based Algorithms     67
  Introduction     67
  Motivation     68
  Distance Functions     68
  City Block Distance     69
  Euclidean Distance     70
  Tangent Distance     70
  Other Distances     71
  Classification Algorithms     72
  A Simple Approach Using Mean Vector     72
  K-Nearest Neighbors     74
  Current Applications     76
  Summary     77
Decision Tree-based Algorithms     79
  Introduction     79
  Motivation     80
  ID3     80
  C4.5     82
  C5.0     83
  CART     84
  Summary     85
Covering (Rule-based) Algorithms     87
  Introduction     87
  Motivation     88
  Classification Rules     88
  Covering (Rule-based) Algorithms     90
  1R Algorithm     91
  PRISM Algorithm     94
  Other Algorithms     96
  Applications of Covering Algorithms     97
  Summary     97
Clustering: An Overview     99
  Introduction     99
  Motivation     100
  The Clustering Process     100
  Pattern Representation     101
  Pattern Proximity Measures     102
  Clustering Algorithms     103
  Hierarchical Algorithms     103
  Partitional Algorithms     105
  Data Abstraction     105
  Cluster Assessment     105
  Current Applications     107
  Summary     107
Clustering: Hierarchical Algorithms     109
  Introduction     109
  Motivation     110
  Agglomerative Hierarchical Algorithms     111
  The Single Linkage Method     112
  The Complete Linkage Method     114
  The Average Linkage Method     116
  The Centroid Method     116
  The Ward Method     117
  Divisive Hierarchical Algorithms     118
  Summary     120
Clustering: Partitional Algorithms     121
  Introduction     121
  Motivation     122
  Partitional Clustering Algorithms     122
  Squared Error Clustering     122
  Nearest Neighbor Clustering     126
  Partitioning Around Medoids     127
  Self-Organizing Maps     131
  Current Applications     132
  Summary     132
Clustering: Large Databases     133
  Introduction     133
  Motivation     134
  Requirements for Scalable Clustering     134
  Major Approaches to Scalable Clustering     135
  The Divide-and-Conquer Approach     135
  Incremental Clustering Approach     135
  Parallel Approach to Clustering     136
  BIRCH     137
  DBSCAN     139
  CURE     140
  Summary     141
Clustering: Categorical Attributes     143
  Introduction     143
  Motivation     144
  ROCK Clustering Algorithm     145
  Computation of Links     146
  Goodness Measure     147
  Miscellaneous Issues     148
  Example     148
  COOLCAT Clustering Algorithm     149
  CACTUS Clustering Algorithm     151
  Summary     152
Association Rules: An Overview     153
  Introduction     153
  Motivation     154
  Association Rule Process     154
  Terminology and Notation     154
  From Data to Association Rules     157
  Large Itemset Discovery Algorithms     158
  Apriori     158
  Sampling     160
  Partitioning     162
  Summary     163
Association Rules: Parallel and Distributed Algorithms     169
  Introduction     169
  Motivation     170
  Parallel and Distributed Algorithms     171
  Data Parallel Algorithms on Distributed Memory Systems     172
  Count Distribution (CD)     172
  Task Parallel Algorithms on Distributed Memory Systems     174
  Data Distribution (DD)     174
  Candidate Distribution (CaD)     174
  Intelligent Data Distribution (IDD)     175
  Data Parallel Algorithms on Shared Memory Systems     176
  Common Candidate Partitioned Database (CCPD)     176
  Task Parallel Algorithms on Shared Memory Systems     177
  Asynchronous Parallel Mining (APM)     177
  Discussion of Parallel Algorithms     177
  Summary     179
Association Rules: Advanced Techniques and Measures     183
  Introduction     183
  Motivation     184
  Incremental Rules     184
  Generalized Association Rules     185
  Quantitative Association Rules     187
  Correlation Rules     188
  Measuring the Quality of Association Rules     189
  Lift     189
  Conviction     189
  Chi-Squared Test     190
  Summary     191
Spatial Mining: Techniques and Algorithms     193
  Introduction and Motivation     193
  Concept Hierarchies and Generalization     194
  Spatial Rules     196
  STING     197
  Spatial Classification     199
  ID3 Extension     200
  Two-Step Method     201
  Spatial Clustering     202
  CLARANS     202
  GDBSCAN     203
  DBCLASD     204
  Summary     204
References     207
Index     219
