Data Mining with Decision Trees: Theory and Applications
by Lior Rokach, Oded MaimonThis is the first comprehensive book dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique.
Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science and technology of exploring large and complex bodies of data in order to discover
… See more details belowOverview
This is the first comprehensive book dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique.
Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science and technology of exploring large and complex bodies of data in order to discover useful patterns. The area is of great importance because it enables modeling and knowledge extraction from the abundance of data available. Both theoreticians and practitioners are continually seeking techniques to make the process more efficient, cost-effective and accurate. Decision trees, originally implemented in decision theory and statistics, are highly effective tools in other areas such as data mining, text mining, information extraction, machine learning, and pattern recognition. This book invites readers to explore the many benefits in data mining that decision trees offer:
• Self-explanatory and easy to follow when compacted
• Able to handle a variety of input data: nominal, numeric and textual
• Able to process datasets that may have errors or missing values
• High predictive performance for a relatively small computational effort
• Available in many data mining packages over a variety of platforms
• Useful for various tasks, such as classification, regression, clustering and feature selection
Product Details
- ISBN-13:
- 9789812771711
- Publisher:
- World Scientific Publishing Company, Incorporated
- Publication date:
- 04/28/2008
- Series:
- Series in Machine Perception and Artificial Intelligence
- Pages:
- 264
- Product dimensions:
- 6.02(w) x 9.29(h) x 0.75(d)
Table of Contents
Preface viiIntroduction to Decision Trees 1
Data Mining and Knowledge Discovery 1
Taxonomy of Data Mining Methods 3
Supervised Methods 4
Overview 4
Classification Trees 5
Characteristics of Classification Trees 8
Tree Size 9
The Hierarchical Nature of Decision Trees 10
Relation to Rule Induction 11
Growing Decision Trees 13
Training Set 13
Definition of the Classification Problem 14
Induction Algorithms 16
Probability Estimation in Decision Trees 16
Laplace Correction 17
No Match 18
Algorithmic Framework for Decision Trees 18
Stopping Criteria 19
Evaluation of Classification Trees 21
Overview 21
Generalization Error 21
Theoretical Estimation of Generalization Error 22
Empirical Estimation of Generalization Error 23
Alternatives to the Accuracy Measure 24
The F-Measure 25
Confusion Matrix 27
Classifier Evaluationunder Limited Resources 28
ROC Curves 30
Hit Rate Curve 30
Qrecall (Quota Recall) 32
Lift Curve 32
Pearson Correlation Coefficient 32
Area Under Curve (AUC) 34
Average Hit Rate 35
Average Qrecall 35
Potential Extract Measure (PEM) 36
Which Decision Tree Classifier is Better? 40
McNemar's Test 40
A Test for the Difference of Two Proportions 41
The Resampled Paired t Test 43
The k-fold Cross-validated Paired t Test 43
Computational Complexity 44
Comprehensibility 44
Scalability to Large Datasets 45
Robustness 47
Stability 47
Interestingness Measures 48
Overfitting and Underfitting 49
"No Free Lunch" Theorem 50
Splitting Criteria 53
Univariate Splitting Criteria 53
Overview 53
Impurity based Criteria 53
Information Gain 54
Gini Index 55
Likelihood Ratio Chi-squared Statistics 55
DKM Criterion 55
Normalized Impurity-based Criteria 56
Gain Ratio 56
Distance Measure 56
Binary Criteria 57
Twoing Criterion 57
Orthogonal Criterion 58
Kolmogorov-Smirnov Criterion 58
AUC Splitting Criteria 58
Other Univariate Splitting Criteria 59
Comparison of Univariate Splitting Criteria 59
Handling Missing Values 59
Pruning Trees 63
Stopping Criteria 63
Heuristic Pruning 63
Overview 63
Cost Complexity Pruning 64
Reduced Error Pruning 65
Minimum Error Pruning (MEP) 65
Pessimistic Pruning 65
Error-Based Pruning (EBP) 66
Minimum Description Length (MDL) Pruning 67
Other Pruning Methods 67
Comparison of Pruning Methods 68
Optimal Pruning 68
Advanced Decision Trees 71
Survey of Common Algorithms for Decision Tree Induction 71
ID3 71
C4.5 71
CART 71
CHAID 72
QUEST 73
Reference to Other Algorithms 73
Advantages and Disadvantages of Decision Trees 73
Oblivious Decision Trees 76
Decision Trees Inducers for Large Datasets 78
Online Adaptive Decision Trees 79
Lazy Tree 79
Option Tree 80
Lookahead 82
Oblique Decision Trees 83
Decision Forests 87
Overview 87
Introduction 87
Combination Methods 90
Weighting Methods 90
Majority Voting 90
Performance Weighting 91
Distribution Summation 91
Bayesian Combination 91
Dempster-Shafer 92
Vogging 92
Naive Bayes 93
Entropy Weighting 93
Density-based Weighting 93
DEA Weighting Method 93
Logarithmic Opinion Pool 94
Gating Network 94
Order Statistics 95
Meta-combination Methods 95
Stacking 95
Arbiter Trees 97
Combiner Trees 99
Grading 100
Classifier Dependency 101
Dependent Methods 101
Model-guided Instance Selection 101
Incremental Batch Learning 105
Independent Methods 105
Bagging 105
Wagging 107
Random Forest 108
Cross-validated Committees 109
Ensemble Diversity 109
Manipulating the Inducer 110
Manipulation of the Inducer's Parameters 111
Starting Point in Hypothesis Space 111
Hypothesis Space Traversal 111
Manipulating the Training Samples 112
Resampling 112
Creation 113
Partitioning 113
Manipulating the Target Attribute Representation 114
Partitioning the Search Space 115
Divide and Conquer 116
Feature Subset-based Ensemble Methods 117
Multi-Inducers 121
Measuring the Diversity 122
Ensemble Size 124
Selecting the Ensemble Size 124
Pre Selection of the Ensemble Size 124
Selection of the Ensemble Size while Training 125
Pruning - Post Selection of the Ensemble Size 125
Pre-combining Pruning 126
Post-combining Pruning 126
Cross-Inducer 127
Multistrategy Ensemble Learning 127
Which Ensemble Method Should be Used? 128
Open Source for Decision Trees Forests 128
Incremental Learning of Decision Trees 131
Overview 131
The Motives for Incremental Learning 131
The Inefficiency Challenge 132
The Concept Drift Challenge 133
Feature Selection 137
Overview 137
The "Curse of Dimensionality" 137
Techniques for Feature Selection 140
Feature Filters 141
FOCUS 141
LVF 141
Using One Learning Algorithm as a Filter for Another 141
An Information Theoretic Feature Filter 142
An Instance Based Approach to Feature Selection - RELIEF 142
Simba and G-flip 142
Contextual Merit Algorithm 143
Using Traditional Statistics for Filtering 143
Mallows Cp 143
AIC, BIC and F-ratio 144
Principal Component Analysis (PCA) 144
Factor Analysis (FA) 145
Projection Pursuit 145
Wrappers 145
Wrappers for Decision Tree Learners 145
Feature Selection as a Means of Creating Ensembles 146
Ensemble Methodology as a Means for Improving Feature Selection 147
Independent Algorithmic Framework 149
Combining Procedure 150
Simple Weighted Voting 151
Naive Bayes Weighting using Artificial Contrasts 152
Feature Ensemble Generator 154
Multiple Feature Selectors 154
Bagging 156
Using Decision Trees for Feature Selection 156
Limitation of Feature Selection Methods 157
Fuzzy Decision Trees 159
Overview 159
Membership Function 160
Fuzzy Classification Problems 161
Fuzzy Set Operations 163
Fuzzy Classification Rules 164
Creating Fuzzy Decision Tree 164
Fuzzifying Numeric Attributes 165
Inducing of Fuzzy Decision Tree 166
Simplifying the Decision Tree 169
Classification of New Instances 169
Other Fuzzy Decision Tree Inducers 169
Hybridization of Decision Trees with other Techniques 171
Introduction 171
A Decision Tree Framework for Instance-Space Decomposition 171
Stopping Rules 174
Splitting Rules 175
Split Validation Examinations 175
The CPOM Algorithm 176
CPOM Outline 176
The Grouped Gain Ratio Splitting Rule 177
Induction of Decision Trees by an Evolutionary Algorithm 179
Sequence Classification Using Decision Trees 187
Introduction 187
Sequence Representation 187
Pattern Discovery 188
Pattern Selection 190
Heuristics for Pattern Selection 190
Correlation based Feature Selection 191
Classifier Training 191
Adjustment of Decision Trees 192
Cascading Decision Trees 192
Application of CREDT in Improving of Information Retrieval of Medical Narrative Reports 193
Related Works 195
Text Classification 195
Part-of-speech Tagging 198
Frameworks for Information Extraction 198
Frameworks for Labeling Sequential Data 199
Identifying Negative Context in Nondomain Specific Text (General NLP) 199
Identifying Negative Context in Medical Narratives 200
Works Based on Knowledge Engineering 200
Works based on Machine Learning 201
Using CREDT for Solving the Negation Problem 201
The Process Overview 201
Step 1: Corpus Preparation 201
Step 1.1: Tagging 202
Step 1.2: Sentence Boundaries 202
Step 1.3: Manual Labeling 203
Step 2: Patterns Creation 203
Step 3: Patterns Selection 206
Step 4: Classifier Training 208
Cascade of Three Classifiers 209
Bibliography 215
Index 243
Customer Reviews
Average Review: