Overview
Requiring no experience with SAS programming, Statistical Data Mining Using SAS Applications, Second Edition describes statistical data mining concepts and demonstrates the features of user-friendly data mining SAS tools. Integrating the statistical and graphical analysis tools available in SAS systems, the book provides complete statistical data mining solutions without writing SAS program code or using the point-and-click approach. Each chapter emphasizes step-by-step instructions for using SAS macros and interpreting the results. Compiled data mining SAS macro files are available for download on the author's Web site. By following the step-by-step instructions and downloading the SAS macros, analysts can perform a complete data mining analysis quickly and effectively.
Along with many new features in the SAS-specific macro applications, this second edition now provides access to SAS macros directly from your desktop. Compatible with SAS version 9, SAS Enterprise Guide, and SAS Learning Edition, it also offers the ability to create publication-quality graphics and includes a macro-call error check. In addition, all help files have been moved to an appendix.
Features
Includes 13 user-friendly SAS macro applications for performing complete data mining tasks
Shows how to quickly carry out an entire data analysis
Covers advanced data exploration, such as SAS ODS graphics
Presents tools for fast, easy-to-use, and innovative model selection
Contains nearly 150 high-quality analytical figures that are helpful in detecting trends and patterns in a large database
Editorial Reviews
From the Publisher
Its key features include the provision of case studies throughout the sections, downloadable macros and instructions on how to run them. … The step-by-step instructions and the graphical representations of data make it particularly useful to those wishing to communicate complex and technical data to a largely non-specialist audience.
—Kassim S. Mwitondi, Journal of Applied Statistics, 2012
If I had to recommend a good introduction to data mining, I would choose this one.
— J. A. Pardo, Complutense University of Madrid, Madrid, Spain, in Statistical Papers, 2012
Like the first edition of the book, this new edition provides a high-level introduction to some important concepts and algorithms in data mining. … the author presents broad statistical data mining solutions without writing SAS program codes. One of the nicest features of this book is that it gives access to SAS macros directly from the desktop and offers to create publication quality graphs. … this new edition provides a simple and straightforward introduction to data mining, along with a number of detailed, worked case studies.
—Technometrics, February 2011
Praise for the First Edition:
The macros integrate nicely with SAS’s output delivery system … this is a book that could serve as an easy-to-read introduction to some classical statistical techniques that are used in data mining, and, with the associated macros, provide an opportunity to see those techniques in action.
—Journal of the American Statistical Association, June 2004, Vol. 99, No. 466
Use of these data mining SAS macros facilitated reliable conversion, examination, and analysis of the data, and selection of best statistical models despite the great size of the data sets. …
—Christopher Ross, US Bureau of Land Management
An excellent treatment of data mining using SAS applications is provided in this book. … This book would be suitable for students (as a textbook), data analysts, and experienced SAS programmers. No SAS programming experience, however, is required to benefit from the book.
—Computing Reviews, June 2003
… the book provides a welcome contrast to treatments of data mining that focus on only the most novel aspects of the subject. Dr. Fernandez is quite right in pointing out that a lot of data mining can be carried out by standard statistical methods in familiar packages. The book also has a healthy emphasis on the use of cross validation (a hallmark of data mining). This and other concepts are well illustrated with numerous examples. Finally, the book demonstrates that the fancy (and expensive) user interfaces sported by many data mining work benches are not essential to the data mining enterprise and might even be counterproductive.
—Computational Statistics, 2005
Meet the Author
George Fernandez is a professor of applied statistical methods and the director of the Center for Research Design and Analysis at the University of Nevada in Reno.
Table of Contents
Preface
Acknowledgments
About the Author
1 Data Mining: A Gentle Introduction
1.1 Introduction
1.2 Data Mining: Why It Is Successful in the IT World
1.2.1 Availability of Large Databases: Data Warehousing
1.2.2 Price Drop in Data Storage and Efficient Computer Processing
1.2.3 New Advancements in Analytical Methodology
1.3 Benefits of Data Mining
1.4 Data Mining: Users
1.5 Data Mining: Tools
1.6 Data Mining: Steps
1.6.1 Identification of Problem and Defining the Data Mining Study Goal
1.6.2 Data Processing
1.6.3 Data Exploration and Descriptive Analysis
1.6.4 Data Mining Solutions: Unsupervised Learning Methods
1.6.5 Data Mining Solutions: Supervised Learning Methods
1.6.6 Model Validation
1.6.7 Interpret and Make Decisions
1.7 Problems in the Data Mining Process
1.8 SAS Software: The Leader in Data Mining
1.8.1 SEMMA: The SAS Data Mining Process
1.8.2 SAS Enterprise Miner for Comprehensive Data Mining Solution
1.9 Introduction of User-Friendly SAS Macros for Statistical Data Mining
1.9.1 Limitations of These SAS Macros
1.10 Summary
References
2 Preparing Data for Data Mining
2.1 Introduction
2.2 Data Requirements in Data Mining
2.3 Ideal Structures of Data for Data Mining
2.4 Understanding the Measurement Scale of Variables
2.5 Entire Database or Representative Sample
2.6 Sampling for Data Mining
2.6.1 Sample Size
2.7 User-Friendly SAS Applications Used in Data Preparation
2.7.1 Preparing PC Data Files before Importing into SAS Data
2.7.2 Converting PC Data Files to SAS Datasets Using the SAS Import Wizard
2.7.3 EXLSAS2 SAS Macro Application to Convert PC Data Formats to SAS Datasets
2.7.4 Steps Involved in Running the EXLSAS2 Macro
2.7.5 Case Study 1: Importing an Excel File Called "Fraud" to a Permanent SAS Dataset Called "Fraud"
2.7.6 SAS Macro Applications-RANSPLIT2: Random Sampling from the Entire Database
2.7.7 Steps Involved in Running the RANSPLIT2 Macro
2.7.8 Case Study 2: Drawing Training (400), Validation (300), and Test (All Left-Over Observations) Samples from the SAS Data Called "Fraud"
2.8 Summary
References
3 Exploratory Data Analysis
3.1 Introduction
3.2 Exploring Continuous Variables
3.2.1 Descriptive Statistics
3.2.1.1 Measures of Location or Central Tendency
3.2.1.2 Robust Measures of Location
3.2.1.3 Five-Number Summary Statistics
3.2.1.4 Measures of Dispersion
3.2.1.5 Standard Errors and Confidence Interval Estimates
3.2.1.6 Detecting Deviation from Normally Distributed Data
3.2.2 Graphical Techniques Used in EDA of Continuous Data
3.3 Data Exploration: Categorical Variable
3.3.1 Descriptive Statistical Estimates of Categorical Variables
3.3.2 Graphical Displays for Categorical Data
3.4 SAS Macro Applications Used in Data Exploration
3.4.1 Exploring Categorical Variables Using the SAS Macro FREQ2
3.4.1.1 Steps Involved in Running the FREQ2 Macro
3.4.2 Case Study 1: Exploring Categorical Variables in a SAS Dataset
3.4.3 EDA Analysis of Continuous Variables Using SAS Macro UNIVAR2
3.4.3.1 Steps Involved in Running the UNIVAR2 Macro
3.4.4 Case Study 2: Data Exploration of a Continuous Variable Using UNIVAR2
3.4.5 Case Study 3: Exploring Continuous Data by a Group Variable Using UNIVAR2
3.4.5.1 Data Descriptions
3.5 Summary
References
4 Unsupervised Learning Methods
4.1 Introduction
4.2 Applications of Unsupervised Learning Methods
4.3 Principal Component Analysis
4.3.1 PCA Terminology
4.4 Exploratory Factor Analysis
4.4.1 Exploratory Factor Analysis versus Principal Component Analysis
4.4.2 Exploratory Factor Analysis Terminology
4.4.2.1 Communalities and Uniqueness
4.4.2.2 Heywood Case
4.4.2.3 Cronbach Coefficient Alpha
4.4.2.4 Factor Analysis Methods
4.4.2.5 Sampling Adequacy Check in Factor Analysis
4.4.2.6 Estimating the Number of Factors
4.4.2.7 Eigenvalues
4.4.2.8 Factor Loadings
4.4.2.9 Factor Rotation
4.4.2.10 Confidence Intervals and the Significance of Factor Loading Converge
4.4.2.11 Standardized Factor Scores
4.5 Disjoint Cluster Analysis
4.5.1 Types of Cluster Analysis
4.5.2 FASTCLUS: SAS Procedure to Perform Disjoint Cluster Analysis
4.6 Biplot Display of PCA, EFA, and DCA Results
4.7 PCA and EFA Using SAS Macro FACTOR2
4.7.1 Steps Involved in Running the FACTOR2 Macro
4.7.2 Case Study 1: Principal Component Analysis of 1993 Car Attribute Data
4.7.2.1 Study Objectives
4.7.2.2 Data Descriptions
4.7.3 Case Study 2: Maximum Likelihood FACTOR Analysis with VARIMAX Rotation of 1993 Car Attribute Data
4.7.3.1 Study Objectives
4.7.3.2 Data Descriptions
4.7.4 Case Study 3: Maximum Likelihood FACTOR Analysis with VARIMAX Rotation Using Multivariate Data in the Form of a Correlation Matrix
4.7.4.1 Study Objectives
4.7.4.2 Data Descriptions
4.8 Disjoint Cluster Analysis Using SAS Macro DISJCLS2
4.8.1 Steps Involved in Running the DISJCLS2 Macro
4.8.2 Case Study 4: Disjoint Cluster Analysis of 1993 Car Attribute Data
4.8.2.1 Study Objectives
4.8.2.2 Data Descriptions
4.9 Summary
References
5 Supervised Learning Methods: Prediction
5.1 Introduction
5.2 Applications of Supervised Predictive Methods
5.3 Multiple Linear Regression Modeling
5.3.1 Multiple Linear Regressions: Key Concepts and Terminology
5.3.2 Model Selection in Multiple Linear Regression
5.3.2.1 Best Candidate Models Selected Based on AICC and SBC
5.3.2.2 Model Selection Based on the New SAS PROC GLMSELECT
5.3.3 Exploratory Analysis Using Diagnostic Plots
5.3.4 Violations of Regression Model Assumptions
5.3.4.1 Model Specification Error
5.3.4.2 Serial Correlation among the Residual
5.3.4.3 Influential Outliers
5.3.4.4 Multicollinearity
5.3.4.5 Heteroscedasticity in Residual Variance
5.3.4.6 Nonnormality of Residuals
5.3.5 Regression Model Validation
5.3.6 Robust Regression
5.3.7 Survey Regression
5.4 Binary Logistic Regression Modeling
5.4.1 Terminology and Key Concepts
5.4.2 Model Selection in Logistic Regression
5.4.3 Exploratory Analysis Using Diagnostic Plots
5.4.3.1 Interpretation
5.4.3.2 Two-Factor Interaction Plots between Continuous Variables
5.4.4 Checking for Violations of Regression Model Assumptions
5.4.4.1 Model Specification Error
5.4.4.2 Influential Outlier
5.4.4.3 Multicollinearity
5.4.4.4 Overdispersion
5.5 Ordinal Logistic Regression
5.6 Survey Logistic Regression
5.7 Multiple Linear Regression Using SAS Macro REGDIAG2
5.7.1 Steps Involved in Running the REGDIAG2 Macro
5.8 Lift Chart Using SAS Macro LIFT2
5.8.1 Steps Involved in Running the LIFT2 Macro
5.9 Scoring New Regression Data Using the SAS Macro RSCORE2
5.9.1 Steps Involved in Running the RSCORE2 Macro
5.10 Logistic Regression Using SAS Macro LOGIST2
5.11 Scoring New Logistic Regression Data Using the SAS Macro LSCORE2
5.12 Case Study 1: Modeling Multiple Linear Regressions
5.12.1 Study Objectives
5.12.1.1 Step 1: Preliminary Model Selection
5.12.1.2 Step 2: Graphical Exploratory Analysis and Regression Diagnostic Plots
5.12.1.3 Step 3: Fitting the Regression Model and Checking for the Violations of Regression Assumptions
5.12.1.4 Remedial Measure: Robust Regression to Adjust the Regression Parameter Estimates to Extreme Outliers
5.13 Case Study 2: If-Then Analysis and Lift Charts
5.13.1 Data Descriptions
5.14 Case Study 3: Modeling Multiple Linear Regression with Categorical Variables
5.14.1 Study Objectives
5.14.2 Data Descriptions
5.15 Case Study 4: Modeling Binary Logistic Regression
5.15.1 Study Objectives
5.15.2 Data Descriptions
5.15.2.1 Step 1: Best Candidate Model Selection
5.15.2.2 Step 2: Exploratory Analysis/Diagnostic Plots
5.15.2.3 Step 3: Fitting Binary Logistic Regression
5.16 Case Study 5: Modeling Binary Multiple Logistic Regression
5.16.1 Study Objectives
5.16.2 Data Descriptions
5.17 Case Study 6: Modeling Ordinal Multiple Logistic Regression
5.17.1 Study Objectives
5.17.2 Data Descriptions
5.18 Summary
References
6 Supervised Learning Methods: Classification
6.1 Introduction
6.2 Discriminant Analysis
6.3 Stepwise Discriminant Analysis
6.4 Canonical Discriminant Analysis
6.4.1 Canonical Discriminant Analysis Assumptions
6.4.2 Key Concepts and Terminology in Canonical Discriminant Analysis
6.5 Discriminant Function Analysis
6.5.1 Key Concepts and Terminology in Discriminant Function Analysis
6.6 Applications of Discriminant Analysis
6.7 Classification Tree Based on CHAID
6.7.1 Key Concepts and Terminology in Classification Tree Methods
6.8 Applications of CHAID
6.9 Discriminant Analysis Using SAS Macro DISCRIM2
6.9.1 Steps Involved in Running the DISCRIM2 Macro
6.10 Decision Tree Using SAS Macro CHAID2
6.10.1 Steps Involved in Running the CHAID2 Macro
6.11 Case Study 1: Canonical Discriminant Analysis and Parametric Discriminant Function Analysis
6.11.1 Study Objectives
6.11.2 Case Study 1: Parametric Discriminant Analysis
6.11.2.1 Canonical Discriminant Analysis (CDA)
6.12 Case Study 2: Nonparametric Discriminant Function Analysis
6.12.1 Study Objectives
6.12.2 Data Descriptions
6.13 Case Study 3: Classification Tree Using CHAID
6.13.1 Study Objectives
6.13.2 Data Descriptions
6.14 Summary
References
7 Advanced Analytics and Other SAS Data Mining Resources
7.1 Introduction
7.2 Artificial Neural Network Methods
7.3 Market Basket Analysis
7.3.1 Benefits of MBA
7.3.2 Limitations of Market Basket Analysis
7.4 SAS Software: The Leader in Data Mining
7.5 Summary
References
Appendix I Instruction for Using the SAS Macros
Appendix II Data Mining SAS Macro Help Files
Appendix III Instruction for Using the SAS Macros with Enterprise Guide Code Window
Index