Although the use of data mining for security and malware detection is quickly on the rise, most books on the subject provide high-level theoretical discussions to the near exclusion of the practical aspects. Breaking the mold, Data Mining Tools for Malware Detection provides a step-by-step breakdown of how to develop data mining tools for malware detection. Integrating theory with practical techniques and experimental results, it focuses on malware detection applications for email worms, malicious code, remote exploits, and botnets.
The authors describe the systems they have designed and developed: email worm detection using data mining, a scalable multi-level feature extraction technique to detect malicious executables, detecting remote exploits using data mining, and flow-based identification of botnet traffic by mining multiple log files. For each of these tools, they detail the system architecture, algorithms, performance results, and limitations.
From algorithms to experimental results, this is one of the few books that will be equally valuable to those in industry, government, and academia. Technologists will learn which tools to select for specific applications, managers will learn how to determine whether to proceed with a data mining project, and developers will find innovative alternative designs for a range of applications.
Introduction
Trends
Data Mining and Security Technologies
Data Mining for Email Worm Detection
Data Mining for Malicious Code Detection
Data Mining for Detecting Remote Exploits
Data Mining for Botnet Detection
Stream Data Mining
Emerging Data Mining Tools for Cyber Security Applications
Organization of This Book
Next Steps
Part I: DATA MINING AND SECURITY
Introduction to Part I: Data Mining and Security
Data Mining Techniques
Introduction
Overview of Data Mining Tasks and Techniques
Artificial Neural Network
Support Vector Machines
Markov Model
Association Rule Mining (ARM)
Multi-class Problem
2.7.1 One-VS-One
2.7.2 One-VS-All
Image Mining
2.8.1 Feature Selection
2.8.2 Automatic Image Annotation
2.8.3 Image Classification
Summary
References
Malware
Introduction
Viruses
Worms
Trojan Horses
Time and Logic Bombs
Botnet
Spyware
Summary
References
Data Mining for Security Applications
Overview
Data Mining for Cyber Security
4.2.1 Overview
4.2.2 Cyber-terrorism, Insider Threats, and External Attacks
4.2.3 Malicious Intrusions
4.2.4 Credit Card Fraud and Identity Theft
4.2.5 Attacks on Critical Infrastructures
4.2.6 Data Mining for Cyber Security
Current Research and Development
Summary
References
Design and Implementation of Data Mining Tools
Introduction
Intrusion Detection
Web Page Surfing Prediction
Image Classification
Summary and Directions
References
Conclusion to Part I
Part II: DATA MINING FOR EMAIL WORM DETECTION
Introduction to Part II
Email Worm Detection
Introduction
Architecture
Related Work
Overview of Our Approach
Summary
References
Design of the Data Mining Tool
Introduction
Architecture
Feature Description
7.3.1 Per-Email Features
7.3.2 Per-Window Features
Feature Reduction Techniques
7.4.1 Dimension Reduction
7.4.2 Two-Phase Feature Selection (TPS)
7.4.2.1 Phase I
7.4.2.2 Phase II
Classification Techniques
Summary
References
Evaluation and Results
Introduction
Dataset
Experimental Setup
Results
8.4.1 Results from Unreduced Data
8.4.2 Results from PCA-Reduced Data
8.4.3 Results from Two-Phase Selection
Summary
References
Conclusion to Part II
Part III: DATA MINING FOR DETECTING MALICIOUS EXECUTABLES
Introduction to Part III
Malicious Executables
Introduction
Architecture
Related Work
Hybrid Feature Retrieval (HFR) Model
Summary and Directions
References
Design of the Data Mining Tool
Introduction
Feature Extraction Using n-Gram Analysis
10.2.1 Binary n-Gram Feature
10.2.2 Feature Collection
10.2.3 Feature Selection
10.2.4 Assembly n-Gram Feature
10.2.5 DLL Function Call Feature
The Hybrid Feature Retrieval Model
10.3.1 Description of the Model
10.3.2 The Assembly Feature Retrieval (AFR) Algorithm
10.3.3 Feature Vector Computation and Classification
Summary and Directions
References
Evaluation and Results
Introduction
Experiments
Dataset
Experimental Setup
Results
11.5.1 Accuracy
11.5.1.1 Dataset1
11.5.1.2 Dataset2
11.5.1.3 Statistical Significance Test
11.5.1.4 DLL Call Feature
11.5.2 ROC Curves
11.5.3 False Positive and False Negative
11.5.4 Running Time
11.5.5 Training and Testing with Boosted J48
Example Run
Summary and Directions
References
Conclusion to Part III
Part IV: DATA MINING FOR DETECTING REMOTE EXPLOITS
Introduction to Part IV
Detecting Remote Exploits
Introduction
Architecture
Related Work
Overview of Our Approach
Summary and Directions
References
Design of the Data Mining Tool
Introduction
DExtor Architecture
Disassembly
Feature Extraction
13.4.1 Useful Instruction Count (UIC)
13.4.2 Instruction Usage Frequencies (IUF)
13.4.3 Code vs. Data Length (CDL)
Combining Features and Compute Combined Feature Vector
Classification
Summary and Directions
References
Evaluation and Results
Introduction
Dataset
Experimental Setup
14.3.1 Parameter Settings
14.3.2 Baseline Techniques
Results
14.4.1 Running Time
Analysis
Robustness and Limitations
14.6.1 Robustness against Obfuscations
14.6.2 Limitations
Summary and Directions
References
Conclusion to Part IV
Part V: DATA MINING FOR DETECTING BOTNETS
Introduction to Part V
Detecting Botnets
Introduction
Botnet Architecture
Related Work
Our Approach
Summary and Directions
References
Design of the Data Mining Tool
Introduction
Architecture
System Setup
Data Collection
Bot Command Categorization
Feature Extraction
16.6.1 Packet-level Features
16.6.2 Flow-level Features
Log File Correlation
Classification
Packet Filtering
Summary and Directions
References
Evaluation and Results
Introduction
17.1.1 Baseline Techniques
17.1.2 Classifiers
Performance on Different Datasets
Comparison with Other Techniques
Further Analysis
Summary and Directions
References
Conclusion to Part V
Part VI: STREAM MINING FOR SECURITY APPLICATIONS
Introduction to Part VI
Stream Mining
Introduction
Architecture
Related Work
Our Approach
Overview of the Novel Class Detection Algorithm
Classifiers Used
Security Applications
Summary
References
Design of the Data Mining Tool
Introduction
Definitions
Novel Class Detection
19.3.1 Saving the Inventory of Used Spaces during Training
19.3.1.1 Clustering
19.3.1.2 Storing the Cluster Summary Information
19.3.2 Outlier Detection and Filtering
19.3.2.1 Filtering
19.3.2.2 Detecting Novel Class
Security Applications
Summary and Directions
References
Evaluation and Results
Introduction
Datasets
20.2.1 Synthetic Data with Only Concept-Drift (SynC)
20.2.2 Synthetic Data with Concept-Drift and Novel Class (SynCN)
20.2.3 Real Data—KDDCup 99 Network Intrusion Detection
20.2.4 Real Data—Forest Cover (UCI Repository)
Experimental Setup
20.3.1 Baseline Method
Performance Study
20.4.1 Evaluation Approach
20.4.2 Results
20.4.3 Running Time
Summary and Directions
References
Conclusion for Part VI
Part VII: EMERGING APPLICATIONS
Introduction to Part VII
Data Mining for Active Defense
Introduction
Related Work
Architecture
A Data Mining–Based Malware Detection Model
21.4.1 Our Framework
21.4.2 Feature Extraction
21.4.2.1 Binary n-Gram Feature Extraction
21.4.2.2 Feature Selection
21.4.2.3 Feature Vector Computation
21.4.3 Training
21.4.4 Testing
Model-Reversing Obfuscations
21.5.1 Path Selection
21.5.2 Feature Insertion
21.5.3 Feature Removal
Experiments
Summary and Directions
References
Data Mining for Insider Threat Detection
Introduction
The Challenges, Related Work, and Our Approach
Data Mining for Insider Threat Detection
22.3.1 Our Solution Architecture
22.3.2 Feature Extraction and Compact Representation
22.3.3 RDF Repository Architecture
22.3.4 Data Storage
22.3.4.1 File Organization
22.3.4.2 Predicate Split (PS)
22.3.4.3 Predicate Object Split (POS)
22.3.5 Answering Queries Using Hadoop MapReduce
22.3.6 Data Mining Applications
Comprehensive Framework
Summary and Directions
References
Dependable Real-Time Data Mining
Introduction
Issues in Real-Time Data Mining
Real-Time Data Mining Techniques
Parallel, Distributed, Real-Time Data Mining
Dependable Data Mining
Mining Data Streams
Summary and Directions
References
Firewall Policy Analysis
Introduction
Related Work
Firewall Concepts
24.3.1 Representation of Rules
24.3.2 Relationship between Two Rules
24.3.3 Possible Anomalies between Two Rules
Anomaly Resolution Algorithms
24.4.1 Algorithms for Finding and Resolving Anomalies
24.4.1.1 Illustrative Example
24.4.2 Algorithms for Merging Rules
24.4.2.1 Illustrative Example of the Merge Algorithm
Summary and Directions
References
Conclusion to Part VII
Summary and Directions
Overview
Summary of This Book
Directions for Data Mining Tools for Malware Detection
Where Do We Go from Here?
Appendix A: Data Management Systems: Developments and Trends
Overview
Developments in Database Systems
Status, Vision, and Issues
Data Management Systems Framework
Building Information Systems from the Framework
Relationship between the Texts
Summary and Directions
References
Appendix B: Trustworthy Systems
Overview
Secure Systems
B.2.1 Overview
B.2.2 Access Control and Other Security Concepts
B.2.3 Types of Secure Systems
B.2.4 Secure Operating Systems
B.2.5 Secure Database Systems
B.2.6 Secure Networks
B.2.7 Emerging Trends
B.2.8 Impact of the Web
B.2.9 Steps to Building Secure Systems
Web Security
Building Trusted Systems from Untrusted Components
Dependable Systems
B.5.1 Overview
B.5.2 Trust Management
B.5.3 Digital Rights Management