Life Science Data Mining
by Stephen Wong
Overview
This timely book identifies and highlights the latest data mining paradigms to analyze, combine, integrate, model and simulate vast amounts of heterogeneous multi-modal, multi-scale data for emerging real-world applications in life science. The cutting-edge topics presented include bio-surveillance, disease outbreak detection, high throughput bioimaging, drug screening, predictive toxicology, biosensors, and the integration of macro-scale bio-surveillance and environmental data with micro-scale biological data for personalized medicine. This collection of works from leading researchers in the field offers readers an exceptional start in these areas.
Product Details
- ISBN-13: 9789812700650
- Publisher: World Scientific Publishing Company, Incorporated
- Publication date: 12/28/2006
- Series: Science, Engineering, and Biology Informatics Series
- Pages: 388
- Product dimensions: 6.00(w) x 9.00(h) x 0.90(d)
Table of Contents
Preface
Survey of Early Warning Systems for Environmental and Public Health Applications
Introduction
Disease Surveillance
Reference Architecture for Model Extraction
Problem Domain
Data Sources
Detection Methods
Summary and Conclusion
References
Time-Lapse Cell Cycle Quantitative Data Analysis Using Gaussian Mixture Models
Introduction
Material and Feature Extraction
Material and cell feature extraction
Model the time-lapse data using AR model
Problem Statement and Formulation
Classification Methods
Gaussian mixture models and the EM algorithm
K-Nearest Neighbor (KNN) classifier
Neural networks
Decision tree
Fisher clustering
Experimental Results
Trace identification
Cell morphologic similarity analysis
Phase identification
Cluster analysis of time-lapse data
Conclusion
Appendix A
Appendix B
References
Diversity and Accuracy of Data Mining Ensemble
Introduction
Ensemble and Diversity
Why needs diversity?
Diversity measures
Probability Analysis
Coincident Failure Diversity
Ensemble Accuracy
Relationship between random guess and accuracy of lower bound single models
Relationship between accuracy A and the number of models N
When model's accuracy < 50%
Construction of Effective Ensembles
Strategies for increasing diversity
Ensembles of neural networks
Ensembles of decision trees
Hybrid ensembles
An Application: Osteoporosis Classification Problem
Osteoporosis problem
Results from the ensembles of neural nets
Results from ensembles of the decision trees
Results of hybrid ensembles
Discussion and Conclusions
References
Integrated Clustering for Microarray Data
Introduction
Related Work
Data Preprocessing
Integrated Clustering
Clustering algorithms
Integration methodology
Experimental Evaluation
Evaluation methodology
Results
Discussion
Conclusions
References
Complexity and Synchronization of EEG with Parametric Modeling
Introduction
Brief review of EEG recording analysis
AR modeling based EEG analysis
TVAR Modeling
Complexity Measure
Synchronization Measure
Conclusions
References
Bayesian Fusion of Syndromic Surveillance with Sensor Data for Disease Outbreak Classification
Introduction
Approach
Bayesian belief networks
Syndromic data
Environmental data
Test scenarios
Evaluation metrics
Results
Scenario 1
Scenario 2
Promptness
Summary and Conclusions
References
An Evaluation of Over-the-Counter Medication Sales for Syndromic Surveillance
Introduction
Background and Related Work
Data
Approaches
Lead-lag correlation analysis
Regression test of predictive ability
Detection-based approaches
Supervised algorithm for outbreak detection in OTC data
Modified Holt-Winters forecaster
Forecasting based on multi-channel regression
Experiments
Lead-lag correlation analysis of OTC data
Regression test of the predictive value of OTC
Results from detection-based approaches
Conclusions and Future Work
References
Collaborative Health Sentinel
Introduction
Infectious Disease and Existing Health Surveillance Programs
Elements of the Collaborative Health Sentinel (CHS) System
Sampling
Creating a national health map
Detection
Reaction
Cost considerations
Interaction with the Health Information Technology (HCIT) World
Conclusion
References
HL7
A Multi-Modal System Approach for Drug Abuse Research and Treatment Evaluation: Information Systems Needs and Challenges
Introduction
Context
Data sources
Examples of relevant questions
Possible System Structure
Challenges in System Development and Implementation
Ontology development
Data source control, proprietary issues
Privacy, security issues
Costs to implement/maintain system
Historical hypothesis-testing paradigm
Utility, usability, credibility of such a system
Funding of system development
Summary
References
Knowledge Representation for Versatile Hybrid Intelligent Processing Applied in Predictive Toxicology
Introduction
Hybrid Intelligent Techniques for Predictive Toxicology Knowledge Representation
XML Schemas for Knowledge Representation and Processing in AI and Predictive Toxicology
Towards a Standard for Chemical Data Representation in Predictive Toxicology
Hybrid Intelligent Systems for Knowledge Representation in Predictive Toxicology
A formal description of implicit and explicit knowledge-based intelligent systems
An XML schema for hybrid intelligent systems
A Case Study
Materials and methods
Results
Conclusions
References
Ensemble Classification System Implementation for Biomedical Microarray Data
Introduction
Background
Reasons for ensemble
Diversity and ensemble
Relationship between measures of diversity and combination method
Measures of diversity
Microarray data
Ensemble Classification System (ECS) Design
ECS overview
Feature subset selection
Base classifiers
Combination strategy
Experiments
Experimental datasets
Experimental results
Conclusion and Further Work
References
An Automated Method for Cell Phase Identification in High Throughput Time-Lapse Screens
Introduction
Nuclei Segmentation and Tracking
Cell Phase Identification
Feature calculation
Identifying cell phase
Correcting cell phase identification errors
Experimental Results
Conclusion
References
Inference of Transcriptional Regulatory Networks Based on Cancer Microarray Data
Introduction
Subnetworks and Transcriptional Regulatory Networks Inference
Inferring subnetworks using z-score
Inferring subnetworks based on graph theory
Inferring subnetworks based on Bayesian networks
Inferring transcriptional regulatory networks based on integrated expression and sequence data
Multinomial Probit Regression with Bayesian Gene Selection
Problem formulation
Bayesian variable selection
Bayesian estimation using the strongest genes
Experimental results
Network Construction Based on Clustering and Predictor Design
Predictor construction using reversible jump MCMC annealing
CoD for predictors
Experimental results on a Myeloid line
Concluding Remarks
References
Data Mining in Biomedicine
Introduction
Predictive Model Construction
Derivation of unsupervised models
Derivation of supervised models
Validation
Impact Analysis
Summary
References
Mining Multilevel Association Rules from Gene Ontology and Microarray Data
Introduction
Proposed Methods
Preprocessing
Hierarchy-information encoding
The MAGO Algorithm
MAGO algorithm
CMAGO (Constrained Multilevel Association rules with Gene Ontology)
Experimental Results
The characteristic of the dataset
Experimental results
Interpretation
Concluding Remarks
References
A Proposed Sensor-Configuration and Sensitivity Analysis of Parameters with Applications to Biosensors
Introduction
Sensor-System Configuration
Optical Biosensors
Relationship between parameters
Modelling of parameters
Discussion
Conclusion
References
Epilogue
References
Index