M. Molla, P. Andreae & J. Shavlik (2004).
Building Genome Expression Models using Microarray Expression Data and Text. Department of Computer Sciences, University of Wisconsin, Machine Learning Research Group Working Paper 04-1.
This publication is available in PDF and Microsoft Word formats.
Microarray expression data is being generated by the gigabyte all over the world with undoubted exponential increases to come. Annotated genomic data is also rapidly pouring into public databases. Our goal is to develop automated ways of combining these two sources of information to produce insight into the operation of cells under various conditions. Our approach is to use machine-learning techniques to identify characteristics of genes that are up-regulated or down-regulated in a particular microarray experiment. We seek models that are both accurate and easy to interpret. This paper explores the effectiveness of two algorithms for this task: PFOIL (a standard machine-learning rule-building algorithm) and GORB (a new rule-building algorithm devised by us). We use a permutation test to evaluate the statistical significance of the learned models. The paper reports on experiments using actual E. coli microarray data, discussing the strengths and weaknesses of the two algorithms and demonstrating the trade-offs between accuracy and comprehensibility.
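The permutation test mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the single-keyword rule learner below is a hypothetical stand-in for PFOIL or GORB, and the function names, the toy gene annotations, and the up-regulation labels are all invented for illustration. The idea is to relearn a model on many random shufflings of the labels and ask how often the shuffled data yields a model at least as accurate as the one learned from the real labels.

```python
import random


def best_rule_accuracy(annotations, labels):
    """Learn the best single-keyword rule and return its accuracy.

    Hypothetical stand-in for a rule learner: the rule "gene is
    up-regulated if its annotation contains `word`" is scored for
    every word, and the highest accuracy is returned.
    """
    vocabulary = set().union(*annotations)
    best = 0.0
    for word in vocabulary:
        correct = sum((word in ann) == lab
                      for ann, lab in zip(annotations, labels))
        best = max(best, correct / len(labels))
    return best


def permutation_p_value(annotations, labels, n_permutations=1000, seed=0):
    """Return (observed accuracy, permutation p-value).

    The labels are shuffled, the learner is rerun, and we count how
    often the shuffled data does at least as well as the real data.
    """
    rng = random.Random(seed)
    observed = best_rule_accuracy(annotations, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if best_rule_accuracy(annotations, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


# Toy data (invented): annotation words per gene, and whether the
# gene was up-regulated in a hypothetical experiment.
toy_annotations = [
    {"heat", "shock"},
    {"heat", "chaperone"},
    {"ribosomal", "protein"},
    {"transport", "membrane"},
    {"heat", "protease"},
    {"ribosomal", "subunit"},
]
toy_labels = [True, True, False, False, True, False]

observed, p_value = permutation_p_value(toy_annotations, toy_labels)
print(f"observed accuracy = {observed:.2f}, p-value = {p_value:.3f}")
```

A small p-value suggests that the learned rule reflects real structure in the data rather than chance. Relearning on each shuffle, instead of rescoring a fixed rule, matters here because it accounts for the learner's freedom to pick whichever rule happens to fit.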