
M. Molla, P. Andreae, J. Glasner, F. Blattner & J. Shavlik (2002).
Interpreting Microarray Expression Data Using Text Annotating the Genes. Information Sciences, 146, pp. 75-88.
Also appears in: Proceedings of the 4th Conference on Computational Biology and Genome Informatics, Durham, NC

This publication is available in PDF and in Microsoft Word.

The slides for this publication are available in Microsoft PowerPoint.


Microarray expression data is being generated by the gigabyte all over the world with undoubted exponential increases to come. Annotated genomic data is also rapidly pouring into public databases. Our goal is to develop automated ways of combining these two sources of information to produce insight into the operation of cells under various conditions. Our approach is to use machine-learning techniques to identify characteristics of genes that are up-regulated or down-regulated in a particular microarray experiment. We seek models that are (a) accurate, (b) easy to interpret, and (c) stable to small variations in the training data. This paper explores the effectiveness of two standard machine-learning algorithms for this task: Naïve Bayes (based on probability) and PFOIL (based on building rules). Although we do not anticipate using our learned models to predict expression levels of genes, we cast the task in a predictive framework, and evaluate the quality of the models in terms of their predictive power on genes held out from the training. The paper reports on experiments using actual E. coli microarray data, discussing the strengths and weaknesses of the two algorithms and demonstrating the trade-offs between accuracy, comprehensibility, and stability.
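The predictive framing described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each gene's text annotation has already been reduced to a list of words, uses a generic multinomial Naïve Bayes with add-one smoothing, and scores it on held-out genes. The function names (`train_naive_bayes`, `predict`) and the tiny annotation set are invented purely for illustration and are not taken from the E. coli data.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Count words per class. docs: list of word lists; labels: class names."""
    vocab = {word for doc in docs for word in doc}
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    return vocab, class_counts, word_counts


def predict(model, doc):
    """Return the class with the highest log posterior for one annotation."""
    vocab, class_counts, word_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior for the class.
        score = math.log(class_counts[label] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in doc:
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy annotations (invented, not real expression results).
train_docs = [
    ["heat", "shock", "chaperone"],
    ["heat", "shock", "protease"],
    ["ribosomal", "protein", "subunit"],
    ["ribosomal", "rna", "subunit"],
]
train_labels = ["up", "up", "down", "down"]

# Genes held out from training, used only to estimate predictive power.
held_out = [
    (["chaperone", "heat"], "up"),
    (["ribosomal", "protein"], "down"),
]

model = train_naive_bayes(train_docs, train_labels)
correct = sum(predict(model, doc) == label for doc, label in held_out)
accuracy = correct / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

Inspecting the per-class word counts in `model` gives a rough sense of which annotation words characterize each class, which is the kind of interpretability the abstract weighs against accuracy and stability.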


Computer Sciences Department
College of Letters and Science
University of Wisconsin - Madison


5355a Computer Sciences and Statistics ~ 1210 West Dayton Street, Madison, WI 53706 ~ voice: 608-262-1204 ~ fax: 608-262-9777