Source code for our cis-regulatory module (CRM)-finding algorithms can be found below.
Yeast data from Lee et al. (Science, 2002) can be downloaded
here.
Both algorithms learn a CRM as a set of motifs and a logical and
spatial relationship among them.
The learned CRM distinguishes between a set of positive DNA sequence examples
(e.g. promoters of interest) and a set of negative (control)
sequences.
Key differences between the two algorithms are listed below under each one (see
the papers for details). In either case, the user may specify which logical and
spatial aspects are allowed in a learned CRM.
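To make the idea of a CRM as a logical combination of motifs concrete, here is a minimal sketch in Python. It is not the code distributed on this page: the nested-tuple representation, the operator names ("and", "or", "not"), and the use of exact substring matching as a stand-in for PWM matching are all assumptions made for illustration.

```python
def evaluate(expr, sequence):
    """Evaluate a toy CRM expression against one DNA sequence.

    expr is either a motif string (assumption: exact substring match
    stands in for a PWM hit) or a tuple (operator, operand, ...).
    """
    if isinstance(expr, str):
        return expr in sequence
    op, *args = expr
    if op == "and":
        return all(evaluate(a, sequence) for a in args)
    if op == "or":
        return any(evaluate(a, sequence) for a in args)
    if op == "not":
        return not evaluate(args[0], sequence)
    raise ValueError(f"unknown operator: {op}")


def accuracy(expr, positives, negatives):
    """Fraction of sequences the CRM labels correctly."""
    correct = sum(evaluate(expr, s) for s in positives)
    correct += sum(not evaluate(expr, s) for s in negatives)
    return correct / (len(positives) + len(negatives))


# Hypothetical CRM: motif TGAC present AND motif GGCC absent.
crm = ("and", "TGAC", ("not", "GGCC"))
positives = ["AATGACAA", "TGACTTTT"]      # made-up example sequences
negatives = ["AATGACGGCC", "AAAAAAAA"]
score = accuracy(crm, positives, negatives)
```

A learner would search over such expressions (and their spatial aspects) for one that separates the positive sequences from the negative ones; the sketch only shows how a fixed candidate could be scored.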
Contact:
notocs.wisc.edu
(Keith Noto, University of Wisconsin-Madison)
Noto and Craven,
Learning Probabilistic Models of cis-Regulatory Modules that Represent Logical and Spatial Aspects,
European Conference on Computational Biology (ECCB) 2006
(PDF)
Key points of this algorithm:
- Learns motifs (PWMs) de novo.
- Learns spatial preferences instead of hard constraints.
For example, it learns a smooth probability distribution over possible distances between
adjacent motifs instead of a maximum allowable distance constraint.
- In fact, the algorithm learns a generalized hidden Markov model
(HMM) representation of a CRM, and learns both model structure (number of
motifs and logical relationship among them) and parameters (motif PWMs and
spatial preferences).
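The difference between a spatial preference and a hard constraint can be illustrated with a short Python sketch. This is not the generalized HMM from the paper: the Gaussian kernel smoothing, the `bandwidth` and `pseudocount` parameters, and the example distances are assumptions chosen only to show what a smooth distribution over inter-motif distances looks like next to a maximum-distance cutoff.

```python
import math


def smooth_distance_distribution(distances, max_distance,
                                 bandwidth=5.0, pseudocount=1e-3):
    """Return P(d) for d = 0..max_distance.

    Each observed distance contributes a Gaussian bump (assumed kernel);
    the pseudocount keeps every distance at nonzero probability.
    """
    weights = []
    for d in range(max_distance + 1):
        bump = sum(math.exp(-0.5 * ((d - x) / bandwidth) ** 2)
                   for x in distances)
        weights.append(pseudocount + bump)
    total = sum(weights)
    return [w / total for w in weights]


def hard_constraint(d, limit):
    """Hard alternative: distances are either allowed (1.0) or not (0.0)."""
    return 1.0 if d <= limit else 0.0


# Made-up distances (in base pairs) between two adjacent motifs.
observed = [12, 15, 18, 20, 22]
probs = smooth_distance_distribution(observed, max_distance=100)
```

Under the smooth distribution, a distance of 60 bp is unlikely but still possible, whereas `hard_constraint(60, 50)` rejects it outright.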
Download:
Noto and Craven,
A Specialized Learner for Inferring Structured cis-Regulatory Modules,
BMC Bioinformatics, 2006
(PDF)
Key points of this algorithm:
- Selects the relevant CRM motifs from a given set of candidate motifs.
These candidates are defined as position weight matrices (PWMs) and
may come from a database or be suggested by a motif-learning algorithm.
- Learns constraints on the spatial relationships among motifs.
For example, a CRM may impose a maximum distance (in base pairs) between motifs.
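As a rough illustration of candidate PWMs combined with a maximum-distance constraint, the following Python sketch scans a sequence with two PWMs and checks whether a hit of the first is followed by a hit of the second within a given number of base pairs. It is a simplified sketch, not the distributed implementation: the uniform background, the log-odds threshold, the helper `pwm_from_consensus`, and the example motifs and sequence are all hypothetical.

```python
import math

BACKGROUND = 0.25  # assumed uniform background over A, C, G, T


def pwm_from_consensus(consensus, match_prob=0.85):
    """Build a toy PWM (list of per-position dicts) from a consensus string."""
    other = (1.0 - match_prob) / 3
    return [{b: (match_prob if b == c else other) for b in "ACGT"}
            for c in consensus]


def log_odds(pwm, site):
    """Log-odds score of one site under the PWM versus the background."""
    return sum(math.log(col[base] / BACKGROUND)
               for col, base in zip(pwm, site))


def motif_hits(pwm, sequence, threshold):
    """Start positions where the PWM scores at or above the threshold."""
    width = len(pwm)
    return [i for i in range(len(sequence) - width + 1)
            if log_odds(pwm, sequence[i:i + width]) >= threshold]


def satisfies_crm(sequence, pwm_a, pwm_b, threshold, max_distance):
    """True if some hit of A is followed by a hit of B whose start lies
    at most max_distance bp after the end of A."""
    hits_a = motif_hits(pwm_a, sequence, threshold)
    hits_b = motif_hits(pwm_b, sequence, threshold)
    return any(0 <= b - (a + len(pwm_a)) <= max_distance
               for a in hits_a for b in hits_b)


pwm_a = pwm_from_consensus("TGAC")   # hypothetical candidate motifs
pwm_b = pwm_from_consensus("GGCC")
seq = "AATGACAAAAAGGCCAA"            # made-up promoter fragment
near = satisfies_crm(seq, pwm_a, pwm_b, threshold=4.0, max_distance=10)
far = satisfies_crm(seq, pwm_a, pwm_b, threshold=4.0, max_distance=3)
```

In this example the two motifs are separated by a 5 bp gap, so the constraint holds with a 10 bp limit and fails with a 3 bp limit.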
Download: