Research Interests
My main research interest is the application of techniques from machine learning
and computer vision to new and open biomedical problems. My current research
applies several statistical inference methods to the interpretation of molecular
images produced by x-ray crystallography. I am interested in applying such statistical
models to other domains, including ab initio protein folding, protein-ligand binding,
and pattern recognition in other 3D images. Additionally, I am interested in scaling
probabilistic inference methods to handle extremely large problem domains.
Graduate Research Project
In collaboration with my advisor Dr. Jude Shavlik and Dr. George Phillips of the University of
Wisconsin-Madison Biochemistry Department, I have been investigating the automatic identification of
protein structures in electron density maps. The electron density map, analogous to a 3-dimensional
picture of a protein, is the final product of x-ray crystallography. Tracing the
protein in these complex 3D images (that is, interpreting the map) is often time-consuming,
requiring a crystallographer to spend weeks to months tediously placing each atom. My work employs
probabilistic inference to determine the most likely trace given an electron density map.
A two-phase approach first lays down the backbone (a simplified representation of the protein)
on a coarse grid, then places each individual atom in real space, using the initial trace
as a guide.
The algorithms developed have been included in the software suite ACMI.
An overview of electron density map interpretation.
Given the amino acid sequence of the protein and a density map,
the crystallographer's goal is to find the positions of all the protein's atoms.
Markov field model for protein backbone tracing
To initially place the protein backbone, I model the protein using a pairwise Markov random field (MRF).
A pairwise MRF defines the joint probability of some set of random variables over an undirected graph.
In this case, nodes in the graph represent amino acids in the protein, while the random variables
describe the 3D location of each amino acid's alpha carbon (a key atom present in each amino acid).
Associated with each amino acid is the probability of finding that amino acid in a particular location
given the map. Edges connect all pairs of amino acids, and enforce constraints on the relative positions
of each amino acid pair: two adjacent alpha carbons are always the same distance apart, while two
nonadjacent amino acids may not occupy the same space. Associated with each edge is the probability
of observing the pair in a specific conformation.
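The model above can be illustrated with a minimal sketch. Everything in it is a toy assumption made for illustration and is not taken from ACMI: the one-dimensional grid, the bond length of one grid unit, the hard 0/1 edge potentials, and the names GRID, BOND, NODE_POT, adjacency, occupancy, and joint_unnormalized. It only shows how node potentials (the map-derived term) and pairwise edge potentials multiply into an unnormalized joint probability.

```python
import itertools

GRID = [0.0, 1.0, 2.0, 3.0]  # hypothetical 1D grid of candidate alpha-carbon positions
BOND = 1.0                   # assumed fixed spacing between adjacent alpha carbons

def adjacency(a, b):
    """Edge potential for sequence-adjacent residues: 1 if exactly BOND apart."""
    return 1.0 if abs(abs(GRID[a] - GRID[b]) - BOND) < 1e-9 else 0.0

def occupancy(a, b):
    """Edge potential for nonadjacent residues: 0 if they share a grid point."""
    return 0.0 if a == b else 1.0

def joint_unnormalized(states, node_pot):
    """Product of all node potentials and all pairwise edge potentials."""
    p = 1.0
    for i, s in enumerate(states):
        p *= node_pot[i][s]
    # Edges connect every pair of residues, as in the fully connected model.
    for i, j in itertools.combinations(range(len(states)), 2):
        edge = adjacency if j == i + 1 else occupancy
        p *= edge(states[i], states[j])
    return p

# Made-up "probability of residue i at grid point k given the map".
NODE_POT = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]

# Brute-force search is only feasible for toy sizes; it is what inference avoids.
best = max(itertools.product(range(len(GRID)), repeat=3),
           key=lambda s: joint_unnormalized(s, NODE_POT))
print(best)
```

Enumerating every configuration grows exponentially with the number of residues, which is why an approximate inference method is needed for real proteins.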
Determining the most likely backbone trace given some electron density map, then,
requires inferring the marginal distribution of each amino acid's position
(that is, the distribution of one amino acid's position summing over all possible
positions of all other amino acids). However, few MRF inference methods can handle graphs
with loops, even approximately; fewer still can handle graphs with
possibly several thousand vertices. Belief propagation (BP) performs
approximate inference in loopy graphs, but it does not scale to thousand-residue proteins.
To address this, I have developed AggBP, which approximates
some subset of the outgoing messages at a single node with a single message, making BP
tractable for large proteins. On a variety of maps, my method produces a more accurate
backbone trace than two other commonly used methods.
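For readers unfamiliar with belief propagation, the following is a minimal sketch of standard sum-product BP on a small discrete pairwise MRF. It is not AggBP: the message aggregation described above is not shown, and the function names (loopy_bp, brute_force_marginals), the data layout, and the flooding update schedule are assumptions made for this illustration. A brute-force routine is included only to check the result on toy graphs.

```python
import itertools

def loopy_bp(node_pot, edge_pot, n_iters=20):
    """Sum-product BP; node_pot[i][a] and edge_pot[(i, j)][a][b] with i < j.

    Returns approximate marginals (exact when the graph is a tree).
    Assumes strictly positive potentials so normalizers are nonzero.
    """
    n, k = len(node_pot), len(node_pot[0])
    neighbors = {i: [] for i in range(n)}
    for i, j in edge_pot:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j, a, b):
        # Look up the edge potential regardless of direction.
        return edge_pot[(i, j)][a][b] if (i, j) in edge_pot else edge_pot[(j, i)][b][a]

    # msgs[(i, j)][b]: message from node i to node j about state b of node j.
    msgs = {(i, j): [1.0 / k] * k for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new = {}
        for i, j in msgs:
            out = []
            for b in range(k):
                total = 0.0
                for a in range(k):
                    prod = node_pot[i][a] * psi(i, j, a, b)
                    for h in neighbors[i]:
                        if h != j:  # exclude the message that came from j
                            prod *= msgs[(h, i)][a]
                    total += prod
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        msgs = new

    beliefs = []
    for i in range(n):
        b = list(node_pot[i])
        for h in neighbors[i]:
            b = [b[a] * msgs[(h, i)][a] for a in range(k)]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs

def brute_force_marginals(node_pot, edge_pot):
    """Exact marginals by enumeration; feasible only for toy problems."""
    n, k = len(node_pot), len(node_pot[0])
    marg = [[0.0] * k for _ in range(n)]
    for x in itertools.product(range(k), repeat=n):
        p = 1.0
        for i in range(n):
            p *= node_pot[i][x[i]]
        for (i, j), m in edge_pot.items():
            p *= m[x[i]][x[j]]
        for i in range(n):
            marg[i][x[i]] += p
    z = sum(marg[0])
    return [[v / z for v in row] for row in marg]
```

Each node sends one message per neighbor per iteration, so in a fully connected graph the number of messages grows quadratically with the number of residues. That cost is what motivates replacing many outgoing messages with a single approximate one.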
A sample inferred structure.
The predicted structure is shown in green, while the true (crystallographer-determined) structure is shown in black.
Note that this is only a small portion of the entire protein, about 25 amino acids in length.
Particle filtering for all-atom placement
While AggBP infers an accurate backbone trace, the result has several shortcomings.
First, biologists are interested not just in the position of each alpha carbon,
but in the location of every atom in the protein. Second, choosing the
most likely grid point for each alpha carbon yields a protein trace that may not be
physically feasible, because the protein's interatomic distances are known to much greater
accuracy than the grid spacing. Finally, even taking grid effects into account, the
approximate marginal distributions computed by AggBP may give physically infeasible traces.
To address these shortcomings and produce the most likely physically feasible all-atom
trace (or set of traces), I have investigated the use of particle filtering (PF) for all-atom
placement. Particle filtering approximates a probability distribution as the sum of a finite
number of weighted point estimates. For all-atom placement, each point estimate is a single all-atom
partial trace. At each PF iteration, each trace is grown by forward sampling from the distribution of
torsion angles and is then weighted by its likelihood given the map. My key contribution is to
use the previously computed marginals in the sampling distribution; that is, I sample the next
residue's position from the product of the distribution of torsion angles and the next residue's
marginal. Guiding the sampling with the marginals requires significantly fewer particles to recover
an accurate trace. Preliminary results using this method are very promising: they further improve
accuracy over the backbone trace while returning a physically feasible interpretation.
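The idea of guiding the sampler with marginals can be sketched on a toy problem. The sketch below is an illustration under stated assumptions, not the ACMI implementation: traces live in 2D, each new residue sits at a fixed distance BOND from the previous one, the torsion-angle distribution is replaced by a discrete prior over a handful of planar angles, and the marginal is a caller-supplied function marginal(residue_index, position). As a further simplification, the marginal is treated as the only map-derived score, so a particle's weight is multiplied by the normalizer of the product distribution it was sampled from. All names (grow_particle, pf_step, BOND) are hypothetical.

```python
import math
import random

BOND = 1.0  # assumed fixed spacing between consecutive residues (toy units)

def grow_particle(trace, angles, angle_prior, marginal, rng):
    """Extend one partial trace by one residue.

    One candidate position is built per angle; a candidate is drawn with
    probability proportional to angle_prior[k] * marginal(index, position).
    Returns (new_trace, normalizer), or (None, 0.0) if every candidate scores 0.
    """
    x, y = trace[-1]
    cands = [(x + BOND * math.cos(t), y + BOND * math.sin(t)) for t in angles]
    scores = [p * marginal(len(trace), c) for p, c in zip(angle_prior, cands)]
    total = sum(scores)
    if total == 0.0:
        return None, 0.0
    pick = rng.choices(cands, weights=scores, k=1)[0]
    return trace + [pick], total

def pf_step(particles, weights, angles, angle_prior, marginal, rng):
    """Grow every particle, reweight, then resample back to the same count."""
    grown, new_w = [], []
    for trace, w in zip(particles, weights):
        new_trace, norm = grow_particle(trace, angles, angle_prior, marginal, rng)
        if new_trace is not None:
            grown.append(new_trace)
            new_w.append(w * norm)  # incremental weight = proposal normalizer
    z = sum(new_w)
    if z == 0.0:
        raise ValueError("all particles have zero weight")
    # Multinomial resampling; weights are uniform again afterwards.
    n = len(particles)
    idx = rng.choices(range(len(grown)), weights=new_w, k=n)
    return [grown[i] for i in idx], [1.0 / n] * n
```

Because each candidate is already scored by the marginal before it is drawn, few particles are spent on extensions the map does not support. This is the intuition behind needing fewer particles than sampling from the angle distribution alone.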
Particle filtering to recover a set of all-atom models
Particle filtering represents the posterior probability of a protein's configuration using a finite set of point estimates, or "particles".
Publications
- F. DiMaio, D. Kondrashov, E. Bitto, A. Soni, C. Bingman,
G. Phillips & J. Shavlik (2007).
Creating Protein Models from Electron-Density Maps using Particle-Filtering Methods.
Bioinformatics. doi: 10.1093/bioinformatics/btm480.
(Get the software!,
pdf)
- F. DiMaio, A. Soni, G. Phillips & J. Shavlik (2007).
Improved Methods for Template-Matching in Electron-Density Maps Using Spherical Harmonics.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'07),
Fremont, CA.
(pdf)
- F. DiMaio and J. Shavlik (2006).
Belief propagation in large, highly connected graphs for 3D part-based object recognition.
Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM), Hong Kong.
(pdf,
ppt slides)
- F. DiMaio and J. Shavlik (2006).
Improving the efficiency of belief propagation in large highly connected graphs.
University of Wisconsin-Madison Machine Learning Research Group Working Paper 06-1.
(pdf)
- F. DiMaio, J. Shavlik and G. Phillips (2006).
A probabilistic approach to protein backbone tracing in electron density maps.
Bioinformatics 22; also presented at the
Fourteenth International Conference on Intelligent Systems for Molecular Biology (ISMB),
Fortaleza, Brazil.
(pdf,
ppt slides)
- F. DiMaio, J. Shavlik and G. Phillips (2005).
Pictorial structures for molecular modeling: Interpreting density maps.
Advances in Neural Information Processing Systems (NIPS) 17, Vancouver, Canada.
(pdf,
poster)
- F. DiMaio and J. Shavlik (2004).
Learning an approximation to inductive logic programming clause evaluation.
Proceedings of the Fourteenth International Conference on Inductive Logic Programming,
Porto, Portugal.
(pdf,
ppt slides)
- D. Gopan, F. DiMaio, N. Dor, T. Reps and M. Sagiv (2004).
Numeric domains with summarized dimensions.
Proceedings of Tools and Algorithms for the Construction and Analysis of Systems (TACAS),
Barcelona, Spain.
Workshop Publications
- F. DiMaio and J. Shavlik (2003).
Speeding up relational data mining by learning to estimate candidate hypothesis scores.
Proceedings of the ICDM Workshop on Foundations and New Directions of Data Mining,
Melbourne, Florida.
(pdf)
- F. DiMaio, J. Shavlik and G. Phillips (2003).
Using pictorial structures to identify proteins in x-ray crystallographic electron density maps.
Working Notes of the ICML Workshop on Machine Learning in Bioinformatics, Washington, DC.
(pdf)
Posters and Presentations (without corresponding publication)
- F. DiMaio (2007).
Guiding particle filtering with marginal approximations: An application in protein image interpretation.
The Learning Workshop, San Juan, Puerto Rico.
(ppt slides)
- F. DiMaio (2007).
New approaches to automatic fitting of electron density maps.
PSI Protein Production and Crystallization Workshop, Bethesda, Maryland.
(poster)
- F. DiMaio (2006).
Modeling protein backbones with pairwise Markov fields.
ISMB Satellite Meeting on Structural Bioinformatics and Computational Biophysics (3Dsig),
Fortaleza, Brazil.
(ppt slides)
- F. DiMaio (2006).
Tracing protein backbones in electron density maps using a Markov random field model.
Snowbird Learning Workshop, Snowbird, Utah.
(ppt slides,
poster)
- F. DiMaio, J. Shavlik, and G. Phillips (2005).
Automated protein backbone tracing in electron density maps using belief propagation.
ISMB Poster Session, Detroit, Michigan.
(poster)
- F. DiMaio (2004).
Extending pictorial structures for the interpretation of crystallographic density maps.
National Library of Medicine Training Directors' Meeting, Indianapolis, Indiana.
(ppt slides)