Frank DiMaio

Mailing address:   Biochemistry Department
Box 357350
University of Washington
Seattle, WA 98195
Home address:   4038 Stone Way N. #303
Seattle, WA 98103
(206) 632-7556
E-mail: dimaio@u.washington.edu

News

I have accepted a postdoctoral position at the University of Washington, in David Baker's group.

Research Interests

My main research interest is the application of techniques from machine learning and computer vision to new and open biomedical problems. My current research applies several statistical inference methods to the interpretation of molecular images produced by x-ray crystallography. I am interested in applying such statistical models to other domains, including ab initio protein folding, protein-ligand binding, and pattern recognition in other 3D images. Additionally, I am interested in scaling probabilistic inference methods to handle extremely large problem domains.


Graduate Research Project

In collaboration with my advisor Dr. Jude Shavlik and Dr. George Phillips of the University of Wisconsin-Madison Biochemistry Department, I have been investigating the automatic identification of protein structures in electron density maps. The electron density map, analogous to a 3-dimensional picture of a protein, is the final product of x-ray crystallography. Tracing the protein in these complex 3D images (that is, interpreting the map) is often time-consuming, requiring a crystallographer to spend weeks to months tediously placing each atom. My work employs probabilistic inference to determine the most likely trace given an electron density map. A two-phase approach first lays down the backbone (a simplified representation of the protein) on a coarse grid, then places each individual atom in real space, using the initial trace as a guide.

The algorithms developed have been included in the software suite ACMI.

An overview of electron density map interpretation. Given the amino acid sequence of the protein and a density map, the crystallographer’s goal is to find the positions of all the protein's atoms.

Markov field model for protein backbone tracing

To initially place the protein backbone, I model the protein using a pairwise Markov random field (MRF). A pairwise MRF defines the joint probability of some set of random variables over an undirected graph. In this case, nodes in the graph represent amino acids in the protein, while the random variables describe the 3D location of each amino acid's alpha carbon (a key atom present in each amino acid). Associated with each amino acid is the probability of finding that amino acid in a particular location given the map. Edges connect all pairs of amino acids, and enforce constraints on the relative positions of each amino acid pair: two adjacent alpha carbons are always the same distance apart, while two nonadjacent amino acids may not occupy the same space. Associated with each edge is the probability of observing the pair in a specific conformation.
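As a rough illustration of this model (a toy sketch, not the ACMI implementation), the unnormalized joint score of one candidate backbone can be written as a product of per-residue and per-pair terms. The function names, the 3.8 angstrom target spacing for adjacent alpha carbons, the soft Gaussian tolerance, and the 3.0 angstrom exclusion radius are all assumptions chosen for this example.

```python
import math
from itertools import combinations

CA_DISTANCE = 3.8      # assumed ideal spacing of adjacent alpha carbons (angstroms)
MIN_SEPARATION = 3.0   # assumed exclusion radius for nonadjacent residues (angstroms)


def node_potential(residue, position, map_scores):
    """Score for finding `residue` at `position` given the map (hypothetical lookup)."""
    return map_scores[residue].get(position, 1e-6)


def edge_potential(i, j, pos_i, pos_j, tolerance=0.5):
    """Score for observing residues i and j in this relative conformation."""
    d = math.dist(pos_i, pos_j)
    if abs(i - j) == 1:
        # Adjacent residues: soft penalty for deviating from the ideal distance.
        return math.exp(-((d - CA_DISTANCE) ** 2) / (2 * tolerance ** 2))
    # Nonadjacent residues: may not occupy the same space.
    return 0.0 if d < MIN_SEPARATION else 1.0


def unnormalized_joint(positions, map_scores):
    """Product of all node terms and all pairwise edge terms."""
    score = 1.0
    for i, pos in enumerate(positions):
        score *= node_potential(i, pos, map_scores)
    for i, j in combinations(range(len(positions)), 2):
        score *= edge_potential(i, j, positions[i], positions[j])
    return score
```

Because every pair of residues contributes an edge term, the graph is fully connected, which is what makes inference in this model expensive for large proteins.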

Determining the most likely backbone trace given an electron density map, then, requires inferring the marginal distribution of each amino acid's position (that is, the distribution of one amino acid's position, summing over all possible positions of all other amino acids). However, few MRF inference methods can handle graphs with loops, even approximately, and fewer still can handle graphs with several thousand vertices. Belief propagation (BP) performs approximate inference in loopy graphs, but it does not scale to thousand-residue proteins. To make BP tractable for large proteins, I have developed AggBP, which approximates a subset of the outgoing messages at each node with a single message. On a variety of maps, my method produces a more accurate backbone trace than two other commonly used methods.
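For readers unfamiliar with belief propagation, the following is a minimal sketch of standard sum-product BP on a small discrete pairwise MRF. It is plain BP, not AggBP, and it makes no attempt at the message aggregation described above; the data layout (`node_pot`, `edge_pot`) and the function name are assumptions for illustration. All potentials are assumed strictly positive so that messages can be normalized.

```python
def belief_propagation(node_pot, edge_pot, n_iters=20):
    """Sum-product BP on a pairwise MRF with discrete states.

    node_pot[i][x] is the potential of node i in state x.
    edge_pot[(i, j)][x_i][x_j] is the potential of edge (i, j).
    Returns approximate marginals (exact when the graph is a tree).
    """
    n = len(node_pot)
    neighbors = {i: [] for i in range(n)}
    for i, j in edge_pot:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j, xi, xj):
        # Look up the edge potential regardless of key orientation.
        if (i, j) in edge_pot:
            return edge_pot[(i, j)][xi][xj]
        return edge_pot[(j, i)][xj][xi]

    # messages[(i, j)][x_j] is the message from node i to node j.
    messages = {}
    for i in range(n):
        for j in neighbors[i]:
            k = len(node_pot[j])
            messages[(i, j)] = [1.0 / k] * k

    for _ in range(n_iters):
        new_messages = {}
        for (i, j) in messages:
            out = []
            for xj in range(len(node_pot[j])):
                total = 0.0
                for xi in range(len(node_pot[i])):
                    # Product of messages into i from all neighbors except j.
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= messages[(k, i)][xi]
                    total += node_pot[i][xi] * psi(i, j, xi, xj) * incoming
                out.append(total)
            z = sum(out)
            new_messages[(i, j)] = [v / z for v in out]
        messages = new_messages

    # Belief at each node: its own potential times all incoming messages.
    marginals = []
    for i in range(n):
        belief = list(node_pot[i])
        for k in neighbors[i]:
            belief = [b * m for b, m in zip(belief, messages[(k, i)])]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals
```

Each iteration updates one message per directed edge, and each update sums over every pair of states. With an edge between every pair of residues and a fine grid of candidate positions, this cost grows quickly, which is the scaling problem that motivates approximating groups of messages.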

A sample inferred structure. The predicted structure is shown in green, while the true (crystallographer-determined) structure is shown in black. Notice this is only a small portion of the entire protein, about 25 amino acids in length.

Particle filtering for all-atom placement

While AggBP infers an accurate backbone trace, it has several shortcomings. First, biologists are interested not just in the position of each alpha carbon but in the location of every atom in the protein. Second, choosing the most likely grid point for each alpha carbon yields a protein trace that may not be physically feasible, as the protein's interatomic distances are known to much greater accuracy than the grid spacing. Finally, even taking grid effects into account, the approximate marginal distributions computed by AggBP may give physically infeasible traces.

To address these shortcomings and produce the most likely physically feasible all-atom trace (or set of traces), I have investigated the use of particle filtering (PF) for all-atom placement. Particle filtering approximates a probability distribution as the sum of a finite number of weighted point estimates. For all-atom placement, each point estimate is a single all-atom partial trace. At each PF iteration, each trace is grown by forward sampling from the distribution of torsion angles, then weighted by its likelihood given the map. My key contribution is to use the previously computed marginals in the sampling distribution; that is, I sample the next residue's position from the product of the distribution of torsion angles and the next residue's marginal. Guiding the sampling with the marginals means significantly fewer particles are needed to recover an accurate trace. Preliminary results using this method are very promising: it further improves on the accuracy of the backbone trace while returning a physically feasible interpretation.
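The idea of guiding the proposal with precomputed marginals can be sketched on a toy problem. In the example below, positions are integers on a line rather than 3D coordinates, a user-supplied `transition` function stands in for the torsion-angle distribution, and each particle is a partial trace. The importance weight used here (likelihood times the proposal normalizer, divided by the marginal) is the standard target-over-proposal correction and is an assumption of this sketch, not necessarily the exact weighting used in my implementation; all names are hypothetical.

```python
import random


def guided_particle_filter(n_residues, n_positions, transition, likelihood,
                           marginals, n_particles=100, seed=0):
    """Grow traces residue by residue, proposing from transition * marginal.

    transition(prev, x): prior probability of stepping from prev to x.
    likelihood[t][x]:    map-based score of residue t at position x.
    marginals[t][x]:     precomputed approximate marginal of residue t.
    Returns a list of equally weighted traces after the final resampling.
    """
    rng = random.Random(seed)
    positions = range(n_positions)

    # Draw the first residue from its marginal, then correct with the likelihood.
    starts = rng.choices(positions, weights=marginals[0], k=n_particles)
    particles = [[s] for s in starts]
    weights = [likelihood[0][s] / marginals[0][s] for s in starts]

    for t in range(1, n_residues):
        for idx, trace in enumerate(particles):
            prev = trace[-1]
            proposal = [transition(prev, x) * marginals[t][x] for x in positions]
            z = sum(proposal)
            if z == 0.0:
                # Dead end: no feasible extension, so this particle gets zero weight.
                trace.append(prev)
                weights[idx] = 0.0
                continue
            x = rng.choices(positions, weights=proposal)[0]
            trace.append(x)
            # Importance weight = target / proposal = likelihood * z / marginal.
            weights[idx] *= likelihood[t][x] * z / marginals[t][x]

        if sum(weights) == 0.0:
            raise RuntimeError("all particles have zero weight")
        # Resample in proportion to weight, then reset to uniform weights.
        chosen = rng.choices(range(n_particles), weights=weights, k=n_particles)
        particles = [list(particles[c]) for c in chosen]
        weights = [1.0] * n_particles

    return particles
```

When the marginals concentrate on the correct positions, most proposed extensions already lie in high-likelihood regions, so few particles are wasted on poor extensions. Resampling at every step keeps the example short; a practical filter would more likely resample only when the weights become badly skewed.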

Particle filtering to recover a set of all-atom models. Particle filtering represents the posterior probability of a protein's configuration using a finite set of point estimates, or "particles".


Publications

  • F. DiMaio, D. Kondrashov, E. Bitto, A. Soni, C. Bingman, G. Phillips & J. Shavlik (2007). Creating Protein Models from Electron-Density Maps using Particle-Filtering Methods. Bioinformatics. doi: 10.1093/bioinformatics/btm480.
    (Get the software!, pdf)

  • F. DiMaio, A. Soni, G. Phillips & J. Shavlik (2007). Improved Methods for Template-Matching in Electron-Density Maps Using Spherical Harmonics. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'07), Fremont, CA.
    (pdf)

  • F. DiMaio and J. Shavlik (2006). Belief propagation in large, highly connected graphs for 3D part-based object recognition. Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM), Hong Kong.
    (pdf, ppt slides)

  • F. DiMaio and J. Shavlik (2006). Improving the efficiency of belief propagation in large highly connected graphs. University of Wisconsin-Madison Machine Learning Research Group Working Paper 06-1.
    (pdf)

  • F. DiMaio, J. Shavlik and G. Phillips (2006). A probabilistic approach to protein backbone tracing in electron density maps. Bioinformatics 22; also presented at the Fourteenth International Conference on Intelligent Systems for Molecular Biology (ISMB), Fortaleza, Brazil.
    (pdf, ppt slides)

  • F. DiMaio, J. Shavlik and G. Phillips (2005). Pictorial structures for molecular modeling: Interpreting density maps. Advances in Neural Information Processing Systems (NIPS) 17, Vancouver, Canada.
    (pdf, poster)

  • F. DiMaio and J. Shavlik (2004). Learning an approximation to inductive logic programming clause evaluation. Proceedings of the Fourteenth International Conference on Inductive Logic Programming, Porto, Portugal.
    (pdf, ppt slides)

  • D. Gopan, F. DiMaio, N. Dor, T. Reps and M. Sagiv (2004). Numeric domains with summarized dimensions. Proceedings of Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Barcelona, Spain.

Workshop Publications

  • F. DiMaio and J. Shavlik (2003). Speeding up relational data mining by learning to estimate candidate hypothesis scores. Proceedings of the ICDM Workshop on Foundations and New Directions of Data Mining, Melbourne, Florida.
    (pdf)

  • F. DiMaio, J. Shavlik and G. Phillips (2003). Using pictorial structures to identify proteins in x-ray crystallographic electron density maps. Working Notes of the ICML Workshop on Machine Learning in Bioinformatics, Washington, DC.
    (pdf)


Posters and Presentations (without corresponding publication)

  • F. DiMaio (2007). Guiding particle filtering with marginal approximations: An application in protein image interpretation. The Learning Workshop, San Juan, Puerto Rico.
    (ppt slides)

  • F. DiMaio (2007). New approaches to automatic fitting of electron density maps. PSI Protein Production and Crystallization Workshop, Bethesda, Maryland.
    (poster)

  • F. DiMaio (2006). Modeling protein backbones with pairwise Markov fields. ISMB Satellite Meeting on Structural Bioinformatics and Computational Biophysics (3Dsig), Fortaleza, Brazil.
    (ppt slides)

  • F. DiMaio (2006). Tracing protein backbones in electron density maps using a Markov random field model. Snowbird Learning Workshop, Snowbird, Utah.
    (ppt slides, poster)

  • F. DiMaio, J. Shavlik, and G. Phillips (2005). Automated protein backbone tracing in electron density maps using belief propagation. ISMB Poster Session, Detroit, Michigan.
    (poster)

  • F. DiMaio (2004). Extending pictorial structures for the interpretation of crystallographic density maps. National Library of Medicine Training Directors' Meeting, Indianapolis, Indiana.
    (ppt slides)