Frank DiMaio
Postdoctoral Researcher, Baker laboratory
Mailing address:   Biochemistry Department, Box 357350
University of Washington, Seattle, WA 98195
Home address:   4038 Stone Way N. #303
Seattle, WA 98103
E-mail: dimaio@u.washington.edu
Download: CV

Research Interests

My research interests center on using machine learning and probabilistic inference to solve challenging problems, with an emphasis on tasks in molecular biology. These problems, ranging from interpretation of medical images to modeling gene expression profiles as they change over time, are well suited to probabilistic inference because they involve both (a) disambiguating many sources of noisy observations and (b) incorporating often large sets of background domain knowledge.

In particular, I have focused on determining a protein's three-dimensional structure, which provides both key insights into function and targets for drug design. Many proteins and protein complexes of biomedical importance elude traditional structure determination attempts; however, for many of these proteins it is possible to collect sparse experimental data. My work has shown that machine learning methods -- in particular, approximate probabilistic inference -- can significantly help in structure determination from sparse experimental data. By developing novel inference methods that can make use of the wealth of background knowledge about protein structures available from biology and chemistry, I aim to further improve structure determination from sparse experimental data.


Research Projects

Increasing the radius of convergence of molecular replacement by density- and energy-guided optimization

The crystallographic phase problem refers to the fact that when X-ray diffraction data is collected, additional data -- the "phases" -- are needed to construct a map of the protein's density. Molecular replacement (MR) is a method in which a previously solved protein structure (the "template") is used to fill in this missing experimental information for a target protein. The method generally works when template and target are reasonably similar. However, when the template and target have less than 30% sequence identity, molecular replacement often fails.

I show that the crystallographic phase problem can be solved using distant evolutionary relationships by combining algorithms for protein structure modelling with those developed for crystallographic structure determination. Integrating Rosetta structure modelling with Autobuild chain tracing yielded high-resolution structures for 8 of 13 X-ray diffraction data sets that could not be solved in the laboratories of expert crystallographers, and that remained unsolved after application of an extensive array of alternative approaches. The method shows a 50% success rate in cases where templates with 16-30% sequence identity and 70%+ coverage are available.

Source code has been included in the Rosetta and Phenix software packages.
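
The sequence identity and coverage figures quoted above can be computed from a pairwise alignment of target and template. The following is a minimal illustrative sketch, not code from Rosetta or Phenix. The function names, the use of "-" as the gap character, and the choice to count identity over aligned (non-gap) columns only are assumptions; other conventions for the denominator exist.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent of aligned (gap-free) columns where both residues match."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


def percent_coverage(aligned_target, aligned_template, gap="-"):
    """Percent of target residues that are aligned to a template residue."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    target_len = sum(1 for a in aligned_target if a != gap)
    covered = sum(
        1
        for a, b in zip(aligned_target, aligned_template)
        if a != gap and b != gap
    )
    return 100.0 * covered / target_len if target_len else 0.0


if __name__ == "__main__":
    # Toy alignment, invented purely for illustration.
    target = "ACDE-G"
    template = "ACQEFG"
    print(percent_identity(target, template))
    print(percent_coverage(target, template))
```

Under these conventions, a template would fall in the range discussed above when `percent_identity` is between 16 and 30 and `percent_coverage` is at least 70.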


Inferring protein backbones with Markov field models

I have been investigating automatically tracing protein structures de novo into electron density maps. Analogous to a 3-dimensional picture of a protein, the electron density map is produced as the final result of X-ray crystallographic experiments. Tracing protein backbones in these complex 3D images manually is time-consuming and labor-intensive. My work employs probabilistic inference to determine the most likely trace given an electron density map.

I model the protein using a pairwise Markov random field (MRF), which defines the joint probability of a set of random variables on an undirected graph. In this case, graph vertices represent amino acids in the protein, with associated random variables describing the 3D location and orientation of each amino acid. Associated with each amino acid is a probability of finding that amino acid in a particular location given the map. Determining the most likely backbone trace given an electron density map amounts to inferring the marginal distribution of each amino acid's position. Belief propagation (BP) is a technique that performs approximate inference in loopy graphs; however, it does not scale to proteins, which may contain thousands of residues. To make BP tractable in these types of graphs, I have developed AggBP, which approximates some subset of outgoing messages at a single node with a single message, making the method computationally feasible for large proteins. On a variety of maps, my method produces a more accurate backbone trace than two other commonly used methods.

The algorithms developed have been included in the software suite ACMI.
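
As a rough illustration of the message passing described in this section, the following is a minimal sketch of standard sum-product belief propagation on a small pairwise MRF. It is not AggBP and it is not code from ACMI. The function names, the discretization of each amino acid's position into a few candidate states, and the toy potentials are all assumptions made for illustration.

```python
from itertools import product


def belief_propagation(unary, pairwise, n_iters=20):
    """Sum-product BP on a pairwise MRF with discrete states.

    unary:    dict node -> list of potentials, one per candidate state
    pairwise: dict (i, j) -> table psi[x_i][x_j] for each undirected edge
    Returns dict node -> normalized belief (approximate marginal).
    """
    def psi(i, j, xi, xj):
        # Look up the edge potential regardless of stored orientation.
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    neighbors = {i: [] for i in unary}
    for i, j in pairwise:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # messages[(i, j)] is the message from node i to node j, over states of j.
    messages = {
        (i, j): [1.0 / len(unary[j])] * len(unary[j])
        for i in unary
        for j in neighbors[i]
    }

    for _ in range(n_iters):
        updated = {}
        for (i, j) in messages:
            out = []
            for xj in range(len(unary[j])):
                total = 0.0
                for xi in range(len(unary[i])):
                    # Product of messages into i from all neighbors except j.
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= messages[(k, i)][xi]
                    total += unary[i][xi] * psi(i, j, xi, xj) * incoming
                out.append(total)
            z = sum(out)
            updated[(i, j)] = [v / z for v in out]
        messages = updated  # synchronous update of all messages

    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for k in neighbors[i]:
            b = [bv * m for bv, m in zip(b, messages[(k, i)])]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs


def brute_force_marginals(unary, pairwise):
    """Exact marginals by enumerating every joint assignment (tiny graphs only)."""
    nodes = sorted(unary)
    marginals = {i: [0.0] * len(unary[i]) for i in nodes}
    for assignment in product(*(range(len(unary[i])) for i in nodes)):
        x = dict(zip(nodes, assignment))
        weight = 1.0
        for i in nodes:
            weight *= unary[i][x[i]]
        for (i, j), table in pairwise.items():
            weight *= table[x[i]][x[j]]
        for i in nodes:
            marginals[i][x[i]] += weight
    for i in nodes:
        z = sum(marginals[i])
        marginals[i] = [v / z for v in marginals[i]]
    return marginals


if __name__ == "__main__":
    # Toy chain of three residues, each with two candidate positions.
    unary = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.3, 0.7]}
    adjacency = [[0.8, 0.2], [0.2, 0.8]]  # favors neighbors in the same state
    pairwise = {(0, 1): adjacency, (1, 2): adjacency}
    print(belief_propagation(unary, pairwise))
    print(brute_force_marginals(unary, pairwise))
```

On a chain such as this toy example the graph has no loops, so the beliefs agree with the exact marginals from enumeration. On loopy graphs the same updates give only approximate marginals, and the cost of each message grows with the number of neighbors and candidate states, which is the scaling problem the text describes.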


Determining protein structures from sparse experimental data

The structures of many biomedically important proteins elude traditional structure determination methods. However, for many of these proteins it may be possible to collect sparse experimental data. While these sources of data may not be enough to uniquely determine the structure of a protein, they do contain enough information to guide conformational search in structure prediction methods.

By providing sparse experimental information to structure prediction methods, I am able to generate significantly better models using less conformational sampling. In particular, I showed that my method could be used to infer high-resolution structural details from cryo-electron microscopy (cryoEM) data, a method by which individual images of tens of thousands of protein molecules are reconstructed into a three-dimensional "envelope". I have also shown that other sources of weak data -- including those from NMR experiments and small-angle X-ray scattering -- may be used in a similar manner.


Selected publications (download CV)

  • T. Terwilliger, F. DiMaio, R. Read, D. Baker, G. Bunkoczi, P. Adams, R. Grosse-Kunstleve, P. Afonine, N. Echols. (2012) phenix.mr_rosetta: molecular replacement and model rebuilding with Phenix and Rosetta. J Struct Funct Genomics.

  • F. DiMaio, T. Terwilliger, R. Read, A. Wlodawer, G. Oberdorfer, E. Valkov, A. Alon, D. Fass, H. Axelrod, D. Das, S. Vorobiev, H. Iwai, P. Pokkuluri and D. Baker (2011). Increasing the radius of convergence of molecular replacement by density and energy guided protein structure optimization. Nature 473: 540-543. (pdf)

  • M. Li, F. DiMaio, D. Zhou, A. Gustchina, J. Lubkowski, Z. Dauter, D. Baker and A. Wlodawer (2011). Crystal structure of XMRV protease differs from the structures of other retropepsins. Nature Structural & Molecular Biology. 18:227-9. (pdf)

  • F. Khatib, F. DiMaio, Foldit Contenders Group, Foldit Void Crushers Group, S. Cooper, M. Kazmierczyk, M. Gilski, S. Krzywda, H. Zabranska, I. Pichova, J. Thompson, Z. Popovic, M. Jaskolski, D. Baker (2011). Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nat Struct Mol Biol. (pdf)

  • J. Zhang, B. Ma, F. DiMaio, N. Douglas, L. Joachimiak, D. Baker, J. Frydman, M. Levitt, and W. Chiu (2011). Cryo-EM Structure of a Group II Chaperonin in the Prehydrolysis ATP-Bound State Leading to Lid Closure. Structure. 19:633-9.

  • N. Sgourakis, O. Lange, F. DiMaio, I. Andre, N. Fitzkee, P. Rossi, G. Montelione, A. Bax, and D. Baker. (2011). Determination of the structures of symmetric protein oligomers from NMR chemical shifts and residual dipolar couplings. Journal of the American Chemical Society. 133:6288-6298. (pdf)

  • F. DiMaio, A. Leaver-Fay, P. Bradley, D. Baker, I. Andre. (2011) Modeling symmetric macromolecular structures in Rosetta3. PLoS One.

  • E. Valkov, A. Stamp, F. DiMaio, D. Baker, B. Verstak, P. Roversi, S. Kellie, M. Sweet, A. Mansell, N. Gay, J. Martin, B. Kobe (2011). Crystal structure of Toll-like receptor adaptor MAL/TIRAP reveals the molecular basis for signal transduction and disease protection. Proc Natl Acad Sci U S A. 108:14879-84.

  • M. Tyka, D. Keedy, I. Andre, F. DiMaio, Y. Song, D. Richardson, J. Richardson and D. Baker (2010). Alternate states of proteins revealed by detailed energy landscape mapping. Journal of Molecular Biology. 405:607-18. (pdf)

  • D.-H. Chen, M. Baker, C. Hryc, F. DiMaio, J. Jakana, W. Wu, M. Dougherty, C. Haase-Pettingell, M. Schmid, W. Jiang, D. Baker, J. King and W. Chiu (2010). Structural basis for scaffolding-mediated assembly and maturation of a dsDNA virus. Proc Natl Acad Sci U S A. 108:1355-60. (pdf)

  • F. DiMaio, M. Tyka, M. Baker, W. Chiu and D. Baker (2009). Refinement of protein structures into low-resolution density maps using Rosetta. Journal of Molecular Biology 392: 181-190. (pdf)

  • F. DiMaio, A. Soni, G. Phillips and J. Shavlik (2009). Spherical-Harmonic Decomposition for Molecular Recognition in Electron-Density Maps. Int. J. of Data Mining and Bioinformatics 3: 205-227. (pdf)

  • F. DiMaio, D. Kondrashov, E. Bitto, A. Soni, C. Bingman, G. Phillips & J. Shavlik (2007). Creating Protein Models from Electron-Density Maps using Particle-Filtering Methods. Bioinformatics. (Get the software!, pdf)

  • F. DiMaio, A. Soni, G. Phillips & J. Shavlik (2007). Improved Methods for Template-Matching in Electron-Density Maps Using Spherical Harmonics. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'07), Fremont, CA. (pdf)

  • F. DiMaio and J. Shavlik (2006). Belief propagation in large, highly connected graphs for 3D part-based object recognition. Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM), Hong Kong. (pdf)

  • F. DiMaio, J. Shavlik and G. Phillips (2006). A probabilistic approach to protein backbone tracing in electron density maps. Bioinformatics 22; also presented at the Fourteenth International Conference on Intelligent Systems for Molecular Biology (ISMB), Fortaleza, Brazil. (pdf)

  • F. DiMaio, J. Shavlik and G. Phillips (2005). Pictorial structures for molecular modeling: Interpreting density maps. Advances in Neural Information Processing Systems (NIPS) 17, Vancouver, Canada. (pdf)

  • F. DiMaio and J. Shavlik (2004). Learning an approximation to inductive logic programming clause evaluation. Proceedings of the Fourteenth International Conference on Inductive Logic Programming, Porto, Portugal. (pdf)