Research Interests

My main research interests are in statistical inference for molecular evolution and for trait evolution. This work involves graphical models built on stochastic processes (discrete Markov processes or continuous diffusions), and developing tools for model selection, Bayesian inference, and new models that capture enough realism yet remain computationally feasible in our Big Data era.

I have also become interested in a variety of biological applications through statistical consulting across campus, in areas such as food science and veterinary science.

Molecular Evolution

One of my aims is to detect which groups of genes share the same genealogy, to draw inference on the distribution of genealogies across the genome, and then to reconstruct phylogenetic networks when the relationships are best depicted by a network. This area involves statistical issues of model selection and hierarchical modelling of species genealogies and gene genealogies, as well as computational challenges: molecular data are becoming available faster than appropriate methods of analysis. Development of these methods is currently funded by the NSF, for example through the IFDS. See also these earlier awards to study phylogenetic networks, reticulate evolution and species delimitation in baobabs, the tree of Enterobacteria, discordance patterns, and monocots.

Trait Evolution

More recently, I have been interested in using phylogenetic trees to analyze trait evolution with phylogenetic 'comparative methods'. Data collected on species (or related individuals) do not form a random sample because they lack independence: sister species are expected to have similar traits. Such samples can show a high level of dependence, so adapted statistical methods of analysis are needed. I am interested in the statistical properties of estimation methods, in the effective degrees of freedom for parameters in these models, and in adapted model selection procedures, for example to discover abrupt shifts in trait evolution. I am also extending these phylogenetic comparative methods to accommodate reticulate evolution, when the phylogenetic relationships are best depicted by a network. See this earlier NSF project.

Software development

Methods are good for nothing if they are not implemented and user-friendly! I have become more and more involved in software development (see here), first using C/C++ and R, and now using Julia because it combines speed (like C) with interactivity (like R).

Interested in joining the group?

I typically advise graduate students from the Statistics PhD program, but I also (co-)advise graduate students from the Botany program. Don't hesitate to contact me if you have questions.

I also welcome undergraduate students in the group, typically for senior honors theses. Students will need a solid background in statistics, programming skills (CS 300 and the R courses Stat 303-305), and a background in genetics.

Please see expectations and guidelines for students in my research group.

Students

Jingcheng Xu (Stat PhD), Benjamin Teo (Stat PhD), Lauren Frankel (Botany PhD)
Michael Maxfield (BS CS major, 2024), Maxwell Sherwin (BS CS major, 2025)
John Fogg (Stat PhD, 2023), Cathy Cao (BS CS major, 2023)
Cora Allen-Savietta (Stat PhD, 2020) - post-graduation: Statistical Scientist at Berry Consultants, LLC.
Ruoyi Cai (BS Stat major, 2019)
Sabrina Yu (BS Stat major, 2017), Nan Ji (BS Stat major, 2017), Christian Borst (BS CS major, 2017)
Mohammad Khabbazian (ECE PhD, 2016; Karl Rohe primary advisor) - post-graduation: postdoc at Columbia University
Mengyao Yang (BS Stat major, 2016)
Claudia Solís-Lemus (Stat PhD, 2015) - now faculty at WID, UW-Madison
Lam Ho (Stat PhD, 2014) - now faculty at Dalhousie University
Yicheng Li (BS Stat major, 2014)
Charles-Elie Rabier (postdoc, 2011-2013)
Yujin Chung (Stat PhD, 2012) - now faculty at Kyonggi University, South Korea
Satish Kumar (Computer Science MS, 2010)