Statistical Models and Algorithms for Combining Phylogenetic Analyses
Molecular phylogeny uses aligned molecular sequences to infer evolutionary relationships, represented by a phylogenetic tree whose terminal nodes correspond to the observed organisms. Typical phylogenetic analyses use publicly available software that fits a computationally intensive Bayesian model using Markov chain Monte Carlo (MCMC) simulation. Priors for model parameters in publicly available phylogenetic software are mostly non-informative.
Multiple phylogenetic analyses are typically run independently of one another. Combining these analyses can be useful to researchers in several ways. For example, when interest lies in making inference about the phylogenetic model parameters for one or a few data sets, incorporating additional information about those parameters via proper priors can be advantageous. An efficient and objective approach to constructing a proper prior is to use a hierarchical model that combines the analyses of multiple data sets. These data sets can be drawn from publicly available genetic sequence databases or from data already collected in-house.
We have developed hierarchical semi-parametric regression models and a number of algorithms to efficiently combine multiple complex Bayesian phylogenetic analyses. This allows us to better estimate parameters within and across analyses, and to derive informative priors that can then be used when analyzing specific data sets of interest. We demonstrate our approach using longitudinal HIV-1 sequence data and sequences from a benchmark alignment database.
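As a rough illustration of how combining independent analyses can yield an informative prior, the sketch below fits a simple normal-normal hierarchy by the method of moments. It is not the semi-parametric regression approach described in this abstract, only a much simpler stand-in for the general idea. It assumes each analysis has been summarized by a posterior mean and variance for one scalar parameter, obtained under a non-informative prior; the function names and all numbers are hypothetical.

```python
from statistics import mean, variance


def fit_normal_hierarchy(estimates, variances):
    """Fit y_i ~ N(theta_i, s_i^2), theta_i ~ N(mu, tau^2) by moments.

    estimates: per-analysis posterior means (hypothetical MCMC summaries)
    variances: per-analysis posterior variances s_i^2 (all > 0)
    Returns (mu, tau2), which define an informative prior N(mu, tau2).
    """
    # Spread of the estimates beyond what within-analysis noise explains.
    tau2 = max(variance(estimates) - mean(variances), 0.0)
    # Precision-weighted estimate of the population mean.
    weights = [1.0 / (s2 + tau2) for s2 in variances]
    mu = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return mu, tau2


def posterior_under_prior(y, s2, mu, tau2):
    """Conjugate normal update of the prior N(mu, tau2) with a new
    estimate y that has variance s2. Returns (mean, variance)."""
    if tau2 == 0.0:
        # Degenerate prior: all analyses share a single parameter value.
        return mu, 0.0
    precision = 1.0 / s2 + 1.0 / tau2
    return (y / s2 + mu / tau2) / precision, 1.0 / precision


if __name__ == "__main__":
    # Made-up summaries of one parameter from five earlier analyses.
    past_means = [0.8, 1.2, 1.0, 1.4, 0.6]
    past_vars = [0.01] * 5
    mu, tau2 = fit_normal_hierarchy(past_means, past_vars)
    print("informative prior: mean", mu, "variance", tau2)
    # Shrink a noisy estimate from a new data set toward the prior mean.
    print(posterior_under_prior(2.0, 0.09, mu, tau2))
```

The resulting prior N(mu, tau2) pulls a noisy estimate from a new data set toward the mean of the earlier analyses, with the pull growing as the new estimate becomes less precise relative to the between-analysis spread.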
Seminar Date:
April 4, 2007