Modelling Genetic Variations using Fragmentation-Coagulation Processes
Yee Whye Teh (University of Oxford)
Friday 27th September, 2013 15:00-16:00 Maths 204
Hudson's coalescent with recombination is a well-known model of genetic variation in populations with recombination. With growing amounts of population genetics data, demand for probabilistic models to analyse such data is strong, and the coalescent with recombination is a very natural candidate. Unfortunately, posterior inference in the model is intractable, and a number of approximations and alternatives have been proposed.
A popular class of alternatives is based on hidden Markov models (HMMs), which can be understood as approximating the tree-structured genealogies at each point of the chromosome with a partition of the observed haplotypes. However, HMMs suffer from two problems. Firstly, they are parametric and require either a user-specified number of states or expensive model selection procedures. Secondly, due to the way HMMs parametrize partitions using latent states, they suffer from significant label-switching issues that affect the quality of posterior inferences.
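The label-switching issue can be made concrete with a small Python sketch. It is an illustration only, not code from the talk: the function name `partition_from_states` and the toy assignment of five haplotypes to three states are assumptions. It shows that every permutation of the state labels gives a different latent-state vector but the same partition of the haplotypes.

```python
from itertools import permutations


def partition_from_states(states):
    """Group haplotype indices by latent state and discard the state labels."""
    blocks = {}
    for idx, s in enumerate(states):
        blocks.setdefault(s, set()).add(idx)
    return frozenset(frozenset(b) for b in blocks.values())


# Hypothetical assignment of 5 haplotypes to K = 3 latent states at one site.
states = [0, 0, 1, 2, 1]
K = 3

# Every relabelling of the states is a distinct configuration for the HMM...
relabelings = {tuple(perm[s] for s in states) for perm in permutations(range(K))}

# ...but all of them describe one and the same partition of the haplotypes.
partitions = {partition_from_states(r) for r in relabelings}

print(len(relabelings), "labelled configurations,", len(partitions), "partition")
```

A model defined directly on partitions has a single representation for what the HMM sees as K! equivalent ones.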
We propose a novel Bayesian nonparametric model for genetic variations based on Markov processes over partitions called fragmentation-coagulation processes. Our model is based on a particularly simple class of Berestycki's exchangeable fragmentation-coalescence processes with nice properties: they are reversible, stationary Markov processes that evolve via binary fragmentations and coagulations. Statistically, our model can infer the number of states easily and automatically, and does not suffer from the label-switching issues of HMMs. Inference is achieved using an efficient Gibbs sampling algorithm, and we report encouraging results on genotype imputation.
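To give a feel for a Markov chain over partitions that moves only by binary fragmentations and coagulations, here is a toy Python sketch. It is not the model from the talk: the discrete time steps, the 50/50 choice between move types, and the uniform random splits are placeholder assumptions, and this toy chain is not claimed to be reversible or to have the stationary distribution of a fragmentation-coagulation process. The function name `step` is likewise hypothetical.

```python
import random


def step(partition, rng):
    """Apply one binary move (merge two blocks or split one block in two)."""
    blocks = [set(b) for b in partition]
    splittable = [i for i, b in enumerate(blocks) if len(b) > 1]
    can_merge = len(blocks) > 1
    # Placeholder rule (an assumption): pick the move type with equal probability
    # whenever both moves are possible.
    if can_merge and (not splittable or rng.random() < 0.5):
        # Coagulation: two distinct blocks merge into one.
        i, j = rng.sample(range(len(blocks)), 2)
        merged = blocks[i] | blocks[j]
        blocks = [b for k, b in enumerate(blocks) if k not in (i, j)]
        blocks.append(merged)
    elif splittable:
        # Fragmentation: one block splits into two non-empty blocks.
        i = rng.choice(splittable)
        items = sorted(blocks[i])
        rng.shuffle(items)
        cut = rng.randint(1, len(items) - 1)
        blocks = blocks[:i] + blocks[i + 1:] + [set(items[:cut]), set(items[cut:])]
    return blocks


rng = random.Random(0)
partition = [{0, 1, 2, 3, 4}]  # start with all five haplotypes in one block
path = [partition]
for _ in range(20):
    partition = step(partition, rng)
    path.append(partition)

for p in path:
    print(sorted(sorted(b) for b in p))
```

Each state of the chain is an unlabelled partition, so the number of blocks varies along the path instead of being fixed in advance.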
Joint work with Lloyd Elliott and Charles Blundell.