Principal component analysis in the space of phylogenetic trees

Tom Nye (Newcastle University)

Friday 17th March, 2017 15:00-16:00
Maths 203

Abstract

Phylogenetic analysis of DNA or other data commonly gives rise to a sample of inferred evolutionary trees. Principal Component Analysis (PCA) cannot be applied directly to such samples, since the space of evolutionary trees on a fixed set of taxa is not a Euclidean vector space. Instead, PCA must be reformulated in the geometry of tree-space, which is a metric space with a unique geodesic between each pair of trees. The analogue of a Euclidean first principal component is a principal geodesic in tree-space, which can be estimated by minimizing the sum of squared projected distances to the data. However, the construction of higher-order principal components remained elusive for several years. In this talk I propose a solution: the k-th order principal component is the locus of the weighted Fréchet mean of k+1 points in tree-space, where the weights vary over the standard k-dimensional simplex. I will describe basic properties of these objects, in particular that locally they generically have dimension k, and propose an efficient algorithm for projection onto these surfaces. Combined with a stochastic optimization algorithm, this gives a procedure for constructing a principal component of arbitrary order in tree-space. These methods enable visualizations of slices of tree-space, revealing structure within these complex data sets.
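
For readers who prefer symbols, the construction described in the abstract can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the talk: \mathcal{T} denotes tree-space, d its geodesic metric, v_0, ..., v_k the k+1 points, w the weight vector, \mu(w) the weighted Fréchet mean and \Pi the resulting locus.

```latex
% Standard k-dimensional simplex of weights
\[
  \Delta^k = \Bigl\{\, w \in \mathbb{R}^{k+1} : w_i \ge 0,\ \sum_{i=0}^{k} w_i = 1 \,\Bigr\}
\]
% Weighted Frechet mean of v_0, ..., v_k for a given weight vector w
\[
  \mu(w) = \operatorname*{arg\,min}_{x \in \mathcal{T}} \; \sum_{i=0}^{k} w_i \, d(x, v_i)^2
\]
% Candidate k-th order principal component: the locus of \mu(w) as w ranges over the simplex
\[
  \Pi(v_0, \dots, v_k) = \bigl\{\, \mu(w) : w \in \Delta^k \,\bigr\}
\]
```

For k = 1 the locus is the geodesic segment between v_0 and v_1, which matches the principal geodesic picture. In a Euclidean space the same definition gives the convex hull of the points, so the construction reduces to the familiar flat case.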
