New Models for the Analysis of Compositional Data
Connie Stewart (University of New Brunswick)
Wednesday 4th July, 2018 15:00-16:00 Maths 116
Vectors of non-negative components carrying only relative information, and often normalized to sum to one, are referred to as compositional data, and their sample space is the simplex. Compositional data arise in many disciplines, such as ecology, geology and economics. Aitchison’s log-ratio methods, developed in the 1980s, have since been a popular approach for analyzing compositional data and have motivated much of the recent research in the area.
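As a brief illustration of the log-ratio idea, the sketch below normalizes a vector onto the simplex and applies the centered log-ratio (clr) transform, one of the standard log-ratio transforms. The function names `closure` and `clr` are chosen here for illustration and are not taken from the talk.

```python
import numpy as np

def closure(x):
    """Rescale non-negative parts so they sum to one (a point on the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs.

    Assumes all parts are strictly positive.
    """
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

comp = closure([2.0, 3.0, 5.0])   # parts 0.2, 0.3, 0.5
z = clr(comp)                     # coordinates that sum to zero
```

Because only relative information matters, `clr([2, 3, 5])` and `clr([20, 30, 50])` give the same coordinates.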
In this talk, two alternative parametric models for describing compositional data are presented, namely the α-folded multivariate normal distribution and the zero-adjusted Dirichlet regression model. In the first case, a more flexible class of models for data defined on the simplex is introduced by means of a power transformation. The density is derived through a two-part transformation of a multivariate normal random vector, referred to as a folding transformation. Our research suggests that parameter estimation via the EM algorithm is efficient and consistent, and that the proposed model has the potential to provide an improved fit over the traditional log-ratio based models.
The second model is an extension of the Dirichlet distribution in the regression context to allow for zeros. Zeros in compositional data, while common in practice, are compatible with neither the log-ratio theory nor the Dirichlet distribution. Through an adjustment of the Dirichlet log-likelihood, we are able to directly incorporate the zeros. Although some properties of the Dirichlet distribution may make it unsuitable for certain data sets in practice, our model performed well in a variety of scenarios.
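To see why zeros are a problem, the minimal sketch below evaluates an additive log-ratio and a standard Dirichlet log-density on a composition with a zero part. It only demonstrates the incompatibility; it does not implement the adjusted log-likelihood presented in the talk. The helper names and the parameter values are assumptions made for illustration.

```python
import math
import numpy as np

def dirichlet_logpdf(x, alpha):
    """Standard Dirichlet log-density, written out directly (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    with np.errstate(divide="ignore"):      # silence the log(0) warning
        return log_norm + np.sum((alpha - 1.0) * np.log(x))

def alr(x):
    """Additive log-ratio, using the last part as the reference."""
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore"):
        return np.log(x[:-1] / x[-1])

alpha = [2.0, 3.0, 4.0]                     # illustrative parameters, all > 1
positive = [0.2, 0.3, 0.5]
with_zero = [0.0, 0.4, 0.6]

print(dirichlet_logpdf(positive, alpha))    # finite
print(dirichlet_logpdf(with_zero, alpha))   # -inf
print(alr(with_zero))                       # first coordinate is -inf
```

With all parameters above one, a single zero part drives the log-density to negative infinity, so such an observation cannot contribute to an ordinary Dirichlet likelihood.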
For both models, simulation study results and examples will be used to illustrate our findings.