# Dr Vinny Davies

**Lecturer** (Statistics)

## Biography

Dr Davies is a lecturer in Statistics in the School of Mathematics and Statistics, specialising in computational biology and computational methods for statistics and machine learning. He completed his Ph.D. within the School of Mathematics and Statistics, where he focused on variable selection models for selecting antigenic sites in virus evolution. He then completed several post-doctoral research positions in both the schools of Statistics and Computing Science, as well as spending time as a Biostatistician at the University of Leeds. He recently returned to the School of Mathematics and Statistics, where his research interests will focus on methods at the interface between Statistics and Machine Learning. He has a particular interest in Computational Metabolomics, but a general interest in applying statistical and machine learning methods to any biological, chemical or health problem.

If you are interested in doing a Ph.D., please take a look at the Additional Information section or email me directly.

## Research interests

### Research units

- Statistics & Data Analytics
- Bayesian modelling and inference
- Computational statistics
- Machine learning and AI
- Environmental, ecological sciences and sustainability
- Statistical modelling for biology, genetics and *omics

## Supervision

**Current MRes Students**

- Cara MacBride

**Current PhD Students**

- Ross McBride

- Emily Davison: Designing advanced statistical inference methods for learning the parameters of a mathematical biodiversity model

## Teaching

I generally teach on the introductory Python courses and the Large Scale Computing course (NNs in TensorFlow), and I also supervise a number of undergraduate and master's projects.

## Additional information

I am looking for potential PhD students across a range of subjects, and a number of available projects are described below. Please contact me if you wish to discuss these or any other projects further.

**Metabolomics DIA Resolver**

In metabolomics we take a sample (blood, urine, etc.) and put it through a mass spectrometer. The mass spectrometer scans the sample in multiple ways to help us work out what metabolites can be found in the sample. Identifying these metabolites can be useful for clinical trials, disease diagnosis and progression, and various other medical applications. There are various ways of choosing the scans, but in one particular method, data-independent acquisition (DIA), we often see multiple fragments from multiple metabolites in a single scan. In order to identify the metabolites we need to work out which fragments belong to which metabolites. The project will use our recently developed virtual mass spectrometer, ViMMS (Wandy et al., 2019; Wandy et al., 2022), to continue the development of our new metabolomics DIA resolver, MSdeconvolve. We will expand MSdeconvolve to work across multiple repeated samples collected in different ways and then extend it to work for completely different samples. Initially this will be done using standard statistical and machine learning methods, but we will look to extend this into a Bayesian modelling framework.
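To give a flavour of the fragment-assignment problem, here is a minimal sketch of one simple heuristic: fragments that come from the same metabolite should rise and fall together across scans, so their intensity profiles should be highly correlated. This is an illustration only and is not a description of how MSdeconvolve works. The function `group_fragments`, the correlation threshold and the toy elution peaks are all assumptions made for this example.

```python
import numpy as np


def group_fragments(intensities, threshold=0.9):
    """Greedily group fragments whose intensity profiles are correlated.

    intensities: array of shape (n_fragments, n_scans).
    Returns one integer group label per fragment.
    """
    corr = np.corrcoef(intensities)  # pairwise Pearson correlation
    n_fragments = len(intensities)
    labels = [-1] * n_fragments
    next_label = 0
    for i in range(n_fragments):
        if labels[i] != -1:
            continue  # already assigned to an earlier group
        labels[i] = next_label
        for j in range(i + 1, n_fragments):
            if labels[j] == -1 and corr[i, j] >= threshold:
                labels[j] = next_label
        next_label += 1
    return labels


# Toy data (hypothetical): two metabolites eluting at different times,
# each producing two fragments with different relative intensities.
rng = np.random.default_rng(0)
scans = np.arange(50)
peak_a = np.exp(-0.5 * ((scans - 15) / 3.0) ** 2)
peak_b = np.exp(-0.5 * ((scans - 32) / 3.0) ** 2)
intensities = np.stack([1.0 * peak_a, 0.4 * peak_b, 0.7 * peak_a, 0.9 * peak_b])
intensities = intensities + rng.normal(0.0, 0.01, size=intensities.shape)

labels = group_fragments(intensities)
print(labels)
```

Real DIA data is much harder than this toy case, because metabolites often co-elute and their profiles overlap, which is part of the motivation for the statistical and Bayesian approaches described above.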

**Scalable Bayesian Models for Inferring Evolutionary Traits of Plants**

Supervised jointly with Richard Reeve and Claire Harris

The functional traits and environmental preferences of plant species determine how they will react to changes resulting from global warming. The main global biodiversity repositories, such as the Global Biodiversity Information Facility (GBIF), contain hundreds of millions of records from hundreds of thousands of species in the plant kingdom alone, and the spatiotemporal data in these records can be associated with soil, climate or other environmental data from other databases. Combining these records allows us to identify environmental preferences, especially for common species where many records exist. Furthermore, in a previous PhD studentship we showed that these traits are highly evolutionarily conserved (Harris et al., 2022), so it is possible to impute the preferences for rare species where little data exists using phylogenetic inference techniques.

The aim of this PhD project is to investigate the application of Bayesian variable selection methods to identify these evolutionarily conserved traits more effectively, and to quantify these traits and their associated uncertainty for all plant species for use in a plant ecosystem digital twin that we are developing separately to forecast the impact of climate change on biodiversity. In another PhD studentship, we previously developed similar methods for trait inference in viral evolution (Davies et al., 2017; Davies et al., 2019), but due to the scale of the data here, these methods will need to be significantly enhanced. We therefore propose a project to investigate extensions to methods for phylogenetic trait inference to handle datasets involving hundreds of millions of records in phylogenies with hundreds of thousands of tips, potentially through either sub-sampling (Quiroz et al., 2018) or modelling splitting and recombination (Nemeth & Sherlock, 2018).

**Gaussian Process Emulation for Mathematical Models of the Heart**

Supervised jointly with Benn Macdonald, Mu Nui, and Hao Gao

Mathematical models of the heart can help us understand how the heart functions and provide us with valuable insights into how we can treat patients or diagnose disease. Previous and ongoing work has looked at how we can use statistical and machine learning emulation strategies to speed up inference and make the mathematical models applicable within a clinical setting. The aim of this project is to further develop these methods through the application of Gaussian Processes and apply them to different mathematical problems with higher-dimensional inputs. In many of the possible applications, the mathematical models will have high-dimensional and potentially correlated parameter inputs, as well as highly correlated outputs. The initial aim of this work will be to further develop the emulation methods to deal with these problems and look at how we can more effectively select the parameter inputs for the simulations we choose to generate output for. Further work will then look at how these models can be combined with other techniques, such as automated annotation to accelerate the construction of our emulator, or with other emulators, which would allow for the modelling of a more global system.
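For readers unfamiliar with emulation, the idea can be sketched in a few lines: run the expensive simulator at a small number of parameter settings, fit a Gaussian Process to those runs, and then use the cheap GP prediction (with its uncertainty) in place of the simulator. The code below is a minimal sketch under stated assumptions, not the method used in our work: `toy_simulator` is a hypothetical stand-in for a heart model, and the squared-exponential kernel, its fixed hyperparameters and the random design are all choices made purely for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def fit_emulator(x_train, y_train, noise=1e-6):
    """Precompute the Cholesky factor and weights used for prediction."""
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return chol, alpha


def predict(x_new, x_train, chol, alpha):
    """Posterior mean and variance of the emulator at x_new."""
    k_star = rbf_kernel(x_new, x_train)
    mean = k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


def toy_simulator(x):
    """Hypothetical cheap stand-in for an expensive mathematical model."""
    return np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 1]


rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(40, 2))  # simulator design points
y_train = toy_simulator(x_train)               # "expensive" simulator runs
chol, alpha = fit_emulator(x_train, y_train)

x_test = rng.uniform(0.0, 1.0, size=(5, 2))
mean, var = predict(x_test, x_train, chol, alpha)
print(mean, var)
```

The predictive variance is what makes the choice of new simulation inputs possible in principle: settings where `var` is large are those where another simulator run would be most informative. With high-dimensional, correlated inputs and outputs, a fixed kernel and random design like this are unlikely to be adequate, which is the gap the project aims to address.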