Statistics and Data Analytics

Staff

Dr Andrej Aderhold : Research Associate

Supervisor: Dirk Husmeier

  • Publications
  • Dr Craig Alexander : Lecturer

     

    Research student: Peter Radvanyi

  • Personal Website
  • Dr Linda Altieri : Environmental Research Associate

    Dr Craig Anderson : Lecturer

    Research students: Alison Smith, Xueqing Yin, Riham Ismail, Kamol Sanittham, Michael Waltenberger

  • Personal Website
  • Dr Jafet Belmont Osuna : Research Associate

    Environmental statistics; species distributions modelling; spatial ecology; analysis of citizen science data; application of Bayesian methods to characterize biological communities in changing environments

    Supervisors: Marian Scott OBE, Claire Miller (née Ferguson)

  • Dr Mitchum Bock : Lecturer

  • Publications
  • Dr Agnieszka Borowska : Research Assistant

    Supervisor: Dirk Husmeier

  • Prof Adrian Bowman : Professor of Statistics

    Research students: Yinuo Liu, George Vazanellis

  • Personal Website
  • Publications
  • Dr Daniela Castro-Camilo : Lecturer

    Research students: Erin Bryce, Daniela Cuba, Chenglei Hu

  • Personal Website
  • Dr Christina A Cobbold : Reader

    Population dynamics of ecological systems; spatial ecology; evolutionary ecology in changing environments

    Member of other research groups: Mathematical Biology
    Research student: Renato Andrade

  • Personal Website
  • Publications
  • Dr Nema Dean : Lecturer

    Supervised and unsupervised learning; mixture models; variable selection; educational testing data; dynamic treatment regime estimation

    Research students: Shuhrah Alghamdi, Riham Ismail, Sebastian Martinez Bustos, Robin Muegge(PGR), Aldawarsi Bashayr, Alastair Gemmell

  • Personal Website
  • Publications
  • Dr Amira Elayouty : Lecturer

  • Ludger Evers : Lecturer (part-time)

    Research students: Benjamin Szili, Ivona Voroneckaja, Shuhrah Alghamdi, Dimitra Eleftheriou

  • Publications
  • Prof James Campbell Gemmell : Honorary Professor

    Prof Gemmell is chief executive of the Environment Protection Agency of South Australia.

  • Personal Website
  • Dr Mayetri Gupta : Reader

    Research students: Flynn Gewirtz-O'Reilly, Lanxin Li, Kannat Na Bangchang
    Postgraduate opportunities: Bayesian statistical data integration of single-cell and bulk “OMICS” datasets with clinical parameters for accurate prediction of treatment outcomes in Rheumatoid Arthritis, Bayesian variable selection for genetic and genomic studies

  • Personal Website
  • Publications
  • Prof Dirk Husmeier : Chair of Statistics

    Machine learning and Bayesian statistics applied to systems biology and bioinformatics; Bayesian networks; statistical phylogenetics

    Research staff: Andrej Aderhold, Agnieszka Borowska, Alan Lazarus, Benn Macdonald, Mihaela Paun
    Research students: Shaykah Aldossari, Aldawarsi Bashayr, Dalton David, Campioni Nazareno, Ionut Paun, Yalei Yang

  • Personal Website
  • Publications
  • Prof Janine Illian : Chair/Professor in Statistical Science

    My work focuses on spatial point process methodology, in particular the development of modern, realistically complex spatial statistical methodology that is both computationally feasible and relevant to end-users. During my career I have been enthusiastic about taking spatial point processes from the theoretical literature into the real world, and I encourage statistical development by fostering strong relationships with the user community. My work has had an impact on spatial modelling and biodiversity research in the context of ecological studies across many species, taxa and ecosystems. I also have a keen interest in applying realistically complex spatial models in other contexts, including crime modelling, earthquake forecasting, environmental modelling, epidemiology and terrorism studies.

    Research staff: Andrew Seaton
    Research students: Erin Bryce, Stephen Jun Villejo
    Postgraduate opportunities: Integrated spatio-temporal modelling for environmental data, New methods for analysis of migratory navigation

  • Dr Eilidh Jack : Lecturer

    Research student: Robin Muegge(PGR)

  • Prof Duncan Lee : Professor

    Spatiotemporal modelling; Bayesian methods; environmental epidemiology and disease mapping

    Research students: George Gerogiannis, Kamol Sanittham, Michael Waltenberger, Robin Muegge(PGR), Yoana Napier, Xueqing Yin
    Postgraduate opportunities: Mapping disease risk in space and time, Estimating the effects of air pollution on human health, Forecasting Local Net-electricity Demand at Scale

  • Personal Website
  • Publications
  • Dr Marnie Low : Lecturer

    Research student: Peter Radvanyi

  • Personal Website
  • Publications
  • Dr Vincent Macaulay : Reader

    Statistical genetics; population genetics; Bayesian methods; phylogenetics; GPs

    Research student: Laura Stewart

  • Personal Website
  • Publications
  • Dr Benn Macdonald : Research Assistant

    Member of other research groups: Mathematical Biology
    Research student: Hanadi Alzahrani
    Supervisor: Dirk Husmeier

  • Dr Colette Mair : Lecturer

  • Prof Claire Miller (née Ferguson): Professor

    Environmental and ecological modelling; nonparametric smoothing; time series analysis; functional data analysis

    Research staff: Craig Wilkie, Jafet Belmont Osuna
    Research students: Peter Radvanyi, Michael Currie
    Postgraduate opportunities: Funded PhD project: Data analytics for urban environmental planning

  • Personal Website
  • Publications
  • Dr Gary Napier : Lecturer

    Research students: Catherine Holland, Michael Waltenberger

  • Publications
  • Dr Tereza Neocleous : Lecturer

    Forensic statistics; quantile regression; semiparametric models; biostatistics applications

    Research students: Dimitra Eleftheriou, Catherine Holland

  • Personal Website
  • Publications
  • Dr Mu Niu : Lecturer

    Research student: Wenhui Zhang

  • Dr Agostino Nobile : Honorary Research Fellow

    Bayesian statistics; MCMC and other Monte Carlo methods; mixture models; discrete choice models

  • Personal Website
  • Publications
  • Dr Ruth O'Donnell : Lecturer

  • Publications
  • Dr Theo Papamarkou : Lecturer

    Research students: Benjamin Szili, Dimitra Eleftheriou

  • Dr Mihaela Paun : Research Associate

    Supervisor: Dirk Husmeier

  • Dr Surajit Ray : Senior lecturer

    COVID research; functional data analysis; analysis of mixture models; high-dimensional data; medical image analysis; analysis of earth systems data; immunoinformatics

    Research students: Salihah Alghamdi, Yangsong Cheng, Alastair Gemmell, Bader Lafi Q Alruwaili, Wenhui Zhang, Flynn Gewirtz-O'Reilly
    Postgraduate opportunities: Modality of mixtures of distributions, Analysis of Spatially correlated functional data objects.

  • Personal Website
  • Publications
  • Prof Marian Scott OBE: Professor of Environmental Statistics

    Radio-carbon and cosmogenic dating-design and analysis of proficiency trials; environmental radioactivity; sensitivity and uncertainty analysis applied to complex environmental models; spatial and spatiotemporal modeling of water quality; flood risk modeling; environmental indicators; developing the evidence base for environmental policy and regulation

    Research staff: Jafet Belmont Osuna
    Research students: Michael Currie, Yoana Napier, Daniela Cuba

  • Personal Website
  • Publications
  • Dr Andrew Seaton : Research Associate

    Supervisor: Janine Illian

  • Qingying Shu : Postdoctoral Research Fellow

    Supervisor: Xiaoyu Luo

  • Dr Ron Smith : Honorary Senior Research Fellow

  • Personal Website
  • Publications
  • Dr Ben Swallow : Lecturer

    Bayesian statistical inference; Markov chain Monte Carlo (MCMC) methods; data integration; model selection; stochastic processes

    Member of other research groups: Mathematical Biology
    Research students: Stephen Jun Villejo, Chenglei Hu

  • Personal Website
  • Prof Michael Titterington : Honorary Senior Research Fellow

    Statistical analysis of mixture distributions; latent structure analysis; pattern recognition; machine learning; smoothing and nonparametric statistics; optimum design of experiments

  • Personal Website
  • Publications
  • Dr Bernard Torsney : Honorary Research Fellow

    Non-parametric inference; optimisation; optimal experimental design; sampling theory; applications in economics; multiple comparisons

  • Personal Website
  • Publications
  • Dr Liberty Vittert : Mitchell Lecturer

  • Personal Website
  • Dr Vlad Vyshemirsky : Lecturer

    Research student: Lida Mavrogonatou

  • Publications
  • Dr Craig Wilkie : Research Associate

    Supervisor: Claire Miller (née Ferguson)

  • Dr Xiaochen Yang : Lecturer

    Supervised learning; distance metric learning; hyperspectral image analysis

  • Personal Website
  • Dr Wei Zhang : Lecturer

    Bayesian data analysis, Ecological statistics, Statistical computing 

    Member of other research groups: Continuum Mechanics - Modelling and Analysis of Material Systems


  • Postgraduates

    Salihah Alghamdi : PhD Student

    Research Topic: Analysis of Spatially correlated functional data objects.
    Supervisor: Surajit Ray

  • Erin Bryce : PhD Student

    Research Topic: Statistical landslide hazard modelling with a view towards medium to long term territorial planning
    Supervisors: Daniela Castro-Camilo, Janine Illian

  • Yangsong Cheng : PhD Student

    Research Topic: Computing, Inference and Applications of Hierarchical Mode Association Clustering
    Supervisor: Surajit Ray

  • Daniela Cuba : PhD Student

    Research Topic: Statistical tools to interpret soil variation
    Supervisors: Daniela Castro-Camilo, Marian Scott OBE

  • Michael Currie : PhD Student

    Supervisors: Marian Scott OBE, Claire Miller (née Ferguson)

  • Dimitra Eleftheriou : PhD Student

    Supervisors: Tereza Neocleous, Ludger Evers, Theo Papamarkou

  • Flynn Gewirtz-O'Reilly : PhD Student

    Supervisors: Mayetri Gupta, Surajit Ray

  • Catherine Holland : PhD Student

    Research Topic: Bayesian approaches to compositional data with structural zeros
    Supervisors: Gary Napier, Tereza Neocleous

  • Chenglei Hu : PhD Student

    Research Topic: Natural hazard risk estimation using Multivariate Extreme-Value Mixture Models (MEVMM)
    Supervisors: Daniela Castro-Camilo, Ben Swallow

  • Bader Lafi Q Alruwaili : PhD Student

    Research Topic: Clustering and Cluster Inference of complex data structures
    Supervisor: Surajit Ray

  • Yinuo Liu : PhD Student

    Supervisor: Adrian Bowman

  • Lida Mavrogonatou : PhD Student

    Supervisor: Vlad Vyshemirsky

  • Robin Muegge(PGR) : PhD Student

    Research Topic: Estimating the effects of air pollution on human health
    Supervisors: Nema Dean, Duncan Lee, Eilidh Jack

  • Kannat Na Bangchang : PhD Student

    Supervisors: Mayetri Gupta, Manuele Leonelli

  • Yoana Napier : MSc Student

    Supervisors: Marian Scott OBE, Duncan Lee

  • Peter Radvanyi : PhD Student

    Research Topic: Groundwater monitoring design
    Supervisors: Claire Miller (née Ferguson), Craig Alexander, Marnie Low

  • Kamol Sanittham : PhD Student

    Supervisors: Duncan Lee, Craig Anderson

  • Alison Smith : PhD Student

    Research Topic: Developing novel ways to represent spatial patterns in disease risk
    Supervisor: Craig Anderson

  • Laura Stewart : PhD Student

    Research Topic: Development and application of stochastic models of agglomeration
    Supervisors: Vincent Macaulay, Alexey Lindo

  • Benjamin Szili : PhD Student

    Supervisors: Ludger Evers, Theo Papamarkou

  • George Vazanellis : PhD Student

    Research Topic: Spatiotemporal models for environmental data
    Supervisor: Adrian Bowman

  • Stephen Jun Villejo : PhD Student

    Research Topic: A Bayesian Spatio-Temporal Model to Test for Stability of Risks for Spatially Misaligned Data
    Supervisors: Ben Swallow, Janine Illian

  • Ivona Voroneckaja : PhD Student

    Supervisor: Ludger Evers

  • Michael Waltenberger : PhD Student

    Supervisors: Duncan Lee, Craig Anderson, Gary Napier

  • Yalei Yang : PhD Student

    Supervisors: Hao Gao, Dirk Husmeier

  • Xueqing Yin : PhD Student

    Research Topic: Mapping disease risk in space and time
    Supervisors: Craig Anderson, Duncan Lee

  • Wenhui Zhang : PhD Student

    Research Topic: Analysis of Positron Emission Tomography data for tumour detection and delineation
    Supervisors: Surajit Ray, Mu Niu


  • Postgraduate opportunities

    Estimating the effects of air pollution on human health (PhD)

    Supervisors: Duncan Lee
    Relevant research groups: Statistics and Data Analytics

    The health impact of exposure to air pollution is thought to reduce average life expectancy by six months, with an estimated equivalent health cost of 19 billion each year (from DEFRA). These effects have been estimated using statistical models, which quantify the impact on human health of exposure in both the short and the long term. However, the estimation of such effects is challenging, because individual-level measures of health and pollution exposure are not available. Therefore, the majority of studies are conducted at the population level, and the resulting inference can only be made about the effects of pollution on overall population health. However, the data used in such studies are spatially misaligned, as the health data relate to extended areas such as cities or electoral wards, while the pollution concentrations are measured at individual locations. Furthermore, pollution monitors are typically located where concentrations are thought to be highest, a practice known as preferential sampling, which is likely to result in overly high measurements being recorded. This project aims to develop statistical methodology to address these problems, and thus provide less biased estimates of the effects of pollution on health than are currently produced.
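
    The effect of preferential sampling described above can be illustrated with a small simulation. The sketch below is not the project's methodology: the log-normal pollution field, the number of sites and monitors, and the rule that siting probability is proportional to the squared concentration are all assumptions chosen purely for illustration.

```python
import random


def simulate_preferential_sampling(n_sites=10_000, n_monitors=50, seed=0):
    """Compare random and preferential monitor siting on a synthetic field.

    All settings are hypothetical: concentrations are drawn from a
    log-normal distribution, and preferential siting picks locations with
    probability proportional to the squared concentration.
    """
    rng = random.Random(seed)

    # Synthetic "true" concentration at every candidate site.
    concentrations = [rng.lognormvariate(2.0, 0.5) for _ in range(n_sites)]
    true_mean = sum(concentrations) / n_sites

    # Monitors placed uniformly at random (without replacement).
    random_sites = rng.sample(concentrations, n_monitors)
    random_mean = sum(random_sites) / n_monitors

    # Monitors placed preferentially at high-concentration sites.
    # random.choices samples with replacement, which is adequate here
    # because n_monitors is much smaller than n_sites.
    weights = [c ** 2 for c in concentrations]
    preferential_sites = rng.choices(concentrations, weights=weights, k=n_monitors)
    preferential_mean = sum(preferential_sites) / n_monitors

    return true_mean, random_mean, preferential_mean


if __name__ == "__main__":
    true_mean, random_mean, preferential_mean = simulate_preferential_sampling()
    print(f"true mean:          {true_mean:.2f}")
    print(f"random siting:      {random_mean:.2f}")
    print(f"preferential siting: {preferential_mean:.2f}")
```

    Under these assumptions the preferentially sited monitors report a mean well above the true area-wide mean, while randomly sited monitors stay close to it. Any analysis that treats monitor readings as representative of a whole area would inherit this upward bias.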

     

    Bayesian variable selection for genetic and genomic studies (PhD)

    Supervisors: Mayetri Gupta
    Relevant research groups: Statistics and Data Analytics

    An important issue in high-dimensional regression problems is the accurate and efficient estimation of models when, compared to the number of data points, a substantially larger number of potential predictors are present. Further complications arise with correlated predictors, leading to the breakdown of standard statistical models for inference; and the uncertain definition of the outcome variable, which is often a varying composition of several different observable traits. Examples of such problems arise in many scenarios in genomics: in determining expression patterns of genes that may be responsible for a type of cancer, and in determining which genetic mutations lead to higher risks for occurrence of a disease. This project involves developing broad and improved Bayesian methodologies for efficient inference in high-dimensional regression-type problems with complex multivariate outcomes, with a focus on genetic data applications.

    The successful candidate should have a strong background in methodological and applied Statistics, expert skills in relevant statistical software or programming languages (such as R, C/C++/Python), and also have a deep interest in developing knowledge in cross-disciplinary topics in genomics. The candidate will be expected to consolidate and master an extensive range of topics in modern Statistical theory and applications during their PhD, including advanced Bayesian modelling and computation, latent variable models, machine learning, and methods for Big Data. The successful candidate will be considered for funding to cover domestic tuition fees, as well as paying a stipend at the Research Council rate for four years.

     

    Analysis of Spatially correlated functional data objects. (PhD)

    Supervisors: Surajit Ray
    Relevant research groups: Statistics and Data Analytics

    Historically, functional data analysis techniques have widely been used to analyze traditional time series data, albeit from a different perspective. Of late, FDA techniques are increasingly being used in domains such as environmental science, where the data are spatio-temporal in nature and hence it is typical to consider such data as functional data where the functions are correlated in time or space. An example where modeling the dependencies is crucial is in analyzing remotely sensed data observed over a number of years across the surface of the earth, where each year forms a single functional data object. One might be interested in decomposing the overall variation across space and time and attributing it to covariates of interest. Another interesting class of data with dependence structure consists of weather data on several variables collected from balloons, where the domain of the functions is a vertical strip in the atmosphere and the data are spatially correlated. One of the challenges in this type of data is the problem of missingness, to address which one needs to develop appropriate spatial smoothing techniques for spatially dependent functional data. There are also interesting design of experiment issues, as well as questions of data calibration to account for the variability in sensing instruments. In spite of the research initiative in analyzing dependent functional data, there are several unresolved problems, which the student will work on:

    • robust statistical models for incorporating temporal and spatial dependencies in functional data
    • developing reliable prediction and interpolation techniques for dependent functional data
    • developing inferential framework for testing hypotheses related to simplified dependent structures
    • analysing sparsely observed functional data by borrowing information from neighbours
    • visualisation of data summaries associated with dependent functional data
    • clustering of functional data

     

    Mapping disease risk in space and time (PhD)

    Supervisors: Duncan Lee
    Relevant research groups: Statistics and Data Analytics

    Disease risk varies over space and time, due to similar variation in environmental exposures such as air pollution and risk inducing behaviours such as smoking.  Modelling the spatio-temporal pattern in disease risk is known as disease mapping, and the aims are to: quantify the spatial pattern in disease risk to determine the extent of health inequalities,  determine whether there has been any increase or reduction in the risk over time, identify the locations of clusters of areas at elevated risk, and quantify the impact of exposures, such as air pollution, on disease risk. I am working on all these related problems at present, and I have PhD projects in all these areas.

     

    Modality of mixtures of distributions (PhD)

    Supervisors: Surajit Ray
    Relevant research groups: Statistics and Data Analytics

    Finite mixtures provide a flexible and powerful tool for fitting univariate and multivariate distributions that cannot be captured by standard statistical distributions. In particular, multivariate mixtures have been widely used to perform modeling and cluster analysis of high-dimensional data in a wide range of applications. Modes of mixture densities have been used with great success for organizing mixture components into homogeneous groups, but the results are limited to normal mixtures. Beyond the clustering application, existing research in this area has provided fundamental results regarding the upper bound of the number of modes, but they too are limited to normal mixtures. In this project, we wish to explore the modality of non-normal distributions and their application to real-life problems.
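
    To make the notion of modality concrete, the sketch below counts the modes of a univariate normal mixture numerically. It only covers the normal case that existing results already address, and it is a grid approximation, not a method from the project: the function names, the grid size and the four-standard-deviation search range are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Density of a finite normal mixture at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def count_modes(weights, means, sds, grid_size=2001):
    """Count strict local maxima of the mixture density on a regular grid.

    The grid spans four standard deviations beyond the outermost
    components (an illustrative choice). Modes closer together than the
    grid spacing would be merged, so this is only an approximation.
    """
    lo = min(m - 4.0 * s for m, s in zip(means, sds))
    hi = max(m + 4.0 * s for m, s in zip(means, sds))
    step = (hi - lo) / (grid_size - 1)
    density = [mixture_pdf(lo + i * step, weights, means, sds)
               for i in range(grid_size)]
    return sum(
        1
        for i in range(1, grid_size - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )


if __name__ == "__main__":
    # Two equal-weight, unit-variance components, close together.
    print(count_modes([0.5, 0.5], [0.0, 1.0], [1.0, 1.0]))
    # The same components, far apart.
    print(count_modes([0.5, 0.5], [0.0, 4.0], [1.0, 1.0]))
```

    With two equal-weight, unit-variance components, the mixture has a single mode when the means are one unit apart and two modes when they are four units apart, so the number of components does not by itself determine the number of modes.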

     

     

    Bayesian statistical data integration of single-cell and bulk “OMICS” datasets with clinical parameters for accurate prediction of treatment outcomes in Rheumatoid Arthritis (PhD)

    Supervisors: Mayetri Gupta
    Relevant research groups: Statistics and Data Analytics

    In recent years, many different computational methods to analyse biological data have been established, covering DNA (genomics), RNA (transcriptomics), proteins (proteomics) and metabolomics, which captures more dynamic events. These methods were refined by the advent of single-cell technology, where it is now possible to capture the transcriptomics profile of single cells, and spatial arrangements of cells from flow methods or imaging methods like functional magnetic resonance imaging. At the same time, these OMICS data can be complemented with clinical data – measurements of patients, like age, smoking status, phenotype of disease or drug treatment. It is an interesting and important open statistical question how to combine data from different “modalities” (like transcriptome with clinical data or imaging data) in a statistically valid way, to compare different datasets and make justifiable statistical inferences. This PhD project will be jointly supervised with Dr. Thomas Otto and Prof. Stefan Siebert (from the Institute of Infection, Immunity & Inflammation). You will explore how to combine different datasets using Bayesian latent variable modelling, focusing on clinical datasets from Rheumatoid Arthritis.

    Funding Notes

    The successful candidate will be considered for funding to cover domestic tuition fees, as well as paying a stipend at the Research Council rate for four years.

     

    Funded PhD project: Data analytics for urban environmental planning (PhD)

    Supervisors: Claire Miller (née Ferguson)
    Relevant research groups: Statistics and Data Analytics

    The transition to a sustainable society is one of the key challenges facing researchers, policy makers and communities today. Key to future city planning for sustainable solutions is an understanding of what data are available and required to inform effective decision making. Novel data analytics and data visualisations are essential tools in this process. 

    This PhD is suitable for someone from a mathematical/computational sciences background with a strong interest in data analytics and data visualisation. This studentship is an opportunity to develop expertise in data-driven analytics/modelling for connecting quantitative and qualitative spatial (and temporal) data streams and investigating questions arising in urban environmental planning. The successful candidate will play a key role within a large, multi-disciplinary project, GALLANT, supporting Glasgow’s sustainable transformation.

    You can find further information here:

    Data analytics for urban environmental planning at University of Glasgow on FindAPhD.com

     

    New methods for analysis of migratory navigation (PhD)

    Supervisors: Janine Illian
    Relevant research groups: Statistics and Data Analytics

    Joint project with Dr Urška Demšar (University of St Andrews)

    Migratory birds travel annually across vast expanses of oceans and continents to reach their destination with incredible accuracy. How they are able to do this using only locally available cues is still not fully understood. Migratory navigation consists of two processes: birds either identify the direction in which to fly (compass orientation) or the location where they are at a specific moment in time (geographic positioning). One of the possible ways they do this is to use information from the Earth’s magnetic field, in so-called geomagnetic navigation (Mouritsen 2018). While there is substantial evidence (both physiological and behavioural) that they do sense the magnetic field (Deutschlander and Beason 2014), we still do not know exactly which components of the field they use for orientation or positioning. We also do not understand how rapid changes in the field affect movement behaviour.

    There is a possibility that birds can sense these rapid large changes and that this may affect their navigational process. To study this, we need to link accurate data on Earth’s magnetic field with animal tracking data. This has only become possible very recently through new spatial data science advances:  we developed the MagGeo tool, which links contemporaneous geomagnetic data from Swarm satellites of the European Space Agency with animal tracking data (Benitez Paez et al. 2021).

    Linking geomagnetic data to animal tracking data, however, creates a high-dimensional data set, which is difficult to explore. Typical analyses of contextual environmental information in ecology include representing contextual variables as covariates in relatively simple statistical models (Brum Bastos et al. 2021), but this is not sufficient for studying detailed navigational behaviour. This project will analyse complex spatio-temporal data using computationally efficient statistical model fitting approaches in a Bayesian context.

    This project is fully based on open data to support reproducibility and open science. We will test our new methods by annotating publicly available bird tracking data (e.g. from repositories such as Movebank.org), using the open MagGeo tool and implementing our new methods as Free and Open Source Software (R/Python).

    References

    Benitez Paez F, Brum Bastos VdS, Beggan CD, Long JA and Demšar U, 2021. Fusion of wildlife tracking and satellite geomagnetic data for the study of animal migration. Movement Ecology, 9:31. https://doi.org/10.1186/s40462-021-00268-4

    Brum Bastos VdS, Łos M, Long JA, Nelson T and Demšar U, 2021. Context-aware movement analysis in ecology: a systematic review. International Journal of Geographical Information Science. https://doi.org/10.1080/13658816.2021.1962528

    Deutschlander ME and Beason RC, 2014. Avian navigation and geographic positioning. Journal of Field Ornithology, 85(2):111–133. https://doi.org/10.1111/jofo.12055

     

    Scalable Bayesian Models for Inferring Evolutionary Traits of Plants (PhD)

    Supervisors: Vinny Davies
    Relevant research groups: Statistics and Data Analytics

    The functional traits and environmental preferences of plant species determine how they will react to changes resulting from global warming. The main global biodiversity repositories, such as the Global Biodiversity Information Facility (GBIF), contain hundreds of millions of records from hundreds of thousands of species in the plant kingdom alone, and the spatiotemporal data in these records can be associated with soil, climate or other environmental data from other databases. Combining these records allows us to identify environmental preferences, especially for common species where many records exist. Furthermore, in a previous PhD studentship we showed that these traits are highly evolutionarily conserved (Harris et al., 2022), so it is possible to impute the preferences for rare species where little data exists using phylogenetic inference techniques.

    The aim of this PhD project is to investigate the application of Bayesian variable selection methods to identify these evolutionarily conserved traits more effectively, and to quantify these traits and their associated uncertainty for all plant species for use in a plant ecosystem digital twin that we are developing separately to forecast the impact of climate change on biodiversity. In another PhD studentship, we previously developed similar methods for trait inference in viral evolution (Davies et al., 2017; Davies et al., 2019), but due to the scale of the data here, these methods will need to be significantly enhanced. We therefore propose a project to investigate extensions to methods for phylogenetic trait inference to handle datasets involving hundreds of millions of records in phylogenies with hundreds of thousands of tips, potentially through either sub-sampling (Quiroz et al., 2018) or modelling splitting and recombination (Nemeth & Sherlock, 2018).

     

    Forecasting Local Net-electricity Demand at Scale (PhD)

    Supervisors: Jethro Browell, Duncan Lee
    Relevant research groups: Statistics and Data Analytics

    Electricity supply and demand must balance in real-time, which is increasingly challenging as low-carbon technologies revolutionise energy production (wind, solar) and consumption (electric vehicles, heat pumps). Short-term forecasts are therefore essential to maintain an economic and reliable supply of electricity. Such forecasts are widely used in the energy sector, but forecasters face emerging challenges from new consumer behaviours, small scale generation and storage, as well as data quality, privacy, and security issues. This PhD project will give you the opportunity to develop statistical models to forecast electricity demand at regional and local levels of our continuously evolving energy system. Research themes include: 

    • Computationally efficient modelling and forecasting of 100s or 1000s of regions (or potentially millions of smart meters!). 
    • Adaptive modelling and forecasting in the presence of structural breaks.
    • Probabilistic forecasting accounting for spatial and temporal dependencies and hierarchies.

    The project provides an excellent opportunity to conduct cutting edge methodological development complemented by a practical application of societal importance. The successful candidate will need to be comfortable with interfacing with other disciplines and industry partners and be passionate about their research. 

     

    Metabolomics DIA Resolver (PhD)

    Supervisors: Vinny Davies
    Relevant research groups: Statistics and Data Analytics

    In metabolomics we take a sample (blood, urine, etc.) and put it through a mass spectrometer. The mass spectrometer scans the sample in multiple ways to help us work out what metabolites can be found in the sample. Identifying these metabolites can be useful for clinical trials, disease diagnosis and progression, and various other medical applications. There are various ways of choosing the scans, but in one particular method (DIA) we often see multiple fragments from multiple metabolites in a single scan. In order to identify the metabolites we need to work out which fragments belong to which metabolites. The project will use our recently developed virtual mass spectrometer, ViMMS (Wandy et al., 2019; Wandy et al., 2022), to continue the development of our new metabolomics DIA resolver, MSdeconvolve. We will expand MSdeconvolve to work across multiple repeated samples collected in different ways and then extend it to work for completely different samples. Initially this will be done using standard statistical and machine learning methods, but we will look to extend this into a Bayesian modelling framework.

     

    Integrated spatio-temporal modelling for environmental data (PhD)

    Supervisors: Janine Illian
    Relevant research groups: Statistics and Data Analytics

    (Jointly supervised by Peter Henrys, CEH)

    The last decade has seen a proliferation of environmental data with vast quantities of information available from various sources. This has been due to a number of different factors including: the advent of sensor technologies; the provision of remotely sensed data from both drones and satellites; and the explosion in citizen science initiatives. These data represent a step change in the resolution of available data across space and time - sensors can be streaming data at a resolution of seconds whereas citizen science observations can be in the hundreds of thousands.  

    Over the same period, the resources available for traditional field surveys have decreased dramatically whilst logistical issues (such as access to sites) have increased. This has severely impacted the ability of field survey campaigns to collect data at high spatial and temporal resolutions. It is exactly this sort of information that is required to fit models that can quantify and predict the spread of invasive species, for example. 

    Whilst we have seen an explosion of data across various sources, there is no single source that provides both the spatial and temporal intensity that may be required when fitting complex spatio-temporal models (cf. the invasive species example) - each has its own advantages and benefits in terms of information content. There is therefore potentially huge benefit in bringing together data from these different sources within a consistent framework to exploit the benefits each offers and to understand processes at unprecedented resolutions/scales that would otherwise be impossible to monitor. 

    Current approaches to combining data in this way are typically very bespoke and involve complex model structures that are not reusable outside of the particular application area. What is needed is an overarching generic methodological framework and associated software solutions to implement such analyses. Not only would such a framework provide the methodological basis to enable researchers to benefit from this big data revolution, but also the capability to change such analyses from being stand alone research projects in their own right, to more operational, standard analytical routines. 

    Finally, such dynamic, integrated analyses could feed back into data collection initiatives to ensure optimal allocation of effort for traditional surveys or optimal power management for sensor networks. The major step change is that this optimal allocation of effort is conditional on the other data that are available. So, for example, given the coverage and intensity of the citizen science data, where should we optimally send our paid surveyors? The idea is that information is collected at times and locations that provide the greatest benefit in understanding the underpinning stochastic processes. These two major issues - integrated analyses and adaptive sampling - ensure that environmental monitoring is fit for purpose and that scientists, policy makers and industry can benefit from the big data revolution. 

    This project will develop an integrated statistical modelling strategy that provides a single modelling framework for enabling quantification of ecosystem goods and services while accounting for the fundamental differences in different data streams. Data collected at different spatial resolutions can be used within the same model through projecting it into continuous space and projecting it back into the landscape level of interest.  As a result, decisions can be made at the relevant spatial scale and uncertainty is propagated through, facilitating appropriate decision making.