Machine learning & AI
Our research projects fuse modern methods from machine learning and AI, such as deep learning and Gaussian processes, with more traditional methods from statistics.
Postgraduate research students
Machine Learning and AI - Example Research Projects
Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.
Estimating false discovery rates in metabolite identification using generative AI (PhD)
Supervisors: Vinny Davies, Andrew Elliott, Justin J.J. van der Hooft (Wageningen University)
Relevant research groups: Machine Learning and AI, Emulation and Uncertainty Quantification, Statistical Modelling for Biology, Genetics and *omics, Statistics in Chemistry/Physics
Metabolomics is the field of study that aims to map all molecules that are part of an organism, which can help us understand its metabolism and how it can be affected by disease, stress, age, or other factors. During metabolomics experiments, mass spectra of the metabolites are collected and then annotated by comparison against spectral databases such as METLIN (Smith et al., 2005) or GNPS (Wang et al., 2016). Generally, however, these spectral databases do not contain the mass spectra of a large proportion of metabolites, so the best matching spectrum from the database is not always the correct identification. Matches can be scored using cosine similarity, or more advanced methods such as Spec2Vec (Huber et al., 2021), but these scores do not provide any statement about the statistical accuracy of the match. Creating decoy spectral libraries, specifically a large database of fake spectra, is one potential way of estimating False Discovery Rates (FDRs), allowing us to quantify the probability of a spectrum match being correct (Scheubert et al., 2017). However, these methods are not widely used, suggesting there is significant scope to improve their performance and ease of use. In this project, we will use the code framework from our recently developed Virtual Metabolomics Mass Spectrometer (ViMMS) (Wandy et al., 2019, 2022) to systematically evaluate existing methods and identify possible improvements. We will then explore how we can use generative AI, e.g., Generative Adversarial Networks or Variational Autoencoders, to train a deep neural network that can create more realistic decoy spectra, and thus improve our estimation of FDRs.
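As a rough illustration of the two ideas above, the sketch below scores a match with cosine similarity and then estimates an FDR from target and decoy scores. It is a minimal sketch under stated assumptions, not the estimator used in the cited work: spectra are assumed to be already binned into equal-length intensity vectors, the decoy library is assumed to be the same size as the real one, and the function names and example scores are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length binned intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Simple target-decoy estimate (assumed variant): the number of decoy
    matches at or above the threshold divided by the number of target matches
    at or above it, capped at 1."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return min(1.0, n_decoy / n_target)


# Hypothetical best-match scores against the real and the decoy library.
target_scores = [0.9, 0.8, 0.7, 0.2]
decoy_scores = [0.75, 0.1, 0.1, 0.1]
fdr = estimate_fdr(target_scores, decoy_scores, threshold=0.7)
```

The quality of such an estimate depends on how closely the decoy spectra resemble real ones, which is the motivation for generating more realistic decoys.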
Multi-objective Bayesian optimisation for in silico to real metabolomics experiments (PhD/MSc)
Supervisors: Vinny Davies, Craig Alexander
Relevant research groups: Computational Statistics, Machine Learning and AI, Emulation and Uncertainty Quantification, Statistical Modelling for Biology, Genetics and *omics, Statistics in Chemistry/Physics
Untargeted metabolomics experiments aim to identify the small molecules that make up a particular sample (e.g., blood), allowing us to identify biomarkers, discover new chemicals, or understand the metabolism (Smith et al., 2014). Data Dependent Acquisition (DDA) methods are used to collect the information needed to identify the metabolites, and various more advanced DDA methods have recently been designed to improve this process (Davies et al., 2021; McBride et al., 2023). Each of these methods, however, has parameters that must be chosen in order to maximise the amount of relevant data (metabolite spectra) that is collected. Our recent work led to the design of a Virtual Metabolomics Mass Spectrometer (ViMMS) in which we can run computer simulations of experiments and test different parameter settings (Wandy et al., 2019, 2022). Previously this has involved running a pre-determined set of parameters as part of a grid search in ViMMS, and then choosing the best parameter settings based on a single measure of performance. The proposed M.Res. (or Ph.D.) project will extend this approach by using multi-objective Bayesian optimisation to adapt simulations and optimise over multiple different measurements of quality. By optimising parameters in this manner, we can help improve real experiments currently underway at the University of Glasgow and beyond.
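To show what changes when several measures of quality are considered at once, the sketch below extracts the Pareto front from a set of grid-search results: the settings that no other setting beats on every measure. This is only the multi-objective selection step, not Bayesian optimisation itself, and the setting labels, the two quality measures and their values are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b on every objective
    and strictly better on at least one (all objectives are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(results):
    """Return the settings whose scores are not dominated by any other setting."""
    return {
        name: scores
        for name, scores in results.items()
        if not any(dominates(other, scores) for other in results.values())
    }


# Hypothetical simulation output: setting label -> (measure 1, measure 2).
results = {
    "A": (0.60, 0.40),
    "B": (0.55, 0.50),
    "C": (0.50, 0.45),  # worse than B on both measures
    "D": (0.65, 0.30),
}
front = pareto_front(results)  # keeps A, B and D; C is dominated by B
```

Choosing on a single measure would return only one of these settings, whereas the front exposes the trade-off between them. A multi-objective Bayesian optimiser would additionally use a surrogate model to decide which settings to simulate next rather than evaluating a fixed grid.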
Medical image segmentation and uncertainty quantification (PhD)
This project focuses on the application of medical imaging and uncertainty quantification to the detection of tumours. The project aims to provide clinicians with accurate, non-invasive methods for detecting tumours and classifying them as malignant or benign. It seeks to combine advanced medical imaging technologies such as ultrasound, computed tomography (CT) and magnetic resonance imaging (MRI) with the latest artificial intelligence algorithms. These methods will automate the detection process and may be used for determining malignancy with a high degree of accuracy. Uncertainty quantification (UQ) techniques will complement these predictions of tumour malignancy by characterising the degree of uncertainty associated with the diagnosis. The combination of medical imaging and UQ could significantly reduce the need for invasive medical procedures such as biopsies, and shorten the time to diagnosis. The project will also benefit from the development of novel image processing algorithms (e.g. deep learning) and machine learning models, which will help improve the accuracy of the tumour detection process and assist clinicians in making the best treatment decisions.
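One simple way to attach an uncertainty to a malignancy prediction is to combine the outputs of several models and report the entropy of the averaged probability. The sketch below assumes an ensemble of classifiers that each return a probability of malignancy for the same image. The ensemble, the function name and the example probabilities are hypothetical, and this is only one of many possible UQ approaches.

```python
import math


def predictive_summary(member_probs):
    """Mean malignancy probability across ensemble members, together with
    the binary entropy (in bits) of that mean as an uncertainty score."""
    p = sum(member_probs) / len(member_probs)
    if p in (0.0, 1.0):
        entropy = 0.0  # a certain prediction carries no uncertainty
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return p, entropy


# Hypothetical ensemble outputs for two images.
confident = predictive_summary([0.90, 0.95, 0.92])  # members agree: low entropy
uncertain = predictive_summary([0.20, 0.80, 0.50])  # members disagree: high entropy
```

A case with high entropy could be flagged for further review, while a low-entropy case might support a decision without an invasive procedure.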
Generating deep fake left ventricles: a step towards personalised heart treatments (PhD)
Supervisors: Andrew Elliott, Vinny Davies, Hao Gao
Relevant research groups: Machine Learning and AI, Emulation and Uncertainty Quantification, Biostatistics, Epidemiology and Health Applications, Imaging, Image Processing and Image Analysis
Personalised medicine is an exciting avenue in the field of cardiac healthcare where an understanding of patient-specific mechanisms can lead to improved treatments (Gao et al., 2017). The use of mathematical models to link the underlying properties of the heart with cardiac imaging offers the possibility of obtaining important parameters of heart function non-invasively (Gao et al., 2015). Unfortunately, current estimation methods rely on complex mathematical forward simulations, resulting in a solution taking hours, a time frame not suitable for real-time treatment decisions. To increase the applicability of these methods, statistical emulation methods have been proposed as an efficient way of estimating the parameters (Davies et al., 2019; Noè et al., 2019). In this approach, simulations of the mathematical model are run in advance and then machine learning based methods are used to estimate the relationship between the cardiac imaging and the parameters of interest. These methods are, however, limited by our ability to understand how cardiac geometry varies across patients, which is in turn limited by the amount of data available (Romaszko et al., 2019). In this project we will look at AI based methods for generating fake cardiac geometries which can be used to increase the amount of data (Qiao et al., 2023). We will explore different types of AI generation, including Generative Adversarial Networks and Variational Autoencoders, to understand how we can generate better 3D and 4D models of the fake left ventricles and create an improved emulation strategy that can make use of them.
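The emulation idea described above, running simulations in advance and then learning the map from outputs back to parameters, can be sketched with a toy one-parameter example. Everything here is an assumption made for illustration: the forward model is a cheap, monotone stand-in for an expensive cardiac simulation, the parameter name is hypothetical, and linear interpolation takes the place of the machine learning emulators (such as Gaussian processes or neural networks) used in practice.

```python
import bisect


def forward_model(stiffness):
    """Toy stand-in for an expensive cardiac simulation (hypothetical, monotone)."""
    return stiffness ** 2 + stiffness


# Offline stage: run the simulator in advance on a design of parameter values.
design = [i * 0.1 for i in range(21)]          # stiffness values in [0, 2]
outputs = [forward_model(s) for s in design]   # simulated imaging summaries


def emulate_inverse(observed):
    """Online stage: estimate the parameter behind an observed output by
    interpolating between precomputed runs, without calling the simulator."""
    if observed <= outputs[0]:
        return design[0]
    if observed >= outputs[-1]:
        return design[-1]
    hi = bisect.bisect_right(outputs, observed)  # first run with a larger output
    lo = hi - 1
    weight = (observed - outputs[lo]) / (outputs[hi] - outputs[lo])
    return design[lo] + weight * (design[hi] - design[lo])


# Pretend a measurement came from a patient with stiffness 1.234.
estimate = emulate_inverse(forward_model(1.234))
```

The online stage costs a lookup rather than hours of simulation. In the real problem the simulations also depend on the cardiac geometry, which is why generating additional realistic geometries could improve the emulator.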
Regular seminars relevant to the group are held as part of the Statistics seminar series. The seminars cover various aspects across the AI3 initiative and usually span multiple groups. You can find more information on the Statistics seminar series page, where you can also subscribe to the seminar series calendar.
Recent innovations around generative AI models such as ChatGPT have brought the fields of machine learning and AI to the forefront of modern research. Many of these methods are based on core statistical principles and methodology, and therefore there is a large interface between machine learning, AI, and statistics.
The Machine Learning and AI group works on several methods at this interface, with ongoing research projects using Generative Adversarial Networks (GANs), graph neural networks, and more traditional machine learning methods such as Gaussian processes.