Machine learning in spectroscopy

Keywords – chemistry, spectroscopy, machine learning, entomology, disease vectors

Project Summary - We will combine expertise in chemistry, spectroscopy, entomology, and computing science to apply state-of-the-art machine-learning techniques to determining traits in insects and to designing novel molecules that attract or repel insects.

In the battle against the spread of diseases such as malaria and Zika, it is critically important to be able to monitor the age, species, and other traits of the populations of vector species that transmit disease. As a key example, malaria can only be transmitted by mosquitoes older than 10 days. Control efforts should therefore focus on reducing the fraction of older mosquitoes, and measuring that fraction requires determining the ages of mosquitoes in the field. The best current methods for determining mosquito age are either highly inaccurate or expensive.

In preliminary work on mosquitoes, we have demonstrated that the mid-infrared spectrum contains sufficient information to determine age and species when analysed using a simple neural network. In this project, we will develop a much more complete and robust analysis by applying supervised machine learning to more extensive spectral data sets. We will use dimensionality-reduction techniques to gain greater insight into which spectral features are most important, use different forms of data, and generate synthetic data to improve robustness. We will also add near-infrared spectral data to allow the machine-learning algorithms to discover additional correlations. The experiments will be carried out on mosquitoes reared in Glasgow and at the Ifakara Health Institute in Tanzania, as well as on ticks from Scotland.

The initial work, applying machine-learning tools in a fairly standard way, will give the student a firm foundation and prepare them for exciting advanced work on graph-convolutional autoencoders that produce a data-driven, continuous representation of molecules. We already have a machine-learning model trained on a database of 500,000 molecules from PubChem in SMILES representation. Preliminary work has shown that the attractiveness of a molecule to mosquitoes can be quantified semi-automatically and in a highly parallel manner; this will be exploited to find novel molecules that repel or attract insects. This is a potentially disruptive technology with wide applicability to molecular design.

Project Team - The project will be led by Prof Klaas Wynne in the School of Chemistry and co-supervised by Prof Roderick Murray-Smith in the School of Computing Science and Dr Francesco Baldini in the Institute of Biodiversity, Animal Health and Comparative Medicine. The student will work across the three groups and will be based primarily in Chemistry. The supervisors will hold regular meetings with the student to review the project's progress and to provide support as required so that the project goals are met on time. The student will have access to the facilities of all three research groups and will benefit from the highly active research culture of an interdisciplinary team.