School of Mathematics & Statistics

Engineering the Biodiversity Intactness Index: Scalability is not just cores and memory!

Connor Duffin (The Natural History Museum)

Wednesday 4th March 12:00-13:00
Maths 311B

Abstract

The Biodiversity Intactness Index (BII), developed by the Biodiversity Futures Lab (BFL) at the Natural History Museum, has evolved from a research project into a widely used dataset in academic, policy, and commercial settings. The BII was initially released as a single-year map at 10x10 km resolution; commercial demand has since required scaling it to a 24-year time series at 1x1 km resolution. This has presented a unique problem: how do you transform a research codebase into a scalable computational pipeline that runs on HPC?

In this talk, I will describe how we made this happen. I’ll detail how we refactored the R codebase into a set of internal packages and automated the pipeline with the R package {targets}. I will cover our migration to high-performance computing (HPC), as well as how the BFL team now develops code according to practices that, though widespread in open-source software development, are not yet widely used in data science research groups. I will also discuss how we were able to open-source some components of our codebase. A key theme of this talk is scalability: not only in compute requirements, but also in the processes that allow development to scale up within the team. This seminar aims to provide some ideas for researchers looking to improve their software and data engineering processes, along with some pointers to good tools and practices that we’ve found useful in BFL.
