College of Science & Engineering

Generative AI in research software: Comparing AI-generated and human-written geospatial research software

Supervisor: Mr Joseph Shingleton

School: Geographical and Earth Sciences

Description:

Background and motivation

Research software is increasingly written with the aid of generative AI tools. This presents both opportunities and risks for the scientific community. On the one hand, increased productivity and lower technical barriers to entry may expedite scientific discovery; on the other, disrupting the relationship between coder and code may threaten the integrity of research software. This risk is particularly evident in specialist fields where domain-specific methods, standards, and workflows may be less well represented in an AI system’s training data.

Geospatial information science is one such field, with its own technical requirements around spatial data formats, coordinate systems, reproducibility, and analytical workflows. This project will investigate how effectively generative AI can reproduce geospatial research software by building a comparative dataset of human-written and AI-generated code.

Expected activities

The student will work with a preselected collection of real geospatial research code repositories developed before the widespread use of AI coding tools, and will use an AI coding assistant (e.g. OpenAI’s “Codex” or Anthropic’s “Claude Code”) to recreate selected tools or functions from those repositories. They will then compare the original and generated code, documenting key similarities and differences in areas such as dependencies, coding patterns, clarity, reproducibility, and handling of geospatial tasks. Through this project, they will begin to identify the patterns that distinguish AI-generated geospatial research software from its human-written counterpart.
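As an illustration of one of the comparisons described above, the following minimal sketch contrasts the third-party dependencies of a human-written file and its AI-generated counterpart. It assumes both are Python source files; the function names `top_level_imports` and `compare_dependencies` and the two example snippets are hypothetical, and the project's actual comparison framework may look quite different.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by a Python source string."""
    packages = set()
    # ast.parse only reads the code; the imported packages need not be installed.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0), which point inside the repository.
            if node.level == 0 and node.module:
                packages.add(node.module.split(".")[0])
    return packages


def compare_dependencies(human_source: str, ai_source: str) -> dict[str, list[str]]:
    """Summarise which packages are shared and which appear on one side only."""
    human = top_level_imports(human_source)
    ai = top_level_imports(ai_source)
    return {
        "shared": sorted(human & ai),
        "human_only": sorted(human - ai),
        "ai_only": sorted(ai - human),
    }


# Hypothetical snippets standing in for an original file and its AI recreation.
HUMAN_EXAMPLE = (
    "import geopandas as gpd\n"
    "from shapely.geometry import Point\n"
    "import numpy as np\n"
)
AI_EXAMPLE = (
    "import geopandas as gpd\n"
    "import pandas as pd\n"
    "from pyproj import Transformer\n"
)

if __name__ == "__main__":
    print(compare_dependencies(HUMAN_EXAMPLE, AI_EXAMPLE))
```

A comparison of this kind covers only declared imports. Other dimensions named above, such as clarity or the handling of coordinate systems, would need separate and probably more qualitative assessment.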

Outputs

A key output of this project will be a pilot benchmark dataset enabling pairwise comparison of human-written and AI-generated geospatial research software. In addition, the student will help develop a structured framework for comparing these outputs in terms of reproducibility, coding patterns, dependencies, and domain-specific correctness. Together, these will provide an initial evidence base for further research on the role of generative AI in research software engineering, and may inform future work on benchmarking, best practices, and the responsible use of AI in geospatial science.

Suitability and development opportunities

This project would suit a student in geographical or computational sciences with an interest in computational methods, research software, and geospatial data science. Reasonable proficiency in at least one programming language (ideally Python) is essential. The student will gain valuable experience in AI-assisted coding, research software evaluation, and critical assessment of computational workflows. These are crucial skills for anyone entering the software engineering workplace in the era of generative AI.