Research Software Management Guidelines

Research software is an emerging element of open science practice. Effective research software management (RSM) is required to ensure that research software and the research it supports are of good quality, and that the software is safe, secure and remains valuable for the long term. These goals are in keeping with the FAIR principles. Where research software plays a central role in generating and analysing data, good management of research software is critical to ensure published results can be reproduced. This section is intended to complement a more formal Research Software Policy [work in progress] and the University's Publishing and Sharing Research Software document.

The FAIR principles as applied to research software are listed below, closely following Barker et al., 2022. Funding bodies often require that research outputs, including software, adopt these principles.

  • Findable: Software, and its associated metadata, is easy for both humans and machines to find. A large part of this is the use of globally unique and persistent software identifiers and the provision of rich metadata such as in a CITATION.cff file.
  • Accessible: Software, and its metadata, is retrievable via standardised protocols.
  • Interoperable: Software interoperates with other software by exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs), described through standards. Software reads, writes and exchanges data in a way that meets domain-relevant community standards. Software includes qualified references to other objects.
  • Reusable: Software is usable (executable) and reusable (understandable, modifiable, extendable, or integrable). Software is described with a plurality of accurate and relevant attributes: it is given a clear and accessible licence and is associated with detailed provenance. Software includes qualified references to other software. Software meets domain-relevant community standards.

Copyright

Research software falls under copyright law. It is the code rather than its functionality that is copyrighted. Anyone can take your copyrighted Python code and rewrite it in Java without infringing copyright. The University owns research software developed by any staff at the University of Glasgow and copyright must be assigned to the University of Glasgow regardless of whether your research software will be under an open source or proprietary licence. Place the copyright notice in comments in the header of all source files following the format: Copyright © [year of first publication] [copyright owner]. For example: Copyright © 2026 University of Glasgow.
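
As an illustration of the notice format above, the following Python sketch builds the notice and checks whether a source file carries it in its header comments. The function names, the ten-line header window and the restriction to "#" comments are assumptions made for this example; they are not University tooling.

```python
import re

# Matches the format given above: Copyright © [year] [copyright owner]
NOTICE_PATTERN = re.compile(r"Copyright © (\d{4}) (.+)")


def copyright_notice(year, owner="University of Glasgow"):
    """Return the notice text in the documented format."""
    return f"Copyright © {year} {owner}"


def has_copyright_notice(source_text, max_lines=10):
    """Return True if a '#' comment in the first max_lines lines holds a notice."""
    for line in source_text.splitlines()[:max_lines]:
        stripped = line.lstrip()
        if stripped.startswith("#") and NOTICE_PATTERN.search(stripped):
            return True
    return False


# Example header line for a Python source file
header_line = "# " + copyright_notice(2026)
```

A check like this could run over every source file in a repository before a release, although other languages would need their own comment syntax.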

 

Definition

Research software includes source code files, algorithms, scripts, computational workflows and executables that were created during the research process or for a research purpose. Software components (e.g., operating systems, libraries, dependencies, packages and scripts) that are used for research but were not created during the research process or with a clear research intent should be considered software in research and not research software (Gruenpeter et al., 2021). The authors note that this differentiation may vary between disciplines.

Typical research software includes code for:

  • analysing data,
  • conducting an experiment,
  • web, mobile and desktop applications that are used to collect, manage, analyse or visualise research data.

GitHub-Zenodo

From inception, code should be kept in repositories managed by platforms such as GitHub or GitLab. Use of the University of Glasgow’s Enterprise GitHub instance is recommended. Repositories are either public or private, and one of the key decisions you need to make at the start of the project is which visibility to use. If you take the open source software (OSS) route, your repository should be under the University of Glasgow's public GitHub organisation. This will also enable publishing your code using Zenodo. If you need to keep your code private, your repository should be under the University of Glasgow's internal GitHub organisation.

As well as being a place to store your code and facilitate development by a team, GitHub supports the incremental nature of software development: software is often fluid, changing through new versions to deal with bug fixes, dependency updates or enhancements. The version(s) of code used to generate and/or analyse data used in a publication may be critical to enable replication of results, so once your code is complete, create a tagged and versioned release to be used for data generation and/or analysis. If possible and relevant, re-analyse your data with the latest version prior to publication, or be explicit about which data were analysed with each version.
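
One way to be explicit about which data were analysed with each version is to record the code version alongside each dataset at analysis time. The Python sketch below is a minimal illustration that assumes the analysis runs inside a Git checkout; the function names, the record layout and the file name survey_2025.csv are hypothetical.

```python
import json
import subprocess


def current_code_version(run=subprocess.run):
    """Describe the checked-out code relative to the nearest tag.

    `git describe --tags --always --dirty` prints the tag name when the
    working tree is exactly a tagged release, and adds a commit suffix
    and/or "-dirty" when it is not.
    """
    result = run(
        ["git", "describe", "--tags", "--always", "--dirty"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def provenance_record(dataset, version):
    """Pair a dataset with the software version used to analyse it."""
    if version.endswith("-dirty"):
        # Uncommitted changes cannot be reproduced from the repository.
        raise ValueError("uncommitted changes: analyse with a tagged release")
    return {"dataset": dataset, "software_version": version}


# Example with an explicit version string; in a real analysis the version
# would come from current_code_version().
record = provenance_record("survey_2025.csv", "v1.2.0")
record_json = json.dumps(record, indent=2)
```

Storing such a record next to the analysis outputs makes it straightforward to state in a publication which release produced which results.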

For OSS, and for making research software FAIR, it is important to recognise that code in GitHub is mutable, whereas a version archived in Zenodo is immutable. Zenodo also provides a DOI (a persistent identifier) for your software.

As with research data, the University requires that all research software be preserved for at least 10 years after the date of the scientific publication. If the funding body requires a different preservation period, use the longer of the two periods. There may be ethical or legal obligations to choose a longer initial preservation period, such as for clinical trial research. After this initial preservation period, the value and use of the software can be evaluated and the preservation period extended if necessary.
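
The rule above amounts to taking the longer of the University minimum and any funder period. As a small worked sketch in Python (the function name and the handling of 29 February are assumptions made for this example):

```python
from datetime import date

UNIVERSITY_MINIMUM_YEARS = 10  # minimum stated in these guidelines


def earliest_preservation_end(publication_date, funder_years=None):
    """Return the earliest date on which the initial preservation period ends."""
    years = UNIVERSITY_MINIMUM_YEARS
    if funder_years is not None:
        # Use the longer of the University and funder periods.
        years = max(years, funder_years)
    target_year = publication_date.year + years
    try:
        return publication_date.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return publication_date.replace(year=target_year, day=28)
```

For a paper published on 1 March 2026, this gives 1 March 2036 with no funder requirement or a shorter one, and 1 March 2041 if the funder requires 15 years.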

Licensing

Research software must include an appropriate software licence. A licence can be either open source or proprietary. If the open source route is suitable, the Open Source Initiative provides access to a range of open source software (OSS) licences, such as the GNU General Public Licence (GPL), Apache Licence, BSD 3-clause, and the MIT Licence. Where possible you should select one of these standard licences rather than creating a bespoke open source licence. You must consider all licences associated with the libraries and packages used within the research software, as these can affect which licence is appropriate.

When you use open source libraries as part of your research software, there is a fair degree of compatibility between the licence of each library and the standard open source licence to be applied to the new research software. For instance, an MIT licence can be used for research software that uses the Python pandas library, which is licensed under a BSD 3-clause licence. The main exception to this rule is OSS that uses libraries licensed under a copyleft GPL licence, in which case the terms of the GPL apply to the entire combination and your research software must be licensed under the GPL. In addition, any specific requirements of the body that is funding the research should be taken into account when deciding on the most appropriate licence. If there are no specific stipulations from funding bodies and no copyleft licence considerations, it is hard to go wrong with an MIT licence if you want to make your research software open source.
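
The rule of thumb above can be written as a short Python sketch. It is a deliberate simplification of the paragraph, not a full compatibility check: the function names are hypothetical, licences are given as SPDX-style identifiers, only identifiers beginning "GPL-" are treated as copyleft, and funder requirements and other copyleft licences are not considered.

```python
def requires_gpl(dependency_licences):
    """Return True if any dependency carries a GPL licence identifier."""
    return any(licence.startswith("GPL-") for licence in dependency_licences)


def suggest_licence(dependency_licences, preferred="MIT"):
    """Apply the rule of thumb: GPL dependencies force GPL, otherwise keep the preference."""
    if requires_gpl(dependency_licences):
        # The specific GPL version still has to be chosen by the authors.
        return "GPL"
    return preferred


# pandas is BSD 3-clause, so the preferred permissive licence stands.
permissive_case = suggest_licence(["BSD-3-Clause"])
# A GPL dependency means the combined work must be GPL.
copyleft_case = suggest_licence(["BSD-3-Clause", "GPL-3.0-or-later"])
```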

Place the full text of your licence in a LICENCE.txt file in the root folder of your repository. Also state the type of licence in the CITATION.cff file in the root folder of your repository, and include a brief licence notice and a link to the full licence within comments in the header of all source files.

Metadata

Your research software should be described with sufficient metadata. One example of software metadata is the content of a ReadMe file, commonly seen in GitHub repositories, although there are a number of alternative or supplemental approaches. The recommendation here is to use a CITATION.cff file. It is human-readable (like the ReadMe file) and, because it uses a defined vocabulary, semantically rich enough to be machine-readable.

Metadata are both intrinsic (held within the repository) and extrinsic (held outside the repository, such as within Zenodo). Intrinsic metadata would be the CITATION.cff file located in the root directory of your repository. The file must contain the required keys cff-version, title, message and authors, but should also contain optional fields that describe your software more completely, such as version (software version), license (the key uses this spelling), repository-code (link to GitHub), abstract and keywords. The machine-readability of CITATION.cff comes into play when you deposit your software with Zenodo. The extrinsic metadata record should be handled automatically by the GitHub-Zenodo integration, meaning there is one less thing for you to worry about.
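
To make the required and optional keys concrete, the Python sketch below checks for the four required keys and writes out the text of a minimal CITATION.cff file. All metadata values (title, author, repository URL and so on) are placeholders, and the helper names are hypothetical. String values are written with JSON quoting, which is also valid YAML.

```python
import json

REQUIRED_KEYS = ("cff-version", "message", "title", "authors")


def missing_required_keys(metadata):
    """List the required CITATION.cff keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not metadata.get(key)]


def render_citation_cff(metadata):
    """Render a flat metadata dictionary as CITATION.cff text."""
    missing = missing_required_keys(metadata)
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    lines = []
    for key, value in metadata.items():
        if key == "authors":
            lines.append("authors:")
            for author in value:
                # First field starts the list item, the rest are indented under it.
                for index, (field, text) in enumerate(author.items()):
                    prefix = "  - " if index == 0 else "    "
                    lines.append(f"{prefix}{field}: {json.dumps(text)}")
        elif isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Placeholder metadata for illustration only.
example_metadata = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite it using these metadata.",
    "title": "Example Research Software",
    "authors": [{"family-names": "Doe", "given-names": "Jane"}],
    "version": "1.0.0",
    "license": "MIT",
    "repository-code": "https://github.com/example-org/example-repo",
    "keywords": ["research software", "example"],
}
citation_text = render_citation_cff(example_metadata)
```

In practice the file is usually written by hand or with a dedicated tool; the point here is only to show which keys are mandatory and how the optional ones sit alongside them.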

The Turing Way Community provide an excellent guide on Software Citation with CITATION.cff.

Open Source Software

Wherever possible, and in the spirit of open research, the University of Glasgow encourages the creation of open source software (OSS). When making software available to others there should always be a balance between the level of openness and access granted, possible commercial exploitation of software and any legal restrictions, such as those imposed by a funding body, or dual-use, economic or ethical concerns, that prevent making that software open source. The general principle is that research software should be made available as open as possible, as closed as necessary.

Dual-use concerns apply to software with clear societal benefits that can also be used for alternative, potentially malicious purposes. A classic example is software to control drones, which can be beneficial but also harmful if a drone is directed to disrupt air traffic. Economic concerns can apply to research software aligned to certain core programmes, such as in the competitive field of quantum computing, where there is a strong economic drive to be first. In these cases such restrictions on OSS will likely be stipulated as a condition of the grant. Lastly, anything that stores or handles personally identifiable data is subject to strict regulation under the General Data Protection Regulation (GDPR).

Software Engineering Best Practice

This is not intended to be a complete guide to software engineering best practice but hopefully covers the most important elements.

Agile

It is difficult to specify the final state of research software accurately and completely at the start of a project, so it is important to recognise that software development is an iterative process. Use a methodology that embraces change and the cyclical nature of software development, with continuous improvement through regular feedback. Agile emphasises flexibility, collaboration and customer satisfaction, delivering working software in small, frequent increments to arrive at a product that meets the user’s needs, rather than one large release that may not be fit for purpose. The RCaaS RSE team uses Scrum.

Gathering Requirements

Requirements gathering is one of the key challenges of software engineering. Poor requirements will lead to poor software. RSEs and researchers have their own areas of expertise, and at the extreme the dialogue can feel as though you are speaking different languages. Glossaries, and avoiding domain jargon and acronyms, can help here. An agile, iterative approach and rapid prototyping will also help refine requirements. Requirements documents are living documents, so they must be versioned as requirements change throughout development.

Requirements documents can be a set of user requirements and/or more technically specified system requirements. An example of a user requirement for a website that is used to collect observations would be “As a researcher, student or citizen scientist I want the system to optionally record my user information so that I can see just the data that I contributed”. The same requirement expressed as a system requirement would read more like “The system must capture the username and password (encrypted) in a user table. The user table primary key must be used as a foreign key in the observations table, with a one-to-many relationship between user and observation tables”.
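
A system requirement of this kind maps almost directly onto a database schema. The Python sketch below uses the standard library sqlite3 module to illustrate it. The table and column names, the use of SQLite, and the choice to store a salted PBKDF2 hash (one reading of "encrypted" in the requirement) are all assumptions made for this example.

```python
import hashlib
import os
import sqlite3


def create_schema(conn):
    """Create user and observation tables with a one-to-many relationship."""
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE user (
            id INTEGER PRIMARY KEY,
            username TEXT NOT NULL UNIQUE,
            password_salt BLOB NOT NULL,
            password_hash BLOB NOT NULL
        );
        CREATE TABLE observation (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES user(id),  -- NULL when no user is recorded
            description TEXT NOT NULL
        );
        """
    )


def add_user(conn, username, password):
    """Store a user with a salted password hash and return the new user id."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    cursor = conn.execute(
        "INSERT INTO user (username, password_salt, password_hash) VALUES (?, ?, ?)",
        (username, salt, digest),
    )
    return cursor.lastrowid


def add_observation(conn, description, user_id=None):
    """Record an observation, optionally linked to a user."""
    conn.execute(
        "INSERT INTO observation (user_id, description) VALUES (?, ?)",
        (user_id, description),
    )


def observations_for(conn, user_id):
    """Return only the observations contributed by the given user."""
    rows = conn.execute(
        "SELECT description FROM observation WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

The nullable user_id column reflects the word "optionally" in the user requirement, and observations_for corresponds to "see just the data that I contributed".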

Branching

Good organisation of your repository is important, and it becomes especially critical when more than one person is involved in development. It is good practice to develop each new feature in its own branch and to merge it into the main branch only once it is complete, tested and reviewed. This keeps the main branch isolated, so you can commit regularly to the feature branch even while the code is incomplete or contains bugs.

Coding

Code consistently according to recommended guidelines where applicable, e.g., PEP 8 for Python development. Use linters to enforce standards, e.g., Pylint for Python code.

Testing

Thorough testing is critical to writing code that can be trusted. Create unit tests to cover as much code as possible and use a suite of regression tests to test updates. Additionally perform acceptance/system tests to ensure that all requirements (that are testable) have been met.
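
As a minimal illustration of unit and regression tests, the Python sketch below tests a small function with the standard library unittest module. The function normalise and its tests are hypothetical and stand in for real analysis code.

```python
import unittest


def normalise(values):
    """Scale values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [value / total for value in values]


class TestNormalise(unittest.TestCase):
    def test_result_sums_to_one(self):
        # Unit test of a general property of the output.
        self.assertAlmostEqual(sum(normalise([2, 3, 5])), 1.0)

    def test_known_values(self):
        # Regression test: a previously verified result must not change.
        self.assertEqual(normalise([1, 1, 2]), [0.25, 0.25, 0.5])

    def test_rejects_zero_total(self):
        # Invalid input should fail loudly rather than return nonsense.
        with self.assertRaises(ValueError):
            normalise([0, 0])
```

Saved in a file such as test_normalise.py, these tests can be run with python -m unittest, and the same command can be run automatically on every update to catch regressions.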

Review

Ensure linted and tested code is reviewed by other experienced RSEs and document the outcomes of the review.