Digital Humanities Data Hive (DH2)

The Digital Humanities Data Hive (DH2) is a proposed national, interactive data centre for the arts and humanities, where diverse datasets from a variety of disciplines can be collected, connected, and explored in innovative ways. DH2 will consist of two key elements: a Data Service, which will integrate new and existing data repositories into a centralised resource, and a Data Lab, where a new analytical layer of tools will enable the manipulation, mining, re-use, and visualisation of these materials. The project builds on the team’s research experience of working with humanities data, including digital text, image, multimedia, and multimodal collections, that can be transformed and enriched through the data-driven integration of research data and tools.

The Team 

The team is led by Principal Investigator, Prof Lorna Hughes (Professor in Digital Humanities, Information Studies, School of Humanities, University of Glasgow).

  • Dr Guyda Armstrong (Co-I, Senior Lecturer in Italian Studies, University of Manchester) 
  • Prof Marc Alexander (Co-I, Professor of English Linguistics, University of Glasgow)
  • Hannah Barker (Co-I, Director of the John Rylands Research Institute, University of Manchester)
  • Goran Nenadic (Co-I, Professor of Computer Science, University of Manchester) 
  • Dr Riza Batista-Navarro (Co-I, Lecturer in Text Mining, University of Manchester)
  • Mike Pidd (Co-I, Director of The Digital Humanities Institute, University of Sheffield) 
  • Matt Groves (Research Software Engineer, Digital Humanities Institute, University of Sheffield)
  • Dr Arthur Clune (Co-I, Digital Humanities Institute, University of Sheffield)
  • Dr Ewan Hannaford (Research Assistant, University of Glasgow)
  • Dr Diane Scott (Research Assistant, University of Glasgow)

Aims & Objectives

Our vision is to scope the establishment of what we call DH2: the Digital Humanities Data Hive. This is conceived as an active and interactive national data centre for the arts and humanities where the rich and complex data at the heart of research can flourish in new and unexplored ways. Central to the project is a respect for the complex, hybrid and diverse array of data that are the basis for arts and humanities research. Our proposal rejects narrow definitions of data types and disciplines, and instead builds on the UK’s long history of expertise in digital text, image, and multimedia collections. Within our scope are any humanities data that have the potential to be transformed, and to gain added value, through the data-driven integration of research data and tools, enabling completely new ways of doing digitally enabled research.

The aims of the scoping study are to:

  1. identify the gaps in the landscape of data repositories and digital scholarship for arts and humanities research;
  2. understand the value, challenges, barriers, opportunities, and costs (start-up and long-term) of our vision for addressing the gaps; and
  3. establish the approach, feasibility, specification, resources, and risks for implementing the vision in practice.


Thirty years of funding for arts and humanities projects with data outputs has clearly established the need to bring together the key issues of sustainability, preservation, management, use, re-use and exploration of data. Internationally, the establishment of research infrastructures for the arts and humanities has produced a range of potential models, but very few projects have successfully developed approaches to creating, managing, and using or re-using data. Further, the promotion of data sharing and re-use – in the form of institutional or subject repositories – has been limited in scope and focus because of the complex nature of the accessioned datasets, which generally require specialised and, regrettably, siloed management. The variety of media, formats, and data models represented in the humanities has historically made it difficult to provide end users with services that are beneficial to research. The current infrastructure obscures the value of these data, placing barriers in the way of their use and re-use, and therefore reducing the return on the time and money invested in creating them. With the excitement generated by emerging AI/ML and big data approaches, this problem will become even more acute as the humanities and data science increase their appetite for working with larger, more complex, and more tightly integrated data sources.


The Digital Humanities Data Hive will develop a detailed project plan that lays out its feasibility and high-level specification and addresses its value, challenges, barriers, opportunities, sustainability and costs.