SAMUELS Project Outputs

The main outputs of the SAMUELS project are the Historical Thesaurus Semantic Tagger (HTST) itself and the two semantically annotated corpora, Semantic Hansard and Semantic EEBO. The project has also resulted in a number of presentations, publications, and outreach events, and its data has been used in subsequent projects.

This page also includes presentation material from the project's final meeting, a symposium for researchers in the SAMUELS consortium held at the University of Glasgow on 26th-27th March 2015. The material indicates the stage that sub-project research had reached at the official close of the project.

Further presentations and publications, listed further down the page, were produced by members of the consortium either directly in relation to the project or using the HTST in their methodology.

Final Project Meeting and Future Plans

A two-day meeting was held on 26th-27th March 2015 to conclude the SAMUELS project. This allowed the entire team to discuss the methodologies they had employed in their work, present the results of their research, and give feedback on the tagger and the tagged data it provided. PowerPoint presentations representing the different strands of the project are available below.

Possibilities for future work were discussed at the meeting, with the consortium planning to build on the research and results achieved during the SAMUELS project. Further research at Glasgow had already begun to use Semantic EEBO as part of the 'Linguistic DNA' project, led by Professor Susan Fitzmaurice at the University of Sheffield. Glasgow and Lancaster aim to develop the tagger so that it can recognise higher-level linguistic features, whilst the teams at Huddersfield and UCLAN are using the data in their ongoing investigations of the language of labour relations and of aggression, respectively.


Final Project Meeting Presentations

Lancaster: HTST Evaluation Notes - Scott Piao

Glasgow: Update on Glasgow Input to HTST - Fraser Dallachy

Huddersfield: Is There a Baron in the Commons? Project Update - Lesley Jeffries, Brian Walker, Jane Demmen

UCLAN: Tracing Verbal Aggression over Time - Dawn Archer, Bethan Malory

Tools and Corpora

The SAMUELS consortium has produced a considerable number of outputs connected to the work conducted for the project. The most important of these is the Historical Thesaurus Semantic Tagger (HTST) itself, which can be downloaded from the link below.


The Historical Thesaurus Semantic Tagger (HTST)

The semantic tagger can be accessed via the downloadable graphical user interface (GUI). This allows the user to input text, which is then sent to Lancaster University servers to be processed using the HTST pipeline. The zipped file contains a readme text which advises the user on how to run the GUI. From spring 2021 it will also be available on the UCREL API server. The online and GUI versions are suitable for use on small to medium-sized bodies of text (up to c. 100,000 words). For larger texts it is advisable to contact Professor Paul Rayson at UCREL to discuss the use of their servers for the tagging process.


Guide to Using HTST


Thematic Category Set for the Historical Thesaurus of English

A new category set, which can be overlaid on the full Historical Thesaurus of English category set, was developed for use in the project. This 'human scale' set of categories collapses category distinctions which are too general or too specific to be of value to most users of the data. Users can therefore find relevant concepts more rapidly in the approximately 4,000 headings of the thematic category set than was possible in the full set of around 235,000 Historical Thesaurus categories. Thematic categories have also proven valuable as levels of data aggregation in the tagger pipeline and for subsequent work on projects such as Linguistic DNA.

Introduction to Thematic Categories

Thematic Categories


Semantically Annotated Corpora

The Semantic Hansard and Semantic EEBO corpora are now available via Professor Mark Davies' platform.

Semantic Hansard may also be explored via the Hansard at Huddersfield project's web interface. The tagged files of Semantic EEBO texts (EEBO-TCP Phase I) are available for individual download from the Oxford Text Archive.

Conference Papers and Presentations

DH 2014, Lausanne – Metaphor, Popular Science, and Semantic Tagging. Marc Alexander, Jean Anderson, Fraser Dallachy, Christian Kay, Scott Piao, Paul Rayson (July 2014)

DHC 2014, Sheffield – Developing the Historical Thesaurus Semantic Tagger. Scott Piao, Fraser Dallachy, Alistair Baron, Paul Rayson, Marc Alexander (September 2014)

Europeana Cloud Workshop, The Hague – Historical Linguistics/Psychology presentation. Paul Rayson (December 2014)

Syphilis Symposium 2015, Glasgow – “Dear was the Conquest of a new found World”: Digital humanities and the language of syphilis. Marc Alexander (January 2015)

CILC 2015, Madrid – Big Data Challenges with Big Corpora and Big Taxonomies. Paul Rayson (March 2015)

ICAME 2015, Trier – Large-scale time-sensitive semantic analysis of historical corpora. Paul Rayson, Alistair Baron, Scott Piao, Stephen Wattam (May 2015)

Political Discourses: Multidisciplinary approaches, UCL – Investigating the lexis of labour relations in UK House of Commons debates over time: a study of parliamentary language using corpus linguistic methods and automated semantic tagging. Jane Demmen, Lesley Jeffries, Brian Walker (June 2015)

PALA 2015, Kent – “flat and insipid, damp’d and extinguish’d, bitter’d and poison’d”: Insipidity and Taste in Early Modern English. Marc Alexander, Fraser Dallachy (July 2015)

PALA 2015, Kent – Is there a Baron in the Commons? The lexis of labour relations in parliamentary language across time. Jane Demmen, Lesley Jeffries, Brian Walker (July 2015)

CL 2015, Lancaster – Semantic Tagging and Early Modern Collocates. Marc Alexander, Alistair Baron, Fraser Dallachy, Scott Piao, Paul Rayson, Stephen Wattam (July 2015)

CL 2015, Lancaster – Tracing Verbal Aggression over time, using the Historical Thesaurus of English. Dawn Archer & Bethan Malory (July 2015)

ICLC13, Newcastle (July 2015)
• ‘The Lexis of Labour Relations in Hansard across Time: Perspectives from the HTE’. Jane Demmen, Lesley Jeffries, Brian Walker
• ‘Mapping Aggression over Time Using the Historical Thesaurus of English’. Dawn Archer, Bethan Malory
• ‘Populating Input Spaces: Conceptual blending and the Historical Thesaurus of English’. Marc Alexander, Fraser Dallachy


Outreach Events

Stall at Loncon 3, The 72nd World Science Fiction Convention (London; August 2014)

Stall at Explorathon, European Researchers’ Night (Glasgow; September 2014)


Journal Articles

Alexander, Marc, Fraser Dallachy, Scott Piao, Alistair Baron, Paul Rayson (2015). ‘Metaphor, Popular Science and Semantic Tagging: Distant reading with the Historical Thesaurus of English’, Digital Scholarship in the Humanities (DSH) (30:suppl_1) 

Archer, Dawn, Merja Kytö, Alistair Baron, Paul Rayson (2015). ‘Guidelines for Normalising Early Modern English Corpora: Decisions and justifications’. ICAME Journal (39:1)

Archer, Dawn & Bethan Malory (2017). ‘Tracing Facework Over Time Using Semi-automated Methods’. International Journal of Corpus Linguistics (22:1)

Archer, Dawn (2017). ‘Mapping Hansard Impression Management Strategies through Time and Space’. Studia Neophilologica (89:sup1)

Archer, Dawn (2018). ‘Negotiating Difference in Political Contexts: An exploration of Hansard’. Language Sciences (68)

Piao, Scott, Fraser Dallachy, Alistair Baron, Jane Demmen, Steve Wattam, Philip Durkin, James McCracken, Paul Rayson, Marc Alexander (2017). ‘A Time-Sensitive Historical Thesaurus-Based Semantic Tagger for Deep Semantic Annotation’. Computer Speech & Language (46)

Details of publications following the project's conclusion can be found on the staff pages of individual team members, linked from the Project Team page.