The Linguistic DNA of Modern Thought

The Linguistic DNA of Modern Thought project is led by Professor Susan Fitzmaurice at the University of Sheffield, with other investigators based in the Humanities Research Institute (HRI) at Sheffield, in English Language at the University of Glasgow, and in the School of English at the University of Sussex. In harnessing the data available in online repositories of historical texts in English, the project will also engage with experts on these resources, including the Early Modern OCR Project (eMOP) at Texas A&M University, Jisc Historic Books, and Gale Cengage.


Lexicalization Pressure

The University of Glasgow's part of this project is the research theme on Lexicalization Pressure, which will explore the development of key terms by analyzing the volume and type of lexical items available for the concepts to which they are applied. The Historical Thesaurus of English organizes every word in the language’s history into a hierarchy based on meaning; this unique resource therefore allows an investigation of the way concepts have been expressed across time, and is at the heart of Glasgow’s work on the Linguistic DNA (LDNA) project.

The Lexicalization Pressure theme will use data visualization techniques to identify areas of the Thesaurus hierarchy which have seen significant changes within the LDNA project's time period of 1500-1800. Our principal working hypothesis is that sudden expansions in the vocabulary associated with a given semantic field reflect its increased cultural importance. Semantic fields which experience a rapid decrease in size are likely to be similarly important. A decrease may simply be the opposite of an increase, indicating that the fields in question have lost cultural importance. However, we also need to explore cases in which a decrease appears to result from specialization of the language: here, the lack of an extensive general vocabulary for a concept may indicate that the concept is so important that it has acquired a specific, semi-technical vocabulary.

Within the areas of growth or loss of lexical items, we will also explore patterns of intra-category change, such as whether some parts of speech (e.g. nouns, adjectives, verbs) expand more readily than others, and how often one part of speech buds off related words into others. The project will study whether the rapidly lexicalized or delexicalized concepts coincide with the key, paradigmatic terms identified by the HRI's visualization of the EEBO data.
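The category-size comparison described above might be sketched as follows. This is a minimal illustration under assumed simplifications, not the Glasgow team's method: each Thesaurus entry is reduced to a single first and last attestation date, the 1500-1800 span is cut into hypothetical 50-year periods, and a category is flagged when its number of attested words changes by at least 50% between consecutive periods. The records, category labels, threshold, and the names `active_counts` and `flag_changes` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified Thesaurus-style records:
# (category, word, first_attested, last_attested).
# Placeholder labels only; NOT real Historical Thesaurus data.
ENTRIES = [
    ("A", "a1", 1500, 1800),
    ("A", "a2", 1600, 1800),
    ("A", "a3", 1610, 1800),
    ("A", "a4", 1620, 1800),
    ("B", "b1", 1500, 1800),
    ("B", "b2", 1500, 1640),
    ("B", "b3", 1500, 1620),
    ("B", "b4", 1500, 1599),
]


def active_counts(entries, start=1500, end=1800, step=50):
    """Count, per category, the words attested at any point in each period [p, p + step)."""
    periods = list(range(start, end, step))
    counts = defaultdict(lambda: [0] * len(periods))
    for category, _word, first, last in entries:
        for i, p in enumerate(periods):
            # A word counts for a period if its attested span overlaps that period.
            if first < p + step and last >= p:
                counts[category][i] += 1
    return periods, dict(counts)


def flag_changes(periods, counts, threshold=0.5):
    """Flag periods where a category's size changes by at least `threshold` (relative)
    compared with the previous period; both growth and shrinkage are reported."""
    flags = []
    for category, series in counts.items():
        for i in range(1, len(series)):
            prev, cur = series[i - 1], series[i]
            if prev == 0:
                continue  # relative change is undefined from an empty period
            change = (cur - prev) / prev
            if abs(change) >= threshold:
                flags.append((category, periods[i], prev, cur, round(change, 2)))
    return flags


periods, counts = active_counts(ENTRIES)
print(flag_changes(periods, counts))
# → [('A', 1600, 1, 4, 3.0), ('B', 1650, 3, 1, -0.67)]
```

The sketch flags growth and shrinkage alike; deciding whether a flagged decrease reflects lost importance or specialization into a semi-technical vocabulary remains interpretive work of the kind the theme describes.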

In parallel with the work on Thesaurus data, the Lexicalization Pressure theme will study tranches of the EEBO-TCP and ECCO corpora, as the other LDNA research themes are also doing. These text collections will be explored for qualitative evidence to support the quantitative data, demonstrating patterns of lexical and semantic change in action in real texts.


The Linguistic DNA of Modern Thought project is funded by the AHRC (project AH/M00614X/1).


Linguistic DNA Project - Overall Structure and Goals

The Linguistic DNA project's aim is to use data extraction and visualization techniques to automatically identify the points in time at which particular words become paradigmatic for the concept which they express, for example the adoption of civility as the principal term for expressing the idea of behaving politely towards others. Software being developed by the HRI will look for statistically significant keywords in text, the frequency of their use, the words which collocate with them, and other markers of the network of semantic concepts in which they are embedded. Tracking the rise and fall of the lexical items which constitute key terms in semantic fields will then allow exploration of the ways in which the lexis of English has reflected, and participated in, major cultural change. Glasgow’s focus is specifically on the processes and results of historical lexicalization pressure: the pressure, exerted by a concept’s growing importance, which forces language users to expand the vocabulary they have for discussing that concept.
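As an illustration of the kind of statistics involved in finding keywords and collocates, the following sketch uses two widely used corpus-linguistic measures as stand-ins: the log-likelihood (G2) statistic for keyness, which compares a word's frequency in a study corpus against a reference corpus, and a fixed-width window for counting collocates. The source does not say which measures the HRI software uses; the function names, window size, and toy inputs here are assumptions.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness of a word.

    a: frequency in the study corpus      c: total tokens in the study corpus
    b: frequency in the reference corpus  d: total tokens in the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def collocates(tokens, node, window=3):
    """Count words occurring within `window` tokens either side of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


# Toy inputs for illustration only.
print(round(log_likelihood(20, 5, 1000, 1000), 2))  # → 9.64
tokens = "the civility of the court and civility of manners".split()
print(collocates(tokens, "civility", window=1))
```

A higher G2 value indicates a larger departure from the frequency expected if the word were equally common in both corpora; in practice a significance threshold would be chosen before treating a word as key.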

The project is funded for three years, from April 2015 to March 2018. The Humanities Research Institute at Sheffield is engaged in pioneering programming which seeks to use query expansion techniques to identify clusters of words that collocate with terms carrying great cultural weight, such as freedom or civility. These investigations will be run on the EEBO-TCP and ECCO corpora (Early English Books Online and Eighteenth Century Collections Online, respectively), which provide the most comprehensive coverage currently available of the literature of the 16th to 18th centuries.

In part, the HRI's work will build on the output of the SAMUELS project, a collaboration between Lancaster University and the University of Glasgow to produce semantic annotation software capable of disambiguating the senses of words in a text and labeling each with a code based on the position of that sense in the hierarchy of the Historical Thesaurus of English. One of the major outputs of this project was a semantically tagged copy of the EEBO-TCP corpus. The Linguistic DNA project will extend this tagging to the ECCO corpora (i.e. both the TCP-keyed section and the texts being processed by the eMOP project). Using these pre-tagged corpora allows ready identification of the instances in which a word has a sense of interest to the researchers, as well as automatic discovery of words in the surrounding text which belong to the same or other relevant semantic areas.
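The kind of lookup that a semantically tagged corpus makes possible can be sketched as below. It assumes, hypothetically, that each token carries a single tag string and that a narrower category's code begins with the code of the broader category containing it, so that prefix matching selects a whole branch of the hierarchy. The tag strings, data, and function names are invented for illustration and do not reproduce the SAMUELS tag format.

```python
# Hypothetical (word, semantic_tag) pairs. The tag strings are invented
# placeholders; real SAMUELS output uses its own tag format.
TAGGED = [
    ("a", "Z99"),
    ("courteous", "X01a"),
    ("and", "Z99"),
    ("civil", "X01b"),
    ("man", "Y02"),
]


def sense_instances(tagged, prefix):
    """Indices of tokens whose tag falls under the category `prefix`
    (assumes a child category's code begins with its parent's code)."""
    return [i for i, (_word, tag) in enumerate(tagged) if tag.startswith(prefix)]


def same_field_neighbours(tagged, prefix, window=5):
    """For each instance in the category, list nearby words tagged in the same category."""
    result = {}
    for i in sense_instances(tagged, prefix):
        lo, hi = max(0, i - window), min(len(tagged), i + window + 1)
        result[i] = [tagged[j][0] for j in range(lo, hi)
                     if j != i and tagged[j][1].startswith(prefix)]
    return result


print(same_field_neighbours(TAGGED, "X01"))  # → {1: ['civil'], 3: ['courteous']}
```

Because the lookup works on tags rather than spellings, words with no shared form (here "courteous" and "civil") are linked through their shared category.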

Lexicalization Pressure is one of three research themes of the Linguistic DNA project, each co-ordinated by a different institution. The other two themes are Contexts of Semantic Change (University of Sheffield) and Lexical Families and Conceptual Fields (University of Sussex). For more information on these, please see the overall project webpage.


Lexicalization Pressure

Professor Marc Alexander (University of Glasgow – LDNA Co-Investigator)
Dr Fraser Dallachy (University of Glasgow – Research Associate)
Brian Aitken (University of Glasgow – Digital Humanities Research Officer)


Linguistic DNA

Professor Susan Fitzmaurice (University of Sheffield – LDNA Principal Investigator)
Michael Pidd (HRI, University of Sheffield – LDNA Co-Investigator)
Dr Justyna Robinson (University of Sussex – LDNA Co-Investigator)
Dr Iona Hine (University of Sheffield – Research Associate)
Dr Seth Mehl (University of Sheffield – Research Associate)
Katherine Rogers (HRI, University of Sheffield – Digital Humanities Developer)
Matthew Groves (HRI, University of Sheffield – Digital Humanities Developer)