The Glasgow Information Retrieval Group within the School of Computing Science at the University of Glasgow was founded in 1986 by Professor C. J. ‘Keith’ van Rijsbergen, often considered one of the founders of modern Information Retrieval (IR). From its outset, the Glasgow IR group has focused on improving the effectiveness of IR systems, inventing new logic-based and probabilistic retrieval models in the 1990s and early 2000s, followed by the development of adaptive query expansion techniques, interactive multimedia models and the Divergence From Randomness framework, as well as leading research into quantum, expertise search and search result diversification models in the late 2000s. Since then, the group has embraced emerging machine learning and deep learning technologies for very large corpora and data streams, and has been at the forefront of research, development and application of those technologies for search and recommendation use cases in a manner that ensures both effectiveness and efficiency.
The Glasgow IR Group has a strong research track record. Indeed, the ACM Digital Library shows that the group is ranked first by number of papers (429) at the SIGIR conference (the top CORE A* conference in the IR field). Meanwhile, a recent study by Microsoft Research of the 40 years of SIGIR showed the University of Glasgow as the 5th most cited university at the conference and the 1st in Europe. The group is also renowned for developing the popular open source IR platform, Terrier.org, which has been downloaded over 60,000 times since its first release in 2004 and is cited by over 3500 research papers. Furthermore, the group has a long history of engagement with the public and industry sectors, from SMEs to multinational corporations.
The Informer magazine of BCS's Information Retrieval Specialist Group carried a recent profile on the Glasgow Information Retrieval Group.
As the most active Information Retrieval group by publications in Europe and one of the longest running, our research covers the full spectrum of topics that are relevant to the development of IR systems:
IR & Recommender Systems Models
- Theoretical modelling of IR systems
- Machine learning and deep learning for information retrieval and recommender systems
- Interactive information retrieval (personalised IR, emotion based search, user modelling for IR, gestural IR)
- User modelling and personal information access
- Topic modeling; Entity search; Natural language processing for IR
- Recommender systems; Context-aware venue suggestion
Large-scale IR & Efficient IR
- Web information retrieval; Big data and information retrieval
- Efficient architecture for large-scale IR systems; Data stream processing architectures
Data Streams & IR
- Real-time information retrieval
- Search in social and sensor networks
Artificial Intelligence & IR
- Conversational information seeking and dialogue systems
- Information credibility, transparency, explainability and verification in IR systems
- Fairness in information retrieval & recommender systems
Natural Language Processing & IR
- Information extraction including entity and relation extraction
- Automatic knowledge graph construction
- Multi-task models, joint models and summarization
- Multimedia information retrieval
- Domain-specific information retrieval: smart cities; health; news; eDiscovery; sensitivity review
- Emergency management and crisis informatics
- Politics and Media
- Test collections and evaluation metrics
- Evaluation of IR systems and crowdsourcing for IR
- Online and Offline Evaluation of IR and Recommender Systems
- Eye-tracking and physiological approaches, such as fMRI
Current staff and students
Current Research Assistants and Research Students:
- Javier Sanz-Cruzado Puig
- Ting Su
- Xi Wang
- Xiao Wang
- Siwei Lu
- JingMin Huang
- Yashon Wu
- Xin Xin
- Carlos Gemmel
- Federico Rossetto
- Sarawoot Kongyoung
- Alexander Hepburn
- Ian Mackie
- Jun Choi Hyun
- Hitarth Narvala
- Jijun Long
- Maria Vlachou
- Sasha Petrov
- Thomas Janich
- Zixuan Yi
- Erland Frayling
- Jarana Manotumruksa (2019), University College London, Researcher
- Anjie Fang (2019), Amazon, Applied Scientist
- Jorge David Gonzalez Paule (2019), Jobandtalent Espana, Data Scientist
- Colin Wilkie (2019), Siemens, Data Engineer
- David Maxwell (2019), University of Delft, Data Engineer
- Graham McDonald (2019), University of Glasgow, Lecturer
- James McMinn (2018), ScoopAnalytics, Co-Founder
- Stuart Mackie (2018), BiP Solutions/Strathclyde Uni, Data Scientist
- Horatiu Bota (2018), Prodsight, Data Scientist
- Jesus Alberto Rodriquez Perez (2018), University of Glasgow, Postdoctoral Researcher
- Fajie Yuan (2018), Tencent, Senior Researcher
- Ryen White (Research Manager, Microsoft Research AI)
- Mark Sanderson (Professor, Royal Melbourne Institute of Technology)
- Mounia Lalmas (Head of Tech Research, Spotify)
- Ian Ruthven (Professor, Strathclyde University)
- Fabio Crestani (Professor, University of Lugano)
- Vassilis Plachouras (Software Engineering, Facebook)
- Leif Azzopardi (Chancellor's Fellow, Strathclyde University)
- Rodrygo Santos (Assistant Professor, Federal University of Minas Gerais)
- Eugene Kharitonov (Research Engineer, Facebook)
- Saul Vargas (Senior Machine Learning Scientist, ASOS)
- Dyaa Albakour (Lead Data Scientist, Signal Media)
- Nut Limsopatham (Senior Researcher, Microsoft AI)
- Amir Jadidinejad (AI Engineer, GlaxoSmithKline)
- Zaiqiao Meng (Researcher, Cambridge University)
Terrier IR platform
Terrier, developed by the IR group, is a highly flexible, efficient, and effective open source search engine that is readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications. Indeed, Terrier is used internationally, with over 60,000 downloads since its first release in 2004. Terrier is used widely by the research community, with over 3700 citations in research papers according to Google Scholar.
Visit the website at http://terrier.org to learn more and download Terrier for free.
For those new to the Information Retrieval field, the group maintains a useful set of common resources for researchers and practitioners:
- Information Retrieval Test Collections: A list of publicly available IR test collections. Some are held locally and some are pointers to remote sites.
- Collections of text and corpora: What's the difference between a test collection and a text collection? Well, a test collection has to have associated queries and relevance judgements. The collections listed here are simply document collections.
- Language reference works: This page contains links to online language reference works, such as dictionaries, thesauri etc.
- IR systems: A list of links to some sites that have information about IR systems.
- Linguistic utilities: IR-related language utilities such as stemmers, stop-word lists, morphological taggers, etc.
- IR Journals: Tables of contents and abstracts of the papers in a number of well-known IR journals.
- IR Organisations: Various IR groups and more formal organisations.
- Books: Supplements of books or whole books online.
Considering the impact of recommendations on item providers is one of the duties of multi-sided recommender systems. Item providers are key stakeholders in online platforms, and their earnings and plans are influenced by the exposure their items receive in recommended lists. Prior work showed that certain minority groups of providers, characterized by a common sensitive attribute (e.g., gender or race), are disproportionately affected by indirect and unintentional discrimination. However, there are situations where (i) the same provider is associated with multiple items of a list suggested to a user, (ii) an item is created by more than one provider jointly, and (iii) predicted user-item relevance scores are estimated with bias for items of provider groups. In this talk, we assess the disparities that state-of-the-art recommendation models create in relevance, visibility, and exposure, by simulating diverse representations of the minority group in the catalog and in the interactions. Based on the unfair outcomes that emerge, we devise a treatment that combines observation upsampling and loss regularization while learning user-item relevance scores. Experiments on real-world data demonstrate that our treatment leads to lower disparate relevance. The resulting recommended lists show fairer visibility and exposure, higher minority item coverage, and negligible loss in recommendation utility.
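To make the difference between visibility and exposure concrete, here is a minimal sketch. It is not the speaker's implementation: the function name `group_shares`, the toy data, and the logarithmic position discount are all assumptions chosen for illustration. Visibility is taken as the share of recommended slots filled by minority-group items, and exposure as the same share with each slot weighted by its rank.

```python
import math


def group_shares(ranked_lists, minority_items):
    """Return (visibility, exposure) shares of minority-group items.

    Assumptions for this sketch: each list is ordered best-first, at least
    one list is non-empty, and the weight of rank r is 1 / log2(r + 1).
    """
    slots = 0
    minority_slots = 0
    total_weight = 0.0
    minority_weight = 0.0
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            weight = 1.0 / math.log2(rank + 1)  # position discount (assumed)
            slots += 1
            total_weight += weight
            if item in minority_items:
                minority_slots += 1
                minority_weight += weight
    return minority_slots / slots, minority_weight / total_weight


# Hypothetical toy data: two users, two recommended items each.
recommendations = [["a", "b"], ["a", "c"]]
minority = {"b", "c"}
visibility, exposure = group_shares(recommendations, minority)
print(visibility, exposure)
```

In this toy case the minority items fill half of the slots (visibility 0.5) but always sit in second position, so their exposure share is lower, at about 0.39. Comparing either share with the group's representation in the catalog gives one possible notion of disparity.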
Ludovico Boratto is a researcher at the Department of Mathematics and Computer Science of the University of Cagliari (Italy). His research interests focus on recommender systems and their impact on the different stakeholders, considering both accuracy and beyond-accuracy evaluation metrics. He has authored more than 60 papers and published his research in top-tier conferences and journals. His research activity has also led him to give talks and tutorials at top-tier conferences and research centers (Yahoo! Research). He is editor of the book “Group Recommender Systems: An Introduction”, published by Springer. He is an editorial board member of the “Information Processing & Management” journal (Elsevier) and “Journal of Intelligent Information Systems” (Springer), and guest editor of several journals’ special issues. He is regularly part of the program committees of the main Web conferences, where he received three outstanding contribution awards. In 2012, he received his Ph.D. from the University of Cagliari (Italy), where he was a research assistant until May 2016. From May 2016 to April 2021, he was a Senior Research Scientist at Eurecat in the Data Science and Big Data Analytics research group. In 2010 and 2014, he spent ten months at Yahoo! Research in Barcelona as a visiting researcher. He is a member of ACM and IEEE.
In the last decade, deep learning advancements have boosted the development of many neural solutions for effectively analyzing biomedical literature, which is widely accessible through repositories such as PubMed, PMC, and ScienceDirect. Large pre-trained language models (PLMs) have become the dominant NLP paradigm, achieving unprecedented results in a wide range of tasks, from named entity recognition and semantic parsing to information retrieval and document summarization. However, recent research has highlighted several weaknesses of PLMs, including black-box knowledge whose capacity is limited by the dimensions of the weight matrices, and a limited ability to separate discrete semantic relations from surface language structures.
This talk presents two papers that follow different promising directions for addressing these issues and that chart a path complementary to architectural scaling: (i) equipping PLMs with the ability to attend over relevant and factual information from non-parametric external sources; (ii) infusing semantic parsing graphs into PLMs.
Specifically, in (i) we will see a T5 model empowered by differentiable access to a large-scale text memory grounded on PubMed, while in (ii) we will explore a BART model for biomedical abstractive summarization augmented by event and AMR graphs, as well as a semantic-driven reinforcement learning signal.
Giacomo Frisoni is a second-year Ph.D. student with competencies in Natural Language Understanding and Neuro-Symbolic Learning. He holds Bachelor's and Master's degrees in Computer Science and Engineering from the University of Bologna, both with honors. He has published several original papers in journals and at international peer-reviewed conferences, including top-tier venues like COLING, and has won two Best Paper Awards. He participated in the Cornell, Maryland, Max Planck Pre-doctoral School 2020. In June 2022, he was selected as a member of the first HuggingFace Student Ambassador program.
IR Group in a nutshell
- #1 Information Retrieval group in Europe (ACM SIGIR publications)
- Creator of world-famous Terrier.org IR platform
- Leader in next generation Big Data processing technologies
- Leading international data challenges (TREC CARS, TREC Incident Streams)
- Driving innovative intelligent systems for the home, public and commercial sectors