Multi-Lingual and Multi-Modal Location Information Extraction (Multi-LM)

Supervisor: Joseph Shingleton

Industry Partner: The Alan Turing Institute and DSO National Laboratories, Singapore

School: Geographical & Earth Sciences

Description:

Project Overview 

The Multi-Lingual and Multi-Modal Location Information Extraction project aims to use both image and textual data to analyse and validate sources of online news and social media posts. Cross-checking the locational information extracted from text against that extracted from the associated images may help to identify inconsistencies between reported accounts, potentially highlighting sources of online misinformation. Reliable and accurate geolocation of text and image data will therefore play a crucial role in combatting misinformation and in aiding counter-terrorism efforts.
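
As an illustration of the cross-checking idea only, the sketch below compares a location extracted from a post's text with one predicted from its image and flags the pair as inconsistent when they are far apart. It is not the project's method: the function names, the use of great-circle (haversine) distance, and the 50 km tolerance are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def locations_consistent(text_coord, image_coord, tolerance_km=50.0):
    """Hypothetical check: True if the text-derived and image-derived
    locations lie within tolerance_km of each other (threshold is an assumption)."""
    return haversine_km(*text_coord, *image_coord) <= tolerance_km


# Example: a post whose text mentions London but whose image geolocates to Paris.
london = (51.5074, -0.1278)
paris = (48.8566, 2.3522)
print(locations_consistent(london, paris))  # the two locations are several hundred km apart
```

In practice the tolerance would need to reflect the uncertainty of each geocoder, since a broad regional prediction cannot be compared at the same scale as a street-level one.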

Specific work for the placement 

The student will aid in the development of an image-based geocoder: a model that uses points of interest and spatial features to associate an image with a pair of geographic coordinates. The model uses features such as text and signage, foliage types, and building styles to geolocate an image to within a broad region. A finer-scale prediction can then be made using spatial features that are identifiable and projectable onto a map, such as building orientations and road layouts.

The student will help with the development and testing of this model. This will involve working closely with the project supervisor and other staff working on the project to help develop the code-base, curate and annotate appropriate training and testing data, and test the model on the curated data.  

Potential deliverables 

  • A developed and curated dataset of image and textual data with appropriate geotagging.  
  • Contribution to a Python software package associated with the project. 
  • Contribution to an academic publication in the field of computer vision and geographic information extraction.  
  • Presentation of the work at an internal symposium.  
  • Presentation of the work to the wider academic community, e.g. at the Alan Turing Institute.