Mode-Adaptive Multimodal Architecture for Assistive Social Intelligence
Supervisor: Hira Hameed
Partner: VMO2 Lab
School: Engineering
Description:
Every day, millions of people who are deaf, blind, or deafblind navigate a world built around sensory experiences they cannot fully access. Current assistive technologies typically address one sense at a time (a hearing aid helps with sound, a screen reader with text), but real human communication is multimodal, combining speech, facial expressions, gesture, and proximity. Critically, most existing tools do not operate in real time and cannot capture the dynamic social context that underpins everyday interaction. This project will build a single, adaptive, AI-powered system that fuses radio-frequency (RF), audio, and video sensing to support richer, more natural social interaction for people with sensory impairments.
The core innovation is a deafblind RF haptic assistant: a small RF sensor, embedded in a hearing-aid-sized device, detects nearby people's movements and emotional expressions without cameras and without requiring those people to wear any device. AI models translate these signals into vibration patterns or Braille output, giving the user real-time social awareness through touch: who is nearby, what they are doing, and how they are feeling. This addresses a genuine unmet need, as current technologies for deafblind individuals are slow, intrusive, or reliant on human interpreters (Kassem et al., 2022).
The system is mode-adaptive, reconfiguring automatically according to the user's needs. In blind mode, it prioritises audio feedback and uses RF sensing to help clarify speech in crowded spaces. In deaf mode, it processes audio and visual inputs locally to produce speech-to-text transcription and sign-language interpretation, delivered via a screen or AR glasses, with haptic feedback indicating the speaker's tone and identity.
This approach builds on strong prior results. RF sensing has been shown to track facial expressions and detect emotional states from micro-Doppler signatures with high accuracy, without cameras or wearables (Hameed et al., 2024; Tan et al., 2024). The same sensing modality has recognised lip movements through face masks with 95% test accuracy (Hameed et al., 2022), showing genuine potential as a next-generation assistive component. Together, these results indicate that the proposed system is technically grounded rather than speculative.
Ethical considerations are central to the design. RF sensing is inherently more privacy-preserving than camera-based sensing, and privacy-aware settings limit identity processing in sensitive environments. Ethics approval has been granted by the University of Glasgow, and participant consent procedures are in place. The project directly advances EPSRC's commitments to human-centred, responsible AI and inclusive digital futures.