Understanding user intent is essential for situational and context-aware decision-making. Motivated by a real-world scenario, this work addresses intent prediction for smart-device users in the vicinity of vehicles by modeling sequential spatiotemporal data. In real-world settings, however, environmental factors and sensor limitations can produce non-stationary and irregularly sampled data, posing significant challenges. To address these issues, we propose STaRFormer, a Transformer-based approach that can serve as a universal framework for sequential modeling. STaRFormer utilizes a novel dynamic, attention-based regional masking scheme combined with a semi-supervised contrastive learning paradigm to enhance task-specific latent representations. Comprehensive experiments on 56 datasets varying in type (including non-stationary and irregularly sampled data), domain, sequence length, number of training samples, and application demonstrate the efficacy of STaRFormer, which achieves notable improvements over state-of-the-art approaches.

https://star-former.github.io/


This work addresses the challenges posed by real-world time series data, which often exhibit non-stationarity and irregular sampling due to factors such as sensor technology, external conditions, and device malfunctions. Conventional sequence models, such as LSTMs and Transformers, typically assume that the data are fully observed, stationary, and sampled at regular intervals. We developed a versatile framework, STaRFormer, that can effectively model time series with these characteristics while remaining applicable to regularly sampled time series as well.
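To make the irregular-sampling problem concrete, the minimal sketch below simulates a series observed at uneven timestamps and builds inter-observation time deltas as an extra input feature. This is one common, generic workaround assumed here for illustration only; it is not STaRFormer's masking or contrastive scheme, and all names in the snippet are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Irregularly sampled series: timestamps with random gaps
# (e.g., sensor dropouts or variable reporting rates).
t = np.cumsum(rng.exponential(scale=1.0, size=8))   # observation times
x = np.sin(t) + 0.1 * rng.standard_normal(8)        # observed values

# Generic workaround (an illustrative assumption, not the paper's method):
# pair each value with the time elapsed since the previous observation,
# so a sequence model can account for uneven spacing.
dt = np.diff(t, prepend=t[0])                       # first delta is 0
features = np.stack([x, dt], axis=-1)               # shape (8, 2)

print(features.shape)
```

A conventional model fed only `x` would implicitly treat all gaps as equal; including `dt` (or a comparable time encoding) is one way to expose the irregular spacing to the model.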

First published: 30 September 2025