AI Summary
Existing time-series analysis methods that integrate large language models (LLMs) with textual label supervision often over-rely on textual cues while neglecting intrinsic temporal dynamics, leading to outputs inconsistent with temporal context. To address this, we propose TimeSense, a novel multimodal framework that restores the original sequential structure via a time-aware reconstruction module and introduces coordinate-based positional embeddings to explicitly model spatial relationships among time points. This design enables deep coupling between linguistic reasoning and temporal dynamics while preserving LLMs' language understanding capabilities and enhancing temporal feature learning. Evaluated on the EvalTS benchmark across ten diverse tasks, TimeSense achieves state-of-the-art performance, particularly excelling in complex, multivariate time-series reasoning, where it significantly outperforms existing approaches.
Abstract
In the time-series domain, a growing number of works combine text with temporal data to leverage the reasoning capabilities of large language models (LLMs) for various downstream time-series understanding tasks. This enables a single model to flexibly perform tasks that previously required specialized models for each domain. However, these methods typically rely on text labels for supervision during training, biasing the model toward textual cues while potentially neglecting the underlying temporal features. Such a bias can lead to outputs that contradict the time-series context. To address this issue, we construct the EvalTS benchmark, comprising 10 tasks across three difficulty levels, from fundamental temporal pattern recognition to complex real-world reasoning, to evaluate models under more challenging and realistic scenarios. We also propose TimeSense, a multimodal framework that makes LLMs proficient in time-series analysis by balancing textual reasoning with a preserved temporal sense. TimeSense incorporates a Temporal Sense module that reconstructs the input time-series within the model's context, ensuring that textual reasoning is grounded in the time-series dynamics. Moreover, to enhance spatial understanding of time-series data, we explicitly incorporate coordinate-based positional embeddings, which provide each time point with spatial context and enable the model to capture structural dependencies more effectively. Experimental results demonstrate that TimeSense achieves state-of-the-art performance across multiple tasks, particularly on complex multivariate time-series reasoning, where it substantially outperforms existing methods.
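The abstract does not specify how the coordinate-based positional embeddings are computed. As a minimal, hypothetical sketch (not the paper's actual implementation), one could encode each time point's (timestamp, value) coordinates with sinusoidal features, so the model receives spatial context rather than only sequence order; the function name and the sinusoidal scheme below are assumptions for illustration:

```python
import numpy as np

def coordinate_positional_embedding(timestamps, values, dim=16):
    """Hypothetical coordinate-based positional embedding.

    Encodes each time point's (t, v) coordinates with sinusoidal
    features of multiple frequencies, in the spirit of Transformer
    positional encodings, but applied to both axes of the series.
    Returns an array of shape (num_points, dim).
    """
    assert dim % 4 == 0, "dim must split evenly across sin/cos of t and v"
    n_freq = dim // 4
    # Geometric frequency ladder, as in standard sinusoidal encodings.
    freqs = 1.0 / (10000.0 ** (np.arange(n_freq) / n_freq))
    t = np.asarray(timestamps, dtype=float)[:, None] * freqs  # (T, n_freq)
    v = np.asarray(values, dtype=float)[:, None] * freqs      # (T, n_freq)
    # Concatenate sin/cos features for both coordinates.
    return np.concatenate(
        [np.sin(t), np.cos(t), np.sin(v), np.cos(v)], axis=-1
    )
```

Under this sketch, two points with the same index but different values get different embeddings, which is what lets the model distinguish spatial structure from mere position in the sequence.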