🤖 AI Summary
Addressing challenges in semantic retrieval of dynamic multimodal time-series data (e.g., in healthcare and meteorology)—including weak cross-modal alignment, poor interpretability, and insufficient multi-channel modeling—this paper introduces the first framework enabling bidirectional fine-grained retrieval between text and time series (Text-to-TimeSeries and TimeSeries-to-Text). Methodologically, it leverages multimodal contrastive learning with dynamic sampling. Key contributions are: (1) a channel-level time-series–text embedding alignment mechanism that precisely maps signal channels to textual semantics; (2) a context-aware dynamic hard negative mining strategy that sharpens the discriminative signal in contrastive learning; and (3) a lightweight, tunable dual-role encoder balancing retrieval efficiency and compatibility with retrieval-augmented generation (RAG). Evaluated across multiple domains, the method achieves state-of-the-art performance on downstream forecasting and classification tasks, significantly improving accuracy, interpretability, and cross-modal semantic retrieval capability.
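To make the core training idea concrete, below is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective with in-batch hard negative mining over paired time-series and text embeddings. This is an illustrative reconstruction, not the paper's implementation: the function name, `k_hard` parameter, and masking scheme are assumptions, and the paper's context-aware mining is likely more elaborate than the top-k similarity filter shown here.

```python
import torch
import torch.nn.functional as F

def contrastive_loss_with_hard_negatives(ts_emb, txt_emb, temperature=0.07, k_hard=4):
    """Symmetric InfoNCE loss over paired time-series / text embeddings,
    restricted to the positive pair plus the k hardest in-batch negatives.
    Illustrative sketch only; hyperparameters are assumptions."""
    ts = F.normalize(ts_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = ts @ txt.T / temperature              # (B, B) similarity matrix
    labels = torch.arange(ts.size(0))
    # Hard negative mining: for each time series, keep its positive text plus
    # the k most similar non-matching texts; mask out easier negatives.
    neg = logits.clone()
    neg[labels, labels] = float('-inf')            # exclude positives from mining
    hard_idx = neg.topk(k_hard, dim=1).indices
    mask = torch.full_like(logits, float('-inf'))
    mask.scatter_(1, hard_idx, 0.0)                # keep hard negatives
    mask[labels, labels] = 0.0                     # keep positives
    loss_ts2txt = F.cross_entropy(logits + mask, labels)
    loss_txt2ts = F.cross_entropy((logits + mask).T, labels)
    return 0.5 * (loss_ts2txt + loss_txt2ts)
```

Averaging both retrieval directions mirrors the bidirectional Text-to-TimeSeries and TimeSeries-to-Text setup: each direction is a softmax classification over the positive and its mined hard negatives.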
📝 Abstract
The ubiquity of dynamic data in domains such as weather, healthcare, and energy underscores a growing need for effective interpretation and retrieval of time-series data. These data are inherently tied to domain-specific contexts, such as clinical notes or weather narratives, making cross-modal retrieval essential not only for downstream tasks but also for developing robust time-series foundation models via retrieval-augmented generation (RAG). Despite the increasing demand, time-series retrieval remains largely underexplored. Existing methods often lack semantic grounding, struggle to align heterogeneous modalities, and have limited capacity for handling multi-channel signals. To address this gap, we propose TRACE, a generic multimodal retriever that grounds time-series embeddings in aligned textual context. TRACE enables fine-grained channel-level alignment and employs hard negative mining to facilitate semantically meaningful retrieval. It supports flexible cross-modal retrieval modes, including Text-to-TimeSeries and TimeSeries-to-Text, effectively linking linguistic descriptions with complex temporal patterns. By retrieving semantically relevant pairs, TRACE enriches downstream models with informative context, leading to improved predictive accuracy and interpretability. Beyond serving as a static retrieval engine, TRACE also functions as a powerful standalone encoder, with lightweight task-specific tuning that refines context-aware representations while maintaining strong cross-modal alignment. These representations achieve state-of-the-art performance on downstream forecasting and classification tasks. Extensive experiments across multiple domains highlight its dual utility, as both an effective encoder for downstream applications and a general-purpose retriever that enhances time-series models.