🤖 AI Summary
Existing unimodal time series models rely solely on numerical data and therefore suffer from semantic sparsity. Multimodal approaches incorporate textual information, but they typically use text from only one direction (either historical or future) and lack fine-grained modeling of the semantics, temporal dynamics, and causal relationships between text and time series. To address these limitations, we propose Dual-Forecaster, a bidirectional text-driven forecasting paradigm that, for the first time, jointly integrates descriptive historical text and predictive future text. We design a three-stage cross-modal alignment module, covering semantic, temporal, and causal alignment, that uses a large language model for text encoding and a dedicated time series feature extractor. Experiments on 15 multimodal time series datasets show that the method matches or surpasses state-of-the-art approaches, indicating the benefit of integrating both historical and future text.
📝 Abstract
Most existing single-modal time series models rely solely on numerical series and are therefore limited by insufficient information. Recent studies have shown that multimodal models can address this issue by integrating textual information. However, these models focus on either historical or future textual information, overlooking the distinct contribution each makes to time series forecasting. Moreover, they fail to capture the intricate relationships between textual and time series data, constrained by their limited capacity for multimodal comprehension. To tackle these challenges, we propose Dual-Forecaster, a pioneering multimodal time series model that combines descriptive historical textual information with predictive textual insights, leveraging advanced multimodal comprehension enabled by three well-designed cross-modality alignment techniques. Our comprehensive evaluations on fifteen multimodal time series datasets demonstrate that Dual-Forecaster is an effective multimodal time series model that outperforms or is comparable to other state-of-the-art models, highlighting the benefit of integrating textual information for time series forecasting. This work opens new avenues for integrating textual information with numerical time series data in multimodal time series analysis.