🤖 AI Summary
To address the challenge of effectively integrating textual event information into time-series forecasting, this paper proposes ChronoSteer, a decoupled collaborative framework that leverages large language models (LLMs) to generate dynamic textual instructions that steer time-series foundation models (TSFMs) toward conditional forecasting, thereby enabling cross-modal semantic alignment. Key contributions include: (1) an instruction-driven steering architecture designed to prevent inter-modal information leakage; (2) the construction of a leakage-robust multimodal time-series forecasting benchmark; and (3) a two-stage, fully synthetic data training paradigm comprising instruction generation and alignment fine-tuning. Experiments demonstrate that, although trained exclusively on synthetic data, ChronoSteer achieves a 25.7% improvement in forecasting accuracy over its unimodal backbone and outperforms the previous state-of-the-art multimodal method by 22.5%.
📝 Abstract
Conventional forecasting methods rely on unimodal time series data, limiting their ability to exploit rich textual information. Recently, large language models (LLMs) and time series foundation models (TSFMs) have demonstrated powerful capabilities in textual reasoning and temporal modeling, respectively. Integrating the strengths of both to construct a multimodal model that concurrently leverages temporal and textual information for future inference has emerged as a critical research challenge. To address the scarcity of event-series paired data, we propose a decoupled framework: an LLM is employed to transform textual events into revision instructions, which are then used to steer the output of the TSFM. To implement this framework, we introduce ChronoSteer, a multimodal TSFM that can be steered through textual revision instructions, effectively bridging the LLM and the TSFM. To mitigate the shortage of cross-modal instruction-series paired data, we devise a two-stage training strategy based on synthetic data. In addition, we construct a high-quality multimodal time series forecasting benchmark that addresses information leakage concerns during evaluation. After integration with an LLM, ChronoSteer, trained exclusively on synthetic data, achieves a 25.7% improvement in prediction accuracy over its unimodal backbone and a 22.5% gain over the previous state-of-the-art multimodal method.
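The decoupled design in the abstract can be sketched as a two-stage pipeline: the TSFM first produces a base forecast from the series alone, and the LLM's revision instruction then adjusts that output, so raw text never enters the TSFM directly. The sketch below is a minimal illustration under stated assumptions, not the paper's actual API: `llm_event_to_instruction`, `tsfm_base_forecast`, and `RevisionInstruction` are hypothetical stand-ins (a toy keyword rule replaces the LLM, and a naive last-value model replaces the TSFM).

```python
# Hypothetical sketch of the decoupled LLM -> instruction -> TSFM pipeline.
# All names here are illustrative stand-ins, not the paper's real components.

from dataclasses import dataclass
from typing import List


@dataclass
class RevisionInstruction:
    """A textual steering signal, e.g. 'raise the trend by ~10%'."""
    direction: str    # "up", "down", or "none"
    magnitude: float  # relative adjustment applied to the base forecast


def llm_event_to_instruction(event_text: str) -> RevisionInstruction:
    """Stand-in for the LLM stage: map a textual event to a revision
    instruction. A real system would prompt an LLM; this is a toy rule."""
    text = event_text.lower()
    if "surge" in text or "holiday" in text:
        return RevisionInstruction(direction="up", magnitude=0.10)
    if "outage" in text or "strike" in text:
        return RevisionInstruction(direction="down", magnitude=0.15)
    return RevisionInstruction(direction="none", magnitude=0.0)


def tsfm_base_forecast(history: List[float], horizon: int) -> List[float]:
    """Stand-in for the unimodal TSFM: naive last-value forecast."""
    return [history[-1]] * horizon


def steered_forecast(history: List[float], horizon: int,
                     event_text: str) -> List[float]:
    """Decoupled pipeline: the TSFM forecasts from the series alone,
    then the LLM-derived instruction revises the output."""
    base = tsfm_base_forecast(history, horizon)
    instr = llm_event_to_instruction(event_text)
    sign = {"up": 1.0, "down": -1.0, "none": 0.0}[instr.direction]
    return [y * (1.0 + sign * instr.magnitude) for y in base]


print(steered_forecast([100.0, 102.0, 101.0], horizon=3,
                       event_text="Holiday demand surge expected"))
```

The point of the decoupling is that the steering signal is a compact instruction rather than raw text, which is what lets synthetic instruction-series pairs substitute for scarce real event-series data during training.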