🤖 AI Summary
This work addresses two limitations of existing time series forecasting methods: most are unimodal and struggle to incorporate textual context, such as hypothetical scenarios, into conditional prediction, and there is no standardized benchmark for testing whether models genuinely leverage textual information. To close this gap, the authors propose the first multimodal time series forecasting paradigm guided by hypothetical scenarios and introduce the What If TSF (WIT) benchmark. WIT pairs expert-crafted real-world and counterfactual scenario descriptions with carefully aligned time series, enabling rigorous assessment of whether a model understands contextual cues and conditions its forecasts on them. The benchmark thus provides a standardized platform for evaluating large language models on context-guided, conditionally informed time series prediction.
📝 Abstract
Time series forecasting is critical to real-world decision making, yet most existing approaches remain unimodal and rely on extrapolating historical patterns. While recent progress in large language models (LLMs) highlights the potential of multimodal forecasting, existing benchmarks largely provide retrospective or misaligned raw context, making it unclear whether such models meaningfully leverage textual inputs. In practice, human experts combine what-if scenarios with historical evidence, often producing distinct forecasts from the same observations under different scenarios. Inspired by this, we introduce What If TSF (WIT), a multimodal forecasting benchmark designed to evaluate whether models can condition their forecasts on contextual text, especially descriptions of future scenarios. By providing expert-crafted plausible or counterfactual scenarios, WIT offers a rigorous testbed for scenario-guided multimodal forecasting. The benchmark is available at https://github.com/jinkwan1115/WhatIfTSF.
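To make the setup concrete, here is a minimal sketch of scenario-conditioned evaluation in the spirit the abstract describes: the same history paired with different scenario texts and different ground-truth continuations, so a text-blind model cannot fit both. All field names, values, and the naive baseline below are illustrative assumptions, not the actual WIT schema or data.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ScenarioSample:
    """One hypothetical WIT-style example (illustrative schema, not the benchmark's)."""
    history: List[float]   # observed series shared across scenarios
    scenario: str          # expert-written what-if description of the future
    target: List[float]    # ground-truth continuation under that scenario

# Two samples share the same history but carry different scenarios,
# so distinct targets can only be matched by reading the text.
history = [100.0, 102.0, 101.0, 103.0]
samples = [
    ScenarioSample(history, "Demand surges after a product launch.", [110.0, 118.0]),
    ScenarioSample(history, "A supply disruption halves shipments.", [95.0, 88.0]),
]

def mae(pred: List[float], target: List[float]) -> float:
    """Mean absolute error between a forecast and its ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

# A text-blind baseline (carry the last observed value forward) emits one
# forecast for both scenarios; its error is then scored per scenario.
naive = [history[-1]] * 2
errors = [mae(naive, s.target) for s in samples]
```

A scenario-aware model would instead produce a different forecast per `ScenarioSample`, and the benchmark's comparison is between such conditional forecasts and the text-blind baseline above.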