🤖 AI Summary
Most existing time-series forecasting methods rely on "fast thinking," i.e., end-to-end black-box prediction, and lack explicit, interpretable intermediate reasoning steps, which hinders fine-grained, domain-specific modeling. To address this, we reformulate forecasting as a multi-step temporal reasoning task under a "slow thinking" paradigm and introduce, to our knowledge, the first LLM endowed with explicit, stepwise reasoning capabilities for this task. Our contributions are threefold: (1) Time-R1, a novel two-stage reinforcement fine-tuning framework; (2) a fine-grained multi-objective reward that jointly optimizes prediction accuracy, reasoning consistency, and temporal plausibility; and (3) GRIP, a non-uniform path-sampling strategy for policy optimization, coupled with chain-of-thought distillation to improve the efficiency of reasoning-path exploration. Extensive experiments on multiple benchmark datasets demonstrate significant improvements over state-of-the-art methods, validating the superiority of our slow-thinking paradigm in accuracy, interpretability, and generalization.
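The summary names three reward components (accuracy, reasoning consistency, temporal plausibility) but gives no formula. The Python sketch below shows one plausible way such a composite reward could be assembled; the function name `multi_objective_reward`, the weights, and each component's definition are illustrative assumptions, not the paper's actual design.

```python
import re
import numpy as np

def _mean_abs_diff(x: np.ndarray) -> float:
    # Average step-to-step change of a 1-D series (0.0 for length-1 inputs).
    return float(np.abs(np.diff(x)).mean()) if x.size > 1 else 0.0

def multi_objective_reward(pred: np.ndarray,
                           target: np.ndarray,
                           reasoning: str,
                           w_acc: float = 0.6,
                           w_consist: float = 0.2,
                           w_plaus: float = 0.2) -> float:
    """Hypothetical composite reward: accuracy + consistency + plausibility.

    All weights and component definitions are guesses for illustration only.
    """
    # Accuracy term: map MSE into (0, 1], higher is better.
    mse = float(np.mean((pred - target) ** 2))
    r_acc = 1.0 / (1.0 + mse)

    # Reasoning-consistency term: here crudely approximated by checking
    # that the output contains a well-formed reasoning span at all.
    has_steps = bool(re.search(r"<think>.*</think>", reasoning, re.DOTALL))
    r_consist = 1.0 if has_steps else 0.0

    # Temporal-plausibility term: penalize forecasts whose step-to-step
    # volatility far exceeds that of the reference series.
    ratio = _mean_abs_diff(pred) / (_mean_abs_diff(target) + 1e-8)
    r_plaus = float(np.exp(-max(0.0, ratio - 1.0)))

    return w_acc * r_acc + w_consist * r_consist + w_plaus * r_plaus
```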
📝 Abstract
To advance time series forecasting (TSF), a wide range of methods has been proposed to improve prediction accuracy, evolving from statistical techniques to data-driven deep learning architectures. Despite their effectiveness, most existing methods still adhere to a fast-thinking paradigm: their core modeling philosophy is to extract historical patterns and map them directly to future values, without an explicit thinking process that incorporates intermediate time series reasoning. Meanwhile, emerging slow-thinking LLMs (e.g., OpenAI-o1) have shown remarkable multi-step reasoning capabilities, offering an alternative way to overcome these issues. However, prompt engineering alone has several limitations, including high computational cost, privacy risks, and limited capacity for in-depth, domain-specific time series reasoning. A more promising approach is therefore to train LLMs to develop slow-thinking capabilities and acquire strong time series reasoning skills. To this end, we propose Time-R1, a two-stage reinforcement fine-tuning framework designed to enhance the multi-step reasoning ability of LLMs for time series forecasting. Specifically, the first stage performs supervised fine-tuning for warm-up adaptation, while the second stage employs reinforcement learning to improve the model's generalization ability. In particular, we design a fine-grained multi-objective reward tailored to time series forecasting, and then introduce GRIP (group-based relative importance for policy optimization), which leverages non-uniform sampling to further encourage and optimize the model's exploration of effective reasoning paths. Experiments demonstrate that Time-R1 significantly improves forecasting performance across diverse datasets.
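The abstract describes GRIP only at a high level: a group-based relative-importance objective with non-uniform sampling over reasoning paths. The following minimal sketch assumes a GRPO-style group-relative advantage with a PPO-style clipped ratio, and uses a softmax over rewards as the non-uniform weighting; `grip_loss` and every hyperparameter here are hypothetical stand-ins, not the paper's actual objective.

```python
import torch

def grip_loss(logprobs: torch.Tensor,      # (G,) sum of token log-probs per sampled path
              old_logprobs: torch.Tensor,  # (G,) same quantity under the behavior policy
              rewards: torch.Tensor,       # (G,) scalar reward per path (G >= 2)
              tau: float = 1.0,
              clip_eps: float = 0.2) -> torch.Tensor:
    """Hypothetical group-relative objective with non-uniform path weights."""
    # Group-relative advantage: standardize rewards within the sampled group,
    # so each path is scored against its group mates rather than a critic.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Non-uniform importance: up-weight higher-reward reasoning paths
    # (one guess at what "non-uniform sampling" could mean in practice).
    weights = torch.softmax(rewards / tau, dim=0)

    # PPO-style clipped probability ratio against the behavior policy.
    ratio = torch.exp(logprobs - old_logprobs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    per_path = torch.minimum(ratio * adv, clipped * adv)

    # Weighted group objective, negated so minimizing it maximizes reward.
    return -(weights * per_path).sum()
```

Under this reading, the softmax temperature `tau` would control how sharply optimization concentrates on the best reasoning paths in each group; the paper may weight or sample paths differently.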