🤖 AI Summary
Most LLM-based time series forecasting methods follow a static generative paradigm that maps historical observations to future values in a single pass, which limits temporal pattern extraction and leaves no room for iterative refinement or ensemble forecasts. To address this, this work proposes CastFlow, a dynamic agentic forecasting framework that organizes the forecasting process into four phases: planning, action, forecasting, and reflection. A memory module that retrieves prior experience and a multi-view toolkit that builds diagnostic evidence and an ensemble forecast baseline support iterative refinement. The framework is role-specialized, decoupling general reasoning from numerical prediction: a frozen LLM handles high-level reasoning, while a fine-tuned domain-specific LLM performs evidence-guided forecasting that starts from the ensemble baseline. The domain-specific model is trained with a two-stage strategy: supervised fine-tuning (SFT) followed by reinforcement learning with verifiable rewards (RLVR). In experiments on diverse datasets, CastFlow achieves superior overall results against strong baselines.
📝 Abstract
Recently, large language models (LLMs) have shown great promise in time series forecasting. However, most existing LLM-based forecasting methods still follow a static generative paradigm that directly maps historical observations to future values in a single pass. Under this paradigm, forecasting is constrained by limited temporal pattern extraction, single-round acquisition of contextual features, one-shot forecast generation, and a lack of support from ensemble forecasts. To address these limitations, we propose CastFlow, a dynamic agentic forecasting framework that enables multi-view temporal pattern extraction, multi-round contextual feature acquisition, iterative forecast refinement, and forecasting supported by ensemble forecasts. First, CastFlow organizes the forecasting process into planning, action, forecasting, and reflection, establishing an agentic workflow. Second, this workflow is supported by a memory module that retrieves prior experience and a multi-view toolkit that constructs diagnostic evidence and provides a reliable ensemble forecast baseline. Third, CastFlow adopts a role-specialized design that combines general-purpose reasoning with specialized numerical forecasting. Under this design, a frozen LLM preserves general-purpose reasoning, while a fine-tuned domain-specific LLM performs evidence-guided numerical forecasting based on the ensemble forecast baseline, rather than forecasting from scratch. To optimize the domain-specific LLM, we further develop a two-stage workflow-oriented training strategy that combines supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). To evaluate the effectiveness of CastFlow, we conduct extensive experiments on diverse datasets and show that it achieves superior overall results against strong baselines. We hope that this work can serve as a step toward more adaptive and accurate time series forecasting.
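The planning, action, forecasting, and reflection loop described above can be illustrated with a toy sketch. This is not CastFlow's implementation: the abstract does not specify the tools, the memory format, or the stopping rule, so every name here (`run_workflow`, `Memory`, `ensemble_baseline`, the three simple forecasters) is a hypothetical stand-in, and both LLM roles are replaced by plain arithmetic rules. The sketch only shows the control flow: a forecast is built on top of an ensemble baseline rather than from scratch, checked against a held-out tail of the history, and refined over several rounds using what the memory recorded. The SFT and RLVR training stages are not covered.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Tuple

Series = List[float]


# Hypothetical toolkit: three simple forecasters standing in for the
# multi-view tools (the abstract does not name the actual tools).
def naive_last(history: Series, horizon: int) -> Series:
    return [history[-1]] * horizon


def moving_average(history: Series, horizon: int) -> Series:
    return [mean(history[-3:])] * horizon


def drift(history: Series, horizon: int) -> Series:
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (step + 1) for step in range(horizon)]


TOOLS = [naive_last, moving_average, drift]


def ensemble_baseline(history: Series, horizon: int) -> Series:
    """Average the tool forecasts step by step."""
    forecasts = [tool(history, horizon) for tool in TOOLS]
    return [mean(values) for values in zip(*forecasts)]


@dataclass
class Memory:
    """Assumed memory module: one record per completed round."""
    records: List[Dict[str, float]] = field(default_factory=list)

    def next_correction(self) -> float:
        # Reuse prior experience: last applied correction plus its residual bias.
        if not self.records:
            return 0.0
        last = self.records[-1]
        return last["applied"] + last["bias"]


def run_workflow(
    history: Series, horizon: int, max_rounds: int = 3
) -> Tuple[Series, List[Dict[str, float]]]:
    # Hold out the last `horizon` points so reflection has something to verify against.
    train, holdout = history[:-horizon], history[-horizon:]
    memory = Memory()
    for _ in range(max_rounds):
        correction = memory.next_correction()                # planning (rule-based stand-in)
        baseline = ensemble_baseline(train, horizon)         # action: call the toolkit
        candidate = [b + correction for b in baseline]       # forecasting on top of the baseline
        residuals = [a - c for a, c in zip(holdout, candidate)]  # reflection: backtest
        memory.records.append({
            "applied": correction,
            "bias": mean(residuals),
            "error": mean(abs(r) for r in residuals),
        })
        recs = memory.records
        if len(recs) >= 2 and recs[-1]["error"] >= recs[-2]["error"]:
            break  # reflection found no improvement, stop refining
    best = min(memory.records, key=lambda r: r["error"])
    final_baseline = ensemble_baseline(history, horizon)
    return [b + best["applied"] for b in final_baseline], memory.records


history = [float(t) for t in range(1, 13)]  # toy linear series 1..12
forecast, rounds = run_workflow(history, horizon=3)
```

On this toy series the second round's backtest error is lower than the first, because the reflection step detects that the plain ensemble under-predicts a rising series and the next round corrects for it. In the framework described by the abstract, the planning and reflection decisions would come from the frozen LLM and the refinement from the fine-tuned forecaster, not from fixed rules like these.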