Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models

📅 2025-10-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the dual challenges of inherent stochasticity and non-differentiable evaluation metrics in physical spatiotemporal forecasting, this paper proposes a novel model-based reinforcement learning paradigm that reformulates prediction as sequential planning. Methodologically, we construct a generative world model to simulate high-fidelity, diverse future states and employ domain-specific non-differentiable metrics—such as extreme-event hit rate—as sparse reward signals. We design a beam-search–guided, reward-driven imagination mechanism and introduce an iterative pseudo-labeling self-training strategy. Crucially, our framework enables end-to-end optimization of non-differentiable objectives without gradient approximation. Experiments demonstrate substantial reductions in overall prediction error alongside marked improvements in long-tail event detection. This work establishes a new pathway toward interpretable and robust forecasting for complex physical systems.

📝 Abstract
To address the dual challenges of inherent stochasticity and non-differentiable metrics in physical spatiotemporal forecasting, we propose Spatiotemporal Forecasting as Planning (SFP), a new paradigm grounded in Model-Based Reinforcement Learning. SFP constructs a novel Generative World Model to simulate diverse, high-fidelity future states, enabling an "imagination-based" environmental simulation. Within this framework, a base forecasting model acts as an agent, guided by a beam search-based planning algorithm that leverages non-differentiable domain metrics as reward signals to explore high-return future sequences. These identified high-reward candidates then serve as pseudo-labels to continuously optimize the agent's policy through iterative self-training, significantly reducing prediction error and demonstrating exceptional performance on critical domain metrics like capturing extreme events.
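The planning step described in the abstract can be pictured as beam search over futures sampled from the world model, scored by a sparse, non-differentiable domain metric. Below is a minimal NumPy sketch of that idea; the `world_model` callable, the grid shapes, and the extreme-event threshold are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def extreme_hit_rate(pred, obs, threshold=0.9):
    """Non-differentiable domain metric: fraction of extreme cells in the
    observation that the prediction also flags as extreme."""
    extreme = obs >= threshold
    if not extreme.any():
        return 0.0
    return float(((pred >= threshold) & extreme).sum() / extreme.sum())

def beam_search_rollout(world_model, state, obs_seq, beam_width=4, n_samples=8, rng=None):
    """Roll the generative world model forward, keeping the beam_width
    candidate trajectories with the highest cumulative reward. The reward
    is used only to rank candidates, so it never needs a gradient."""
    rng = rng or np.random.default_rng(0)
    beams = [(0.0, [state])]                # (cumulative reward, trajectory)
    for obs in obs_seq:                     # one planning step per future frame
        candidates = []
        for reward, traj in beams:
            for _ in range(n_samples):
                nxt = world_model(traj[-1], rng)          # sample a future state
                r = reward + extreme_hit_rate(nxt, obs)   # score with sparse metric
                candidates.append((r, traj + [nxt]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]                   # prune to the beam
    return beams[0]  # highest-return trajectory and its reward
```

The highest-return trajectory returned here is what the paper would treat as a pseudo-label for the agent.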
Problem

Research questions and friction points this paper is trying to address.

Addresses spatiotemporal forecasting challenges with stochasticity and non-differentiable metrics
Proposes model-based reinforcement learning with generative world simulation
Optimizes forecasting through planning algorithms using non-differentiable reward signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative World Model simulates diverse future states
Beam search planning uses non-differentiable metrics as rewards
Self-training optimizes policy with high-reward pseudo-labels
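The self-training bullet above amounts to alternating two phases: plan with the world model to collect high-reward futures, then regress the agent onto those futures as pseudo-labels. Here is a toy sketch of that loop using a linear agent and an MSE surrogate; the function names, the linear model, and the learning rate are hypothetical stand-ins for the paper's forecasting network and training schedule.

```python
import numpy as np

def self_training_step(agent_weights, inputs, pseudo_labels, lr=0.1):
    """One update fitting the agent (a linear map, for illustration) to the
    high-reward pseudo-labels via a gradient step on MSE -- the
    differentiable surrogate that stands in for the metric itself."""
    pred = inputs @ agent_weights
    grad = inputs.T @ (pred - pseudo_labels) / len(inputs)
    return agent_weights - lr * grad

def iterative_self_training(agent_weights, inputs, plan_fn, n_rounds=5):
    """Alternate planning (collect high-reward candidate futures) with
    policy updates, mirroring the iterative pseudo-labeling strategy."""
    for _ in range(n_rounds):
        pseudo_labels = plan_fn(agent_weights, inputs)  # high-reward candidates
        for _ in range(20):                             # fit agent to pseudo-labels
            agent_weights = self_training_step(agent_weights, inputs, pseudo_labels)
    return agent_weights
```

Because the non-differentiable metric only selects the pseudo-labels, the agent's optimization stays fully gradient-based, which is how the framework avoids gradient approximation of the metric.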
Authors
Hao Wu (Tsinghua University)
Yuan Gao (Tsinghua University)
Xingjian Shi (OpenAI)
Shuaipeng Li (Tencent)
Fan Xu (SLAI)
Fan Zhang (CUHK)
Zhihong Zhu (Tencent Jarvis Lab)
Weiyan Wang (Tencent)
Xiao Luo (University of Wisconsin)
Kun Wang (Nanyang Technological University)
Xian Wu (Tencent Jarvis Lab)
Xiaomeng Huang (Tsinghua University)