🤖 AI Summary
Existing approaches focus primarily on prediction and lack explicit joint reasoning over temporal dynamics, spatial dependencies, and textual context, which limits their applicability in high-stakes domains such as transportation and power grids. To address this gap, this work introduces ST-Bench, the first multitask benchmark tailored to spatiotemporal reasoning, and proposes STReasoner, a framework trained with a spatial-aware Group Relative Policy Optimization (S-GRPO) algorithm. S-GRPO uses spatial information gain as a reinforcement learning reward signal, guiding large language models to explicitly fuse time series, graph structures, and textual inputs into coherent reasoning. Experiments show that the method improves average accuracy by 17% to 135% across tasks, runs at only 0.004× the cost of commercial models, and generalizes well to real-world data.
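The spatial-information-gain reward can be illustrated with a small sketch. Neither the summary nor the abstract gives the exact S-GRPO formula, so everything below is an assumption: the gain is taken as the accuracy of a group of sampled answers that see the graph minus the accuracy of a matched group with the graph removed, a hypothetical weight `lam` scales a bonus for correct answers, and the advantages use the standard GRPO group normalization (reward minus group mean, divided by group standard deviation). All function names are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward against its own group.

    eps keeps the division defined when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def spatial_gain(correct_with_graph, correct_without_graph):
    """Hypothetical spatial information gain (an assumption, not the paper's formula):
    accuracy of answers sampled WITH the graph minus accuracy of answers
    sampled for the same prompt with the graph input removed."""
    return mean(correct_with_graph) - mean(correct_without_graph)


def s_grpo_rewards(correct_with_graph, correct_without_graph, lam=0.5):
    """Hypothetical S-GRPO-style reward: 0/1 task correctness plus a bonus of
    lam * gain for each correct answer, where gain is clipped at zero so no
    bonus is paid when removing the graph did not hurt accuracy."""
    gain = max(0.0, spatial_gain(correct_with_graph, correct_without_graph))
    return [float(c) + lam * gain * c for c in correct_with_graph]
```

For example, with `correct_with_graph = [1, 1, 0, 1]` and `correct_without_graph = [0, 1, 0, 0]`, the gain is 0.5 and the rewards become `[1.25, 1.25, 0.0, 1.25]`. The group-relative advantages then sum to zero, pushing up the correct, graph-informed answers and pushing down the incorrect one. Clipping the gain at zero is a design choice in this sketch, not something the source specifies.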
📝 Abstract
Spatio-temporal reasoning in time series involves the explicit synthesis of temporal dynamics, spatial dependencies, and textual context. This capability is vital for high-stakes decision-making in domains such as traffic networks, power grids, and disease propagation. However, the field remains underdeveloped because most existing work prioritizes predictive accuracy over reasoning. To address this gap, we introduce ST-Bench, a benchmark of four core tasks (etiological reasoning, entity identification, correlation reasoning, and in-context forecasting) built with a network SDE-based multi-agent data synthesis pipeline. We then propose STReasoner, which enables an LLM to integrate time series, graph structure, and text for explicit reasoning. To promote spatially grounded logic, we introduce S-GRPO, a reinforcement learning algorithm that rewards performance gains specifically attributable to spatial information. Experiments show that STReasoner achieves average accuracy gains of 17% to 135% at only 0.004× the cost of proprietary models and generalizes robustly to real-world data.
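The abstract says ST-Bench is generated by a network SDE-based pipeline but does not state the SDE. As a minimal sketch, assuming a linear graph-diffusion drift, dX = -κ L X dt + σ dW with L the graph Laplacian, Euler-Maruyama integration yields node-level time series whose behavior follows the graph topology. The drift form, the parameters `kappa`, `sigma`, `dt`, `steps`, and the function names are all hypothetical, and the multi-agent part of the pipeline (for example, producing textual context and questions) is not modeled here.

```python
import math
import random


def laplacian(adj):
    """Graph Laplacian L = D - A for a dense, symmetric 0/1 adjacency matrix
    (list of lists, no self-loops assumed)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def simulate_network_sde(adj, x0, kappa=1.0, sigma=0.1, dt=0.01, steps=100, seed=0):
    """Euler-Maruyama integration of the hypothetical network SDE
    dX = -kappa * L X dt + sigma dW.

    Returns a list of node-state vectors, one per time step, starting with x0.
    """
    rng = random.Random(seed)  # seeded so synthetic series are reproducible
    lap = laplacian(adj)
    n = len(x0)
    x = list(x0)
    path = [list(x)]
    for _ in range(steps):
        # Diffusion drift: each node moves toward its neighbors' values.
        drift = [-kappa * sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Independent Gaussian increment per node, scaled by sqrt(dt).
        x = [x[i] + drift[i] * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for i in range(n)]
        path.append(list(x))
    return path
```

With `sigma=0` on the three-node path graph `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]` and `x0 = [1.0, 0.0, 0.0]`, the states diffuse toward the average 1/3 while their sum stays at 1. A positive `sigma` adds independent noise at each node, which the diffusion term then spreads to neighboring nodes.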