🤖 AI Summary
Addressing two key challenges in long-term spatiotemporal forecasting—difficulty in extracting multiscale temporal features and complexity in modeling cross-node dependencies—this paper proposes two novel models, STM2 and STM3. Methodologically, we introduce a hierarchical spatiotemporal modeling framework that integrates a multiscale Mamba architecture with adaptive graph-aware causal convolution. To enhance representation learning, we incorporate a Mixture-of-Experts (MoE) structure, augmented by a stable routing mechanism and causal contrastive learning, thereby improving scale discriminability, pattern disentanglement, and routing smoothness. Extensive experiments on multiple real-world benchmark datasets demonstrate state-of-the-art performance: our models achieve an average 12.7% reduction in long-term forecasting error compared to prior methods, validating their effectiveness in capturing complex, long-range spatiotemporal dependencies.
📝 Abstract
Recently, spatio-temporal time-series prediction has developed rapidly, yet existing deep learning methods struggle to learn complex long-term spatio-temporal dependencies efficiently. Learning long-term spatio-temporal dependencies brings two new challenges: 1) a long-term temporal sequence naturally contains multiscale information that is hard to extract efficiently; 2) the multiscale temporal information from different nodes is highly correlated and hard to model. To address these challenges, we propose an efficient ***S**patio-**T**emporal **M**ultiscale **M**amba* (STM2) that includes a multiscale Mamba architecture to capture the multiscale information efficiently and simultaneously, and an adaptive graph causal convolution network to learn the complex multiscale spatio-temporal dependency. STM2 aggregates different-scale information hierarchically, which guarantees its distinguishability. To capture diverse temporal dynamics across all spatial nodes more efficiently, we further propose an enhanced version termed ***S**patio-**T**emporal **M**ixture of **M**ultiscale **M**amba* (STM3) that employs a special Mixture-of-Experts architecture, including a more stable routing strategy and a causal contrastive learning strategy to enhance scale distinguishability. We prove that STM3 achieves much better routing smoothness and guarantees successful pattern disentanglement for each expert. Extensive experiments on real-world benchmarks demonstrate STM2/STM3's superior performance, achieving state-of-the-art results in long-term spatio-temporal time-series prediction.
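The core ideas described above—splitting a sequence into multiple temporal scales, summarizing each scale with a state-space (Mamba-style) block, and fusing the per-scale summaries through a softmax-gated mixture of experts—can be illustrated with a minimal toy sketch. This is not the paper's actual architecture: the `SimpleSSMExpert` recurrence, the average-pool downsampling, the gate parameters, and the temperature-smoothed router are all simplified stand-ins invented here for illustration; the real STM2/STM3 models use selective Mamba blocks, adaptive graph causal convolution, and a learned, stabilized routing strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample(x, scale):
    """Average-pool the time axis by `scale` (a stand-in for multiscale splitting)."""
    T = (x.shape[0] // scale) * scale
    return x[:T].reshape(-1, scale, x.shape[1]).mean(axis=1)

class SimpleSSMExpert:
    """Toy diagonal linear state-space recurrence standing in for a Mamba block."""
    def __init__(self, dim):
        self.a = rng.uniform(0.5, 0.95, size=dim)   # per-channel state decay
        self.b = 0.1 * rng.normal(size=dim)         # per-channel input gain

    def __call__(self, x):
        h = np.zeros(x.shape[1])
        for t in range(x.shape[0]):                  # h_t = a * h_{t-1} + b * x_t
            h = self.a * h + self.b * x[t]
        return h                                     # final state = scale summary

def soft_route(summaries, gate_w, temperature=1.0):
    """Softmax gate over expert summaries; a higher temperature smooths routing."""
    logits = np.array([s @ gate_w for s in summaries]) / temperature
    logits -= logits.max()                           # numerical stability
    w = np.exp(logits)
    w /= w.sum()
    fused = sum(wi * si for wi, si in zip(w, summaries))
    return fused, w

# One node's series: T steps, `dim` channels; three temporal scales, one expert each.
dim, T = 4, 64
x = rng.normal(size=(T, dim))
scales = [1, 2, 4]
experts = [SimpleSSMExpert(dim) for _ in scales]
summaries = [e(downsample(x, s)) for e, s in zip(experts, scales)]

gate_w = rng.normal(size=dim)                        # hypothetical gating vector
fused, weights = soft_route(summaries, gate_w, temperature=2.0)
print("routing weights:", weights.round(3))
print("fused summary shape:", fused.shape)
```

The temperature parameter hints at why routing smoothness matters: as it grows, the gate distributes weight more evenly across scale experts, so small input perturbations cannot flip the routing decision abruptly—one intuition behind the stable routing strategy the abstract claims for STM3.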