STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction

📅 2025-08-17

📈 Citations: 0

✨ Influential: 0

career value

219K/year

🤖 AI Summary

Addressing two key challenges in long-term spatiotemporal forecasting—difficulty in extracting multiscale temporal features and complexity in modeling cross-node dependencies—this paper proposes STM2 and STM3, two novel models. Methodologically, we introduce a hierarchical spatiotemporal modeling framework that innovatively integrates a multiscale Mamba architecture with adaptive graph-aware causal convolution. To enhance representation learning, we incorporate a Mixture-of-Experts (MoE) structure, augmented by a stable routing mechanism and causal contrastive learning, thereby improving scale discriminability, pattern disentanglement, and dynamic modeling smoothness. Extensive experiments on multiple real-world benchmark datasets demonstrate state-of-the-art performance: our models achieve an average 12.7% reduction in long-term forecasting error compared to prior methods, validating their effectiveness in capturing complex, long-range spatiotemporal dependencies.

Technology Category

Application Category

📝 Abstract

Recently, spatio-temporal time-series prediction has developed rapidly, yet existing deep learning methods struggle with learning complex long-term spatio-temporal dependencies efficiently. The long-term spatio-temporal dependency learning brings two new challenges: 1) The long-term temporal sequence includes multiscale information naturally which is hard to extract efficiently; 2) The multiscale temporal information from different nodes is highly correlated and hard to model. To address these challenges, we propose an efficient extit{ extbf{S}patio- extbf{T}emporal extbf{M}ultiscale extbf{M}amba} (STM2) that includes a multiscale Mamba architecture to capture the multiscale information efficiently and simultaneously, and an adaptive graph causal convolution network to learn the complex multiscale spatio-temporal dependency. STM2 includes hierarchical information aggregation for different-scale information that guarantees their distinguishability. To capture diverse temporal dynamics across all spatial nodes more efficiently, we further propose an enhanced version termed extit{ extbf{S}patio- extbf{T}emporal extbf{M}ixture of extbf{M}ultiscale extbf{M}amba} (STM3) that employs a special Mixture-of-Experts architecture, including a more stable routing strategy and a causal contrastive learning strategy to enhance the scale distinguishability. We prove that STM3 has much better routing smoothness and guarantees the pattern disentanglement for each expert successfully. Extensive experiments on real-world benchmarks demonstrate STM2/STM3's superior performance, achieving state-of-the-art results in long-term spatio-temporal time-series prediction.

Problem

Research questions and friction points this paper is trying to address.

Efficiently learning complex long-term spatio-temporal dependencies

Extracting multiscale information from long-term temporal sequences

Modeling highly correlated multiscale temporal information across nodes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiscale Mamba architecture for efficient information capture

Adaptive graph causal convolution for spatio-temporal dependency

Mixture-of-Experts with stable routing and contrastive learning

🔎 Similar Papers

No similar papers found.