STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction

📅 2025-08-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing two key challenges in long-term spatiotemporal forecasting—difficulty in extracting multiscale temporal features and complexity in modeling cross-node dependencies—this paper proposes STM2 and STM3, two novel models. Methodologically, we introduce a hierarchical spatiotemporal modeling framework that innovatively integrates a multiscale Mamba architecture with adaptive graph-aware causal convolution. To enhance representation learning, we incorporate a Mixture-of-Experts (MoE) structure, augmented by a stable routing mechanism and causal contrastive learning, thereby improving scale discriminability, pattern disentanglement, and dynamic modeling smoothness. Extensive experiments on multiple real-world benchmark datasets demonstrate state-of-the-art performance: our models achieve an average 12.7% reduction in long-term forecasting error compared to prior methods, validating their effectiveness in capturing complex, long-range spatiotemporal dependencies.

Technology Category

Application Category

📝 Abstract
Recently, spatio-temporal time-series prediction has developed rapidly, yet existing deep learning methods struggle with learning complex long-term spatio-temporal dependencies efficiently. The long-term spatio-temporal dependency learning brings two new challenges: 1) The long-term temporal sequence includes multiscale information naturally which is hard to extract efficiently; 2) The multiscale temporal information from different nodes is highly correlated and hard to model. To address these challenges, we propose an efficient extit{ extbf{S}patio- extbf{T}emporal extbf{M}ultiscale extbf{M}amba} (STM2) that includes a multiscale Mamba architecture to capture the multiscale information efficiently and simultaneously, and an adaptive graph causal convolution network to learn the complex multiscale spatio-temporal dependency. STM2 includes hierarchical information aggregation for different-scale information that guarantees their distinguishability. To capture diverse temporal dynamics across all spatial nodes more efficiently, we further propose an enhanced version termed extit{ extbf{S}patio- extbf{T}emporal extbf{M}ixture of extbf{M}ultiscale extbf{M}amba} (STM3) that employs a special Mixture-of-Experts architecture, including a more stable routing strategy and a causal contrastive learning strategy to enhance the scale distinguishability. We prove that STM3 has much better routing smoothness and guarantees the pattern disentanglement for each expert successfully. Extensive experiments on real-world benchmarks demonstrate STM2/STM3's superior performance, achieving state-of-the-art results in long-term spatio-temporal time-series prediction.
Problem

Research questions and friction points this paper is trying to address.

Efficiently learning complex long-term spatio-temporal dependencies
Extracting multiscale information from long-term temporal sequences
Modeling highly correlated multiscale temporal information across nodes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiscale Mamba architecture for efficient information capture
Adaptive graph causal convolution for spatio-temporal dependency
Mixture-of-Experts with stable routing and contrastive learning
🔎 Similar Papers
No similar papers found.
Haolong Chen
Haolong Chen
The Chinese University of Hong Kong, Shenzhen
Artificial IntelligenceComputer Science
L
Liang Zhang
School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong, China
Z
Zhengyuan Xin
Shenzhen Research Institute of Big Data, Shenzhen, Guangdong, China; The Chinese University of Hong Kong, Shenzhen, Shenzhen, Guangdong, China
G
Guangxu Zhu
Shenzhen Research Institute of Big Data, Shenzhen, Guangdong, China; The Chinese University of Hong Kong, Shenzhen, Shenzhen, Guangdong, China