Dynamic TMoE: A Drift-Aware Dynamic Mixture of Experts Framework for Non-Stationary Time Series Forecasting

📅 2026-05-19
🤖 AI Summary
This work addresses distributional drift in non-stationary time series, which renders static models ineffective. It proposes a dynamic mixture-of-experts framework that detects distribution shifts using Maximum Mean Discrepancy (MMD) and adds or removes heterogeneous experts accordingly. A temporal memory routing mechanism, which combines recurrent states with an anomaly bank, enables context-aware expert selection without requiring model updates at test time. Evaluated on nine benchmark datasets, the method achieves state-of-the-art performance, reducing average MSE and MAE by 10.4% and 7.8%, respectively.
📝 Abstract
Non-stationary time series forecasting is challenged by evolving distribution shifts that static models struggle to capture. While Mixture-of-Experts (MoE) architectures offer a promising paradigm for decoupling complex drift patterns, existing approaches are limited by fixed expert pools and memoryless routing, hampering their ability to adapt to abrupt regime shifts. To address this, we propose Dynamic TMoE, a framework that unifies architectural evolution with temporal continuity during the learning phase. By detecting distribution shifts via Maximum Mean Discrepancy (MMD), we dynamically instantiate heterogeneous experts and prune redundant ones to optimize capacity. Additionally, a temporal memory router leverages recurrent states and an anomaly repository to ensure stable, context-aware expert selection without requiring test-time updates. Experiments on nine benchmarks demonstrate state-of-the-art performance, reducing MSE by 10.4% and MAE by 7.8%. Code is available at https://github.com/andone-07/Dynamic-TMoE.
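To make the MMD-based shift detection mentioned in the abstract concrete, here is a minimal sketch of a generic two-sample MMD check between a reference window and a recent window. It is not taken from the Dynamic TMoE code: the Gaussian kernel, the biased estimator, the fixed `bandwidth` and `threshold` values, and the function names `mmd_squared` and `drift_detected` are all assumptions chosen for illustration.

```python
import math
import random


def rbf_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two scalar observations."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between two 1-D samples."""
    def mean_kernel(u, v):
        total = sum(rbf_kernel(a, b, bandwidth) for a in u for b in v)
        return total / (len(u) * len(v))

    return mean_kernel(x, x) + mean_kernel(y, y) - 2.0 * mean_kernel(x, y)


def drift_detected(reference, recent, threshold=0.1, bandwidth=1.0):
    """Flag a shift when the MMD estimate exceeds a threshold.

    The threshold is a hypothetical fixed value; a real detector would
    likely calibrate it, for example with a permutation test.
    """
    return mmd_squared(reference, recent, bandwidth) > threshold


# Synthetic windows: two from the same regime, one with a shifted mean.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_regime = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted_regime = [rng.gauss(3.0, 1.0) for _ in range(200)]

print(drift_detected(reference, same_regime))
print(drift_detected(reference, shifted_regime))
```

In a dynamic expert pool, a positive detection like the second call could serve as the trigger for instantiating a new expert, while windows that stay below the threshold would leave the pool unchanged. How the paper actually sets the kernel, the window sizes, and the trigger rule is not stated in the abstract.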
Problem

Research questions and friction points this paper is trying to address.

non-stationary time series forecasting
distribution shift
Mixture-of-Experts
regime shifts
temporal continuity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Mixture of Experts
Distribution Shift Detection
Temporal Memory Router
Non-Stationary Time Series Forecasting
Maximum Mean Discrepancy