AME-TS: Anchored Mixture-of-Experts for Time Series Forecasting

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing time series forecasting models process every series through a shared dense computation path and therefore struggle to accommodate the substantial structural heterogeneity across series, while standard Mixture-of-Experts (MoE) approaches suffer from weakly identified expert specialization and unstable fine-tuning dynamics. To address these limitations, this work proposes AME-TS, a structure-guided sparse foundation model for time series. AME-TS uses a lightweight regime predictor to estimate series-level descriptors (forecastability, seasonality, trend, and sparsity) and maps them to a soft structural prior over experts. This prior guides token-level routing during training, encouraging structure-aligned expert specialization. On the GIFT-Eval benchmark, AME-TS substantially outperforms existing time series foundation models at small model scales and remains competitive with the strongest models at larger scales while activating substantially fewer parameters. During fine-tuning on the M5 dataset, it also shows more interpretable routing and more stable expert specialization than standard MoE.

📝 Abstract

Time series forecasting models are increasingly scaled through large Transformer backbones, yet most existing approaches process all series through a shared dense computation path despite substantial heterogeneity in temporal structure. Mixture-of-Experts (MoE) offers a natural alternative by enabling conditional computation, but standard MoE routing leaves expert specialization weakly identified and often unstable during downstream adaptation. We propose AME-TS, a structure-guided sparse time series foundation model that aligns expert routing with interpretable temporal structure. AME-TS first uses a lightweight regime predictor to estimate series-level descriptors, including forecastability, seasonality, trend, and sparsity, and maps them to a soft structural prior over experts. This series-level prior guides token-level routing during training, encouraging structure-aligned specialization. On the GIFT-Eval benchmark, AME-TS delivers a strong accuracy-efficiency tradeoff across model scales: it substantially outperforms existing time series foundation models at small model scales and remains competitive with the strongest models at larger scales, while activating substantially fewer parameters through sparse routing. We further show that AME-TS learns more interpretable routing geometry and substantially more stable expert specialization than standard MoE during fine-tuning on the M5 dataset. These results suggest that structure-aware routing is an effective and reliable way to realize the benefits of sparse expert models for time series forecasting.
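The routing idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the series-level prior enters the token-level router, so everything below is an assumption made for illustration: the prior is a softmax over a linear map of the four descriptors, it is added to the router logits as a log-prior bias, and the top-k experts are kept. The names `structural_prior`, `route_token`, `W_prior`, `W_router`, and `strength` are hypothetical and do not come from the paper. Other mechanisms, such as a divergence penalty between routing probabilities and the prior, would fit the description equally well.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def structural_prior(descriptors, W_prior):
    """Map series-level descriptors to a soft prior over experts.

    Assumption: a single linear layer followed by softmax.
    """
    return softmax(matvec(W_prior, descriptors))


def route_token(token, W_router, prior, top_k=2, strength=1.0):
    """Return {expert_index: weight} for one token.

    Assumption: the prior biases the router logits as strength * log(prior),
    then the top_k experts are kept and their weights renormalized.
    """
    logits = matvec(W_router, token)
    guided = [z + strength * math.log(p) for z, p in zip(logits, prior)]
    chosen = sorted(range(len(guided)), key=lambda e: guided[e], reverse=True)[:top_k]
    weights = softmax([guided[e] for e in chosen])
    return dict(zip(chosen, weights))


random.seed(0)
NUM_EXPERTS, DESC_DIM, HIDDEN = 4, 4, 8

# Randomly initialized parameters stand in for trained weights.
W_prior = [[random.gauss(0, 1) for _ in range(DESC_DIM)] for _ in range(NUM_EXPERTS)]
W_router = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]

# Illustrative descriptor values: forecastability, seasonality, trend, sparsity.
descriptors = [0.7, 0.9, 0.2, 0.05]
prior = structural_prior(descriptors, W_prior)

# Three made-up token embeddings from the same series share one prior.
tokens = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(3)]
routes = [route_token(t, W_router, prior) for t in tokens]
```

In this sketch, `strength` controls how strongly the series-level structure overrides the token-level logits. Setting it to 0 recovers an ordinary top-k router, while large values send every token of a series to the experts favored by the prior.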

Problem

Research questions and friction points this paper is trying to address.

time series forecasting

Mixture-of-Experts

expert specialization

temporal structure heterogeneity

routing instability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts

structure-aware routing

time series forecasting