🤖 AI Summary
Existing large pretrained time-series models (e.g., Chronos, Time-MoE) achieve strong zero-shot forecasting performance but incur prohibitive computational overhead. To address this, we propose Super-Linear, a lightweight, frequency-domain-aware mixture-of-experts (MoE) model. Our method replaces deep-network experts with frequency-specialized linear experts and introduces a spectral gating mechanism for dynamic, interpretable expert selection; additionally, we adopt a multi-frequency resampling pretraining strategy to improve robustness across sampling rates. Super-Linear matches state-of-the-art zero-shot forecasting accuracy while drastically reducing model size and inference latency, cutting FLOPs by up to 62%. Crucially, the learned gating weights inherently reflect frequency-domain importance, improving interpretability. The code and pretrained models are publicly available.
📄 Abstract
Time series forecasting (TSF) is critical in domains like energy, finance, healthcare, and logistics, requiring models that generalize across diverse datasets. Large pre-trained models such as Chronos and Time-MoE show strong zero-shot (ZS) performance but suffer from high computational costs. In this work, we introduce Super-Linear, a lightweight and scalable mixture-of-experts (MoE) model for general forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on data resampled across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting. Despite its simplicity, Super-Linear matches state-of-the-art performance while offering superior efficiency, robustness to varying sampling rates, and enhanced interpretability. The implementation of Super-Linear is available at [https://github.com/azencot-group/SuperLinear](https://github.com/azencot-group/SuperLinear).
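To make the core idea concrete, here is a minimal sketch of frequency-gated linear experts: a gate computes the amplitude spectrum of the input window and softmaxes it into mixture weights over a small bank of linear forecasters. All names, shapes, and the random initialization are illustrative assumptions, not the authors' implementation (in Super-Linear, each expert would be trained on data resampled to its frequency regime).

```python
import numpy as np

rng = np.random.default_rng(0)
LOOKBACK, HORIZON, N_EXPERTS = 96, 24, 4  # hypothetical sizes

# Frequency-specialized linear experts: each is a LOOKBACK x HORIZON matrix.
# Randomly initialized here for illustration only; in practice each expert
# would be fit to series resampled to its target frequency regime.
experts = [rng.normal(scale=0.01, size=(LOOKBACK, HORIZON))
           for _ in range(N_EXPERTS)]

# Hypothetical gating parameters: map spectral features to expert logits.
W_gate = rng.normal(scale=0.1, size=(LOOKBACK // 2 + 1, N_EXPERTS))

def spectral_gate(x):
    """Softmax over expert logits computed from the amplitude spectrum of x."""
    amp = np.abs(np.fft.rfft(x))      # amplitude spectrum, LOOKBACK//2 + 1 bins
    logits = amp @ W_gate
    e = np.exp(logits - logits.max()) # numerically stable softmax
    return e / e.sum()

def forecast(x):
    """Mixture forecast: gate weights combine the linear experts' outputs."""
    w = spectral_gate(x)
    return sum(w_k * (x @ E_k) for w_k, E_k in zip(w, experts))

# Toy daily-periodic input (period 24) and its HORIZON-step forecast.
x = np.sin(2 * np.pi * np.arange(LOOKBACK) / 24)
y_hat = forecast(x)
```

Because the gate weights are a function of the spectrum, inspecting them directly shows which frequency regime the model attributes to a given input, which is the interpretability property the abstract refers to.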