Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting

📅 2025-09-18
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing large pretrained time-series models (e.g., Chronos, Time-MoE) achieve strong zero-shot forecasting performance but incur prohibitive computational overhead. To address this, the authors propose Super-Linear, a lightweight, frequency-aware Mixture of Experts (MoE) model. The method replaces deep-network experts with frequency-specialized linear experts and introduces a spectral gating mechanism for dynamic, interpretable expert selection; in addition, a multi-frequency resampling pretraining strategy improves robustness across sampling rates. Super-Linear matches state-of-the-art zero-shot forecasting accuracy while drastically reducing model size and inference latency, cutting FLOPs by up to 62%. Crucially, the learned gating weights inherently reflect frequency-domain importance, improving model interpretability. The code and pretrained models are publicly available.
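The summary above describes two components: frequency-specialized linear experts and a spectral gate that weights them per input. The numpy sketch below illustrates that combination under simplifying assumptions: the gate scores each expert by the energy in one equal-width frequency band of the input's magnitude spectrum (the paper's gate is learned), and each expert is a plain linear map from the history window to the forecast. All names and shapes here are illustrative, not the released implementation.

```python
import numpy as np

def spectral_gate(x, num_experts):
    """Softmax gating weights from the input's magnitude spectrum.

    x: (lookback,) univariate history window.
    Assumption: expert i is associated with the i-th of `num_experts`
    equal-width frequency bands; its score is the spectral energy in
    that band (a hand-crafted stand-in for a learned spectral gate).
    """
    spec = np.abs(np.fft.rfft(x - x.mean()))       # magnitude spectrum, DC removed
    bands = np.array_split(spec, num_experts)       # equal-width frequency bands
    energy = np.array([b.sum() for b in bands])
    w = np.exp(energy - energy.max())               # numerically stable softmax
    return w / w.sum()

def moe_forecast(x, experts):
    """Mixture forecast: gate-weighted sum of linear expert outputs.

    experts: list of (W, b) with W of shape (horizon, lookback) --
    each expert is a single linear map from history to forecast.
    """
    w = spectral_gate(x, len(experts))
    preds = np.stack([W @ x + b for W, b in experts])  # (num_experts, horizon)
    return w @ preds                                   # (horizon,)
```

A forecast is then one gated pass, e.g. `moe_forecast(history, experts)` with a 96-step history and 12-step horizon; the gate output itself is directly inspectable, which is the interpretability angle the summary mentions.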

πŸ“ Abstract
Time series forecasting (TSF) is critical in domains like energy, finance, healthcare, and logistics, requiring models that generalize across diverse datasets. Large pre-trained models such as Chronos and Time-MoE show strong zero-shot (ZS) performance but suffer from high computational costs. In this work, we introduce Super-Linear, a lightweight and scalable mixture-of-experts (MoE) model for general forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting. Despite its simplicity, Super-Linear matches state-of-the-art performance while offering superior efficiency, robustness to various sampling rates, and enhanced interpretability. The implementation of Super-Linear is available at https://github.com/azencot-group/SuperLinear.
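The abstract's "trained on resampled data across multiple frequency regimes" can be pictured as a simple augmentation step: each training series is rendered at several sampling rates before being fed to the experts. The sketch below uses linear interpolation as the resampler; the factor values and the helper names are assumptions for illustration, as the paper's exact resampling scheme is not given on this page.

```python
import numpy as np

def resample(series, factor):
    """Resample a 1-D series by a factor via linear interpolation.

    factor > 1 simulates a higher sampling rate (upsampling),
    factor < 1 a lower one (downsampling). Hypothetical helper.
    """
    n_out = max(2, int(round(len(series) * factor)))
    old_t = np.linspace(0.0, 1.0, len(series))
    new_t = np.linspace(0.0, 1.0, n_out)
    return np.interp(new_t, old_t, series)

def multi_frequency_views(series, factors=(0.5, 1.0, 2.0)):
    """Training views of one series at several sampling rates."""
    return {f: resample(np.asarray(series, dtype=float), f) for f in factors}
```

Pretraining on such views is what lets a single model stay robust when deployed on data sampled hourly, daily, or anywhere in between.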
Problem

Research questions and friction points this paper is trying to address.

Addresses high computational costs in time series forecasting models
Improves efficiency and scalability for zero-shot forecasting performance
Enhances robustness to varying sampling rates and interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight mixture-of-experts model
Frequency-specialized linear experts
Spectral gating mechanism for dynamic expert selection
Liran Nochumsohn
PhD student at Ben-Gurion University
Time series analysis · Deep learning
Raz Marshanski
Faculty of Computer and Information Science, Ben-Gurion University
Hedi Zisling
Faculty of Computer and Information Science, Ben-Gurion University
Omri Azencot
Senior Lecturer (Assistant Professor) of Computer Science, BGU
Machine Learning · Representation Learning · Generative Modeling · Sequential Modeling