🤖 AI Summary
Existing large pretrained time-series models (e.g., Chronos, Time-MoE) achieve strong zero-shot forecasting performance but incur prohibitive computational overhead. To address this, we propose Super-Linear, a lightweight, frequency-domain-aware mixture-of-experts (MoE) model. Our method replaces deep-network experts with frequency-specialized linear experts and introduces a spectral gating mechanism for dynamic, interpretable expert selection; additionally, we adopt a multi-frequency resampling pretraining strategy to improve robustness across sampling rates. Super-Linear matches state-of-the-art zero-shot forecasting accuracy while drastically reducing model size and inference latency, cutting FLOPs by up to 62%. Crucially, the learned gating weights inherently reflect frequency-domain importance, improving interpretability. The code and pretrained models are publicly available.
📄 Abstract
Time series forecasting (TSF) is critical in domains like energy, finance, healthcare, and logistics, requiring models that generalize across diverse datasets. Large pre-trained models such as Chronos and Time-MoE show strong zero-shot (ZS) performance but suffer from high computational costs. In this work, we introduce Super-Linear, a lightweight and scalable mixture-of-experts (MoE) model for general forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on data resampled across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting. Despite its simplicity, Super-Linear matches state-of-the-art performance while offering superior efficiency, robustness to varying sampling rates, and enhanced interpretability. The implementation of Super-Linear is available at [https://github.com/azencot-group/SuperLinear](https://github.com/azencot-group/SuperLinear).
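To make the core idea concrete, here is a minimal sketch of frequency-gated linear experts: a gate computes the amplitude spectrum of the input window and softmaxes it into mixture weights over a small bank of linear forecasters. All names, shapes, and the random initialization are illustrative assumptions, not the authors' implementation (in Super-Linear, each expert would be trained on data resampled to its frequency regime).

```python
import numpy as np

rng = np.random.default_rng(0)
LOOKBACK, HORIZON, N_EXPERTS = 96, 24, 4  # hypothetical sizes

# Frequency-specialized linear experts: each is a LOOKBACK x HORIZON matrix.
# Randomly initialized here for illustration only; in practice each expert
# would be fit to series resampled to its target frequency regime.
experts = [rng.normal(scale=0.01, size=(LOOKBACK, HORIZON))
           for _ in range(N_EXPERTS)]

# Hypothetical gating parameters: map spectral features to expert logits.
W_gate = rng.normal(scale=0.1, size=(LOOKBACK // 2 + 1, N_EXPERTS))

def spectral_gate(x):
    """Softmax over expert logits computed from the amplitude spectrum of x."""
    amp = np.abs(np.fft.rfft(x))      # amplitude spectrum, LOOKBACK//2 + 1 bins
    logits = amp @ W_gate
    e = np.exp(logits - logits.max()) # numerically stable softmax
    return e / e.sum()

def forecast(x):
    """Mixture forecast: gate weights combine the linear experts' outputs."""
    w = spectral_gate(x)
    return sum(w_k * (x @ E_k) for w_k, E_k in zip(w, experts))

# Toy daily-periodic input (period 24) and its HORIZON-step forecast.
x = np.sin(2 * np.pi * np.arange(LOOKBACK) / 24)
y_hat = forecast(x)
```

Because the gate weights are a function of the spectrum, inspecting them directly shows which frequency regime the model attributes to a given input, which is the interpretability property the abstract refers to.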