🤖 AI Summary
Traditional Mixture-of-Experts (MoE) models for univariate time series forecasting suffer from complex training procedures, reliance on auxiliary load-balancing losses, and manual router tuning. To address these issues, this paper proposes an attention-inspired gated-routing MoE framework. Our approach eliminates the conventional softmax-based router and explicit load-balancing objectives, instead balancing expert utilization automatically through sparse expert selection within a Transformer-style architecture, which significantly simplifies training while preserving parameter and computational efficiency. Evaluated across diverse intermittent, short-horizon, and long-horizon datasets from the energy, hydrology, and retail domains, our method achieves higher prediction accuracy than state-of-the-art models such as PatchTST while using fewer parameters, and incurs lower inference overhead than LSTM, demonstrating a superior accuracy-efficiency trade-off.
📝 Abstract
Accurate univariate forecasting remains a pressing need in real-world systems such as energy markets, hydrology, retail demand, and IoT monitoring, where signals are often intermittent and horizons span both short and long terms. While Transformers and Mixture-of-Experts (MoE) architectures are increasingly favored for time series forecasting, a key gap persists: MoE models typically require complicated training that combines the main forecasting loss with auxiliary load-balancing losses, along with careful tuning of routing and temperature hyperparameters, which hinders practical adoption. In this paper, we propose a model architecture that simplifies the training process for univariate time series forecasting and effectively addresses both long- and short-term horizons, including intermittent patterns. Our approach combines sparse MoE computation with a novel attention-inspired gating mechanism that replaces the traditional one-layer softmax router. Through extensive empirical evaluation, we demonstrate that our gating design naturally promotes balanced expert utilization and achieves superior predictive accuracy without requiring the auxiliary load-balancing losses typically used in classical MoE implementations. The model achieves better performance while using only a fraction of the parameters required by state-of-the-art Transformer models such as PatchTST. Furthermore, experiments across diverse datasets confirm that our MoE architecture with the proposed gating mechanism is more computationally efficient than LSTM for both long- and short-term forecasting, enabling cost-effective inference. These results highlight the potential of our approach for practical time series forecasting applications where both accuracy and computational efficiency are critical.
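The abstract does not spell out the gating mechanism, so the following is a minimal, hypothetical sketch of one way an attention-inspired gated MoE layer could be built: each expert owns a learnable key vector, the input acts as a query, scaled dot-product scores select the top-k experts, and no auxiliary load-balancing loss is added. All names and hyperparameters here (d_model, n_experts, top_k, the normalization over selected experts) are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class AttentionGatedMoE(nn.Module):
    """Hypothetical sketch of an attention-inspired gated MoE layer.

    Instead of a one-layer softmax router over all experts, each expert has a
    learnable "key"; the input token acts as a query, and scaled dot-product
    scores pick the top-k experts. This is an assumption-based illustration,
    not the paper's verified architecture.
    """

    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.expert_keys = nn.Parameter(torch.randn(n_experts, d_model))
        self.query_proj = nn.Linear(d_model, d_model)
        self.top_k = top_k
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = self.query_proj(x)  # queries derived from the input tokens
        # Attention-style scores between tokens and expert keys: (batch, seq_len, n_experts)
        scores = torch.einsum("bsd,ed->bse", q, self.expert_keys) * self.scale
        # Sparse expert selection: keep only the top-k scores per token
        weights, idx = scores.topk(self.top_k, dim=-1)
        # Normalize only over the selected experts (assumed normalization choice)
        weights = weights.softmax(dim=-1)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Restricting normalization to the selected experts keeps the computation sparse and removes the dense softmax router that classical MoE layers pair with auxiliary balancing losses; whether the paper uses this particular scoring and normalization is an assumption of this sketch.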