🤖 AI Summary
Traditional Mixture-of-Experts (MoE) models for multivariate time series forecasting use a fixed number of experts, leaving them ill-suited to shifts in the data's spectral distribution and causing imbalanced frequency coverage: too few experts miss critical frequency bands, while too many introduce noise. To address this, the authors propose Ada-MoGE, an adaptive Gaussian Mixture-of-Experts model that combines spectral intensity and frequency response to adapt the expert count to the input's frequency distribution automatically. Instead of hard frequency-band truncation, Ada-MoGE applies Gaussian band-pass filtering for a smooth, robust decomposition of frequency-domain features. With fine-grained spectral decomposition and modeling in the frequency domain and a lightweight architecture, it achieves state-of-the-art performance across six benchmark datasets with only 0.2M parameters, demonstrating strong efficiency and generalization.
📝 Abstract
Multivariate time series forecasting is widely used in fields such as industry, transportation, and finance. However, the dominant frequencies in a time series may shift as the data's spectral distribution evolves. Traditional Mixture of Experts (MoE) models, which employ a fixed number of experts, struggle to adapt to these changes, resulting in a frequency coverage imbalance: too few experts overlook critical information, while too many introduce noise. To this end, we propose Ada-MoGE, an adaptive Gaussian Mixture of Experts model. Ada-MoGE integrates spectral intensity and frequency response to adaptively determine the number of experts, aligning it with the frequency distribution of the input data. This prevents both information loss from an insufficient number of experts and noise contamination from an excess of experts. Additionally, to avoid the noise introduced by direct band truncation, we employ Gaussian band-pass filtering to smoothly decompose the frequency-domain features, further improving the feature representation. Experimental results show that our model achieves state-of-the-art performance on six public benchmarks with only 0.2 million parameters.
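The two core ideas of the abstract — choosing the expert count from spectral intensity, and replacing hard band truncation with smooth Gaussian band-pass filters — can be illustrated with a minimal NumPy sketch. This is a hypothetical reconstruction, not the paper's implementation: the function names, the energy-threshold heuristic for picking the expert count, and the evenly spaced Gaussian band centers are all assumptions made for illustration.

```python
import numpy as np

def adaptive_expert_count(x, energy_threshold=0.95, max_experts=8):
    """Illustrative heuristic: pick the number of experts as the number of
    dominant frequency bins needed to cover a target fraction of spectral
    energy. (The paper's actual spectral-intensity rule may differ.)"""
    power = np.abs(np.fft.rfft(x)) ** 2
    order = np.argsort(power)[::-1]                    # bins by descending power
    cum = np.cumsum(power[order]) / power.sum()        # cumulative energy share
    k = int(np.searchsorted(cum, energy_threshold) + 1)
    return min(max(k, 1), max_experts)

def gaussian_bandpass_decompose(x, num_experts, sigma_scale=0.5):
    """Split a 1-D series into smooth frequency components, one per expert,
    using Gaussian band-pass masks instead of hard spectrum truncation."""
    n = len(x)
    spec = np.fft.rfft(x)
    freqs = np.arange(len(spec))
    # Evenly spaced band centers across the half-spectrum (an assumption).
    centers = np.linspace(0, len(spec) - 1, num_experts)
    bandwidth = (len(spec) / num_experts) * sigma_scale
    components = []
    for c in centers:
        mask = np.exp(-0.5 * ((freqs - c) / bandwidth) ** 2)  # smooth band-pass
        components.append(np.fft.irfft(spec * mask, n=n))
    return components

# Toy input: two sinusoids at 5 and 40 cycles over the window.
t = np.linspace(0, 1, 256, endpoint=False)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
k = adaptive_expert_count(x)        # two dominant bins -> k == 2
parts = gaussian_bandpass_decompose(x, k)
```

The Gaussian masks taper off gradually rather than cutting the spectrum at sharp band edges, which is the smoothness property the abstract credits with avoiding truncation noise.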