🤖 AI Summary
This work addresses two limitations of traditional Mixture-of-Experts (MoE) models in time series forecasting: insufficient expert specialization and low training efficiency, the latter because such models often require full retraining. The authors propose an adaptive MoE framework that adds expert-specific losses to the base forecasting loss in the overall optimization objective, and they pair it with a partial online learning mechanism that updates the gating mechanism and expert parameters incrementally. Empirical evaluations on real-world economic, tourism, and energy datasets of varying frequencies show that the method generally outperforms classical statistical baselines and state-of-the-art neural architectures such as Transformers and WaveNet in both forecasting accuracy and computational efficiency. Ablation studies support the contribution of the expert-specific loss term.
📝 Abstract
We propose a novel adaptive Mixture-of-Experts (MoE) framework for time series forecasting that enhances expert specialization by incorporating expert-specific loss information directly into the training process. Notably, the overall objective comprises the base forecasting loss and expert-specific losses, allowing expert-level prediction errors to jointly shape training alongside the global forecasting loss. This framework is further combined with a partial online learning strategy, enabling incremental updates of both the gating mechanism and expert parameters. This approach significantly reduces computational cost by eliminating the need for repeated full model retraining. By integrating expert-level loss awareness with efficient online optimization, the proposed method achieves improved learning efficiency while maintaining strong predictive performance. Empirical results across economic, tourism, and energy datasets with varying frequencies demonstrate that the proposed approach generally outperforms both statistical methods and state-of-the-art neural network models, such as Transformers and WaveNet, in forecasting accuracy and computational efficiency. Furthermore, ablation studies confirm the effectiveness of the expert-specific loss integration strategy, highlighting its contribution to enhancing predictive performance.
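The abstract describes the objective only in words: a base forecasting loss plus expert-specific losses. The sketch below shows one plausible way such an objective could be assembled for a single forecast step. It is an illustration under stated assumptions, not the authors' formulation: the softmax gate, the squared-error losses, the gate-weighted form of the expert term, the weight `lam`, and the function names are all hypothetical choices made here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_objective(gate_scores, expert_preds, target, lam=0.5):
    """Return (total, base, expert_term) for one forecast step.

    base        : squared error of the gate-weighted mixture forecast
    expert_term : gate-weighted sum of each expert's own squared error
    lam         : assumed weight on the expert-specific term (hypothetical)
    """
    weights = softmax(gate_scores)
    mixture = sum(w * p for w, p in zip(weights, expert_preds))
    base = (mixture - target) ** 2
    expert_term = sum(
        w * (p - target) ** 2 for w, p in zip(weights, expert_preds)
    )
    return base + lam * expert_term, base, expert_term


# Two experts that err in opposite directions: the mixture is exact,
# so the base loss vanishes, yet the expert term still penalizes each expert.
total, base, expert_term = moe_objective([0.0, 0.0], [1.0, 3.0], target=2.0)
print(total, base, expert_term)
```

In this example the mixture forecast matches the target, so a base-only objective would give the experts no signal. The added term keeps each expert accountable for its own error, which is one way to read the abstract's claim that expert-level errors shape training alongside the global loss.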