AI Summary
To address performance bottlenecks and overfitting caused by fixed short input windows in long-term time series forecasting, this paper proposes a novel decoupled multi-scale periodic modeling paradigm. The method introduces (1) a Multi-Period Sequence Decomposition (MPSD) module, the first of its kind, to explicitly disentangle components of distinct periodicities; and (2) a Multi-Token Pattern Recognition (MTPR) network that adaptively assigns optimal token granularity per period, enabling input lengths up to 10× longer than the baseline. Integrating multi-scale decomposition, adaptive tokenization, period-aware attention, and a lightweight architecture, the approach incurs only 0.22× the computational cost of the baseline. On standard long-horizon forecasting benchmarks, it achieves an average accuracy improvement of 27%, with peak gains reaching 38%. The framework significantly mitigates overfitting while ensuring high efficiency and strong interpretability.
Abstract
Short fixed-length inputs are the main bottleneck of deep learning methods in long time-series forecasting tasks. Prolonging the input length causes overfitting, which rapidly degrades accuracy. Our research indicates that this overfitting is the combined effect of multi-scale pattern coupling in time series and the fixed focusing scale of current models. First, we find that the patterns a time series exhibits across various scales reflect its multi-periodic nature, where each scale corresponds to a specific period length. Second, we find that the token size predominantly dictates model behavior, as it determines the scale at which the model focuses and the context size it can accommodate. Our idea is to decouple the multi-scale temporal patterns of time series and to model each pattern with its corresponding period length as the token size. We introduce a novel series-decomposition module (MPSD) and a Multi-Token Pattern Recognition neural network (MTPR), enabling the model to handle \textit{inputs up to $10\times$ longer}. Sufficient context enhances performance (\textit{38% maximum precision improvement}), and the decoupling approach offers \textit{low complexity ($0.22\times$ cost)} and \textit{high interpretability}.
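
The core idea of using a period length as the token size can be illustrated with a minimal sketch. This is not the MPSD module or the MTPR network: it assumes a plain autocorrelation estimate as a stand-in for period detection, and the function names (`autocorrelation`, `dominant_period`, `tokenize_by_period`) are hypothetical and chosen only for illustration.

```python
import math


def autocorrelation(series, lag):
    """Biased sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def dominant_period(series, min_lag=2, max_lag=None):
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation.

    Assumption: the strongest autocorrelation peak marks the period of interest.
    """
    if max_lag is None:
        max_lag = len(series) // 2
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


def tokenize_by_period(series, period):
    """Split `series` into non-overlapping tokens of length `period`.

    Any incomplete tail shorter than one period is dropped.
    """
    usable = len(series) - len(series) % period
    return [series[i:i + period] for i in range(0, usable, period)]


# Toy input: a pure sinusoid with a 24-step cycle, 10 cycles long.
series = [math.sin(2 * math.pi * t / 24) for t in range(240)]
period = dominant_period(series)
tokens = tokenize_by_period(series, period)
```

On this toy series each token spans exactly one cycle, so a model attending over tokens would see 10 context elements instead of 240 raw points. A real multi-periodic series would need a separate token size for each decoupled component, which this sketch does not attempt.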