AI Summary
To address the poor interpretability, strong interference from noisy channels, and high computational complexity of Transformer models in long-horizon multivariate time series forecasting, this paper proposes a lightweight Transformer architecture that integrates inverted seasonal-trend decomposition with a redesigned Dot-attention mechanism. The decomposition explicitly disentangles temporal components to suppress irrelevant feature interference, while the Dot-attention mechanism enhances channel selectivity and enables feature-level importance visualization. Evaluated on multiple benchmark datasets, the method achieves state-of-the-art (SOTA) long-term forecasting accuracy, reduces computational complexity by approximately 30%, and provides interpretable prediction rationales grounded in the decomposed components and attention-based feature relevance.
Abstract
In long-term time series forecasting, Transformer-based models have achieved great success owing to their ability to capture long-range dependencies. However, existing Transformer-based methods struggle to accurately identify which variables play a pivotal role in the prediction process and tend to overemphasize noisy channels, limiting both the interpretability and the practical effectiveness of the models. They also face scalability issues due to the quadratic computational complexity of self-attention. In this paper, we propose a new model named Inverted Seasonal-Trend Decomposition Transformer (Ister), which addresses these challenges in long-term multivariate time series forecasting through an improved Transformer-based structure. Ister first decomposes the original time series into seasonal and trend components. We then propose a new Dot-attention mechanism to process the seasonal component, which improves accuracy, reduces computational complexity, and enhances interpretability. After training, Ister allows users to intuitively visualize the contribution of each feature to the overall prediction. We conduct comprehensive experiments, and the results show that Ister achieves state-of-the-art (SOTA) performance on multiple datasets, surpassing existing models in long-term prediction tasks.
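The abstract does not specify how the seasonal-trend split is computed. A common choice in Transformer forecasters is a moving-average decomposition, where the trend is a smoothed copy of the series and the seasonal part is the residual. The sketch below illustrates that standard scheme with NumPy; the kernel size and padding strategy are illustrative assumptions, not the paper's stated design.

```python
import numpy as np

def series_decomp(x: np.ndarray, kernel_size: int = 25):
    """Moving-average seasonal-trend decomposition (a common scheme in
    Transformer forecasters; Ister's exact decomposition may differ).

    x: univariate series of shape (T,).
    Returns (seasonal, trend), each of shape (T,), with seasonal + trend == x.
    """
    pad = (kernel_size - 1) // 2
    # Replicate-pad both ends so the moving average preserves length T.
    padded = np.concatenate([
        np.full(pad, x[0]),
        x,
        np.full(kernel_size - 1 - pad, x[-1]),
    ])
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, kernel, mode="valid")   # smoothed trend, shape (T,)
    seasonal = x - trend                                # residual seasonal part
    return seasonal, trend
```

Because the seasonal component is defined as the residual, the two parts always sum back to the original series, so the decomposition is lossless and each component can be modeled by a separate branch.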