🤖 AI Summary
To address the challenges of modeling cross-variable dependencies and adapting to multi-frequency patterns in multivariate time series forecasting, this paper proposes FreEformer, a Frequency Enhanced Transformer. Methodologically, FreEformer maps time series into the complex frequency domain via the Discrete Fourier Transform (DFT) and applies a Transformer to the spectra to model cross-variable dependencies, processing the real and imaginary parts independently. It introduces an enhanced attention mechanism that adds a learnable matrix to the original attention scores, followed by row-wise L1 normalization, which stabilizes gradient flow and enhances feature diversity. Experimentally, FreEformer achieves state-of-the-art performance across 18 real-world benchmarks spanning electricity, traffic, weather, healthcare, and finance. Moreover, its enhanced attention module is plug-and-play, consistently boosting the accuracy of mainstream Transformer-based forecasters. This work establishes a frequency-domain paradigm for multivariate time series modeling.
📝 Abstract
This paper presents **FreEformer**, a simple yet effective model that leverages a **Fre**quency **E**nhanced Trans**former** for multivariate time series forecasting. Our work is based on the assumption that the frequency spectrum provides a global perspective on the composition of series across various frequencies and is highly suitable for robust representation learning. Specifically, we first convert time series into the complex frequency domain using the Discrete Fourier Transform (DFT). The Transformer architecture is then applied to the frequency spectra to capture cross-variate dependencies, with the real and imaginary parts processed independently. However, we observe that the vanilla attention matrix exhibits a low-rank characteristic, thus limiting representation diversity. This could be attributed to the inherent sparsity of the frequency domain and the strong-value-focused nature of Softmax in vanilla attention. To address this, we enhance the vanilla attention mechanism by introducing an additional learnable matrix to the original attention matrix, followed by row-wise L1 normalization. Theoretical analysis demonstrates that this enhanced attention mechanism improves both feature diversity and gradient flow. Extensive experiments demonstrate that FreEformer consistently outperforms state-of-the-art models on eighteen real-world benchmarks covering electricity, traffic, weather, healthcare and finance. Notably, the enhanced attention mechanism also consistently improves the performance of state-of-the-art Transformer-based forecasters.
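The pipeline described above (DFT, cross-variate attention on the spectrum, a learnable additive matrix, and row-wise L1 normalization) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the projection shapes, the initialization of the additive matrix `B`, and the use of a non-negativity constraint on `B` are all assumptions made here for a self-contained example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def enhanced_attention(x, d_model=16, rng=np.random.default_rng(0)):
    """Sketch of frequency-enhanced attention over variates.

    x: real-valued array of shape (n_vars, n_freq) -- one part
       (real or imaginary) of the DFT spectrum, processed independently
       as in the abstract. Weight shapes and initializations are
       illustrative assumptions, standing in for learned parameters.
    """
    n_vars, n_freq = x.shape
    Wq = rng.standard_normal((n_freq, d_model)) / np.sqrt(n_freq)
    Wk = rng.standard_normal((n_freq, d_model)) / np.sqrt(n_freq)
    Wv = rng.standard_normal((n_freq, d_model)) / np.sqrt(n_freq)
    Q, K, V = x @ Wq, x @ Wk, x @ Wv

    # Vanilla attention matrix over variates (n_vars x n_vars).
    A = softmax(Q @ K.T / np.sqrt(d_model))

    # Additive learnable matrix (assumed non-negative here), then
    # row-wise L1 normalization so each row is again a distribution.
    B = np.abs(rng.standard_normal((n_vars, n_vars)))
    A_enh = A + B
    A_enh = A_enh / A_enh.sum(axis=-1, keepdims=True)
    return A_enh @ V

# Usage: attend across variates in the frequency domain.
series = np.random.default_rng(1).standard_normal((4, 96))  # 4 variates, 96 steps
spec = np.fft.rfft(series, axis=-1)                         # complex spectrum
out_real = enhanced_attention(spec.real)
out_imag = enhanced_attention(spec.imag)
```

The additive matrix raises the rank of the attention map beyond what Softmax alone produces, and the L1 normalization keeps rows summing to one, which is one plausible reading of how the mechanism improves feature diversity while keeping gradients well-scaled.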