🤖 AI Summary
To address the insufficient use of joint time-frequency information and the susceptibility of conventional attention mechanisms to historical noise in time-series forecasting, this paper proposes WDformer, a wavelet-based differential Transformer. The model integrates the wavelet transform for multi-resolution joint time-frequency modeling; applies attention over inverted dimensions to strengthen inter-variable dependency learning; and introduces a differential attention mechanism that computes attention scores as the difference between two softmax attention maps, suppressing responses to irrelevant history while amplifying the selection of critical time-frequency features. By combining wavelet analysis, multi-head attention, and differential attention computation, WDformer achieves state-of-the-art performance across multiple multivariate time-series benchmark datasets, improving both prediction accuracy and robustness. The source code is publicly available.
📝 Abstract
Time series forecasting has applications in many domains, such as meteorological rainfall prediction, traffic flow analysis, financial forecasting, and operational load monitoring. Due to the sparsity of time series data, relying solely on time-domain or frequency-domain modeling limits a model's ability to fully exploit multi-domain information. Moreover, when applied to time series forecasting, traditional attention mechanisms tend to over-focus on irrelevant historical information, which can introduce noise into the prediction process and bias the results. We propose WDformer, a wavelet-based differential Transformer model. This study employs the wavelet transform to conduct a multi-resolution analysis of time series data; by leveraging the joint time-frequency representation, it accurately extracts the key components that reflect the essential characteristics of the data. Furthermore, we apply the attention mechanism over inverted dimensions, allowing it to capture relationships among multiple variables. When computing attention, we introduce a differential attention mechanism, which obtains the attention score as the difference between two separate softmax attention matrices. This approach enables the model to focus more on important information and reduces noise. WDformer achieves state-of-the-art (SOTA) results on multiple challenging real-world datasets, demonstrating its accuracy and effectiveness. Code is available at https://github.com/xiaowangbc/WDformer.
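The abstract describes a multi-resolution wavelet analysis but does not name the wavelet family or library used. As a minimal illustration of the idea, here is a hand-rolled one-level Haar discrete wavelet transform (a hypothetical choice, not necessarily the paper's), which splits a series into a low-frequency approximation (trend) and high-frequency detail (fluctuations); repeating it on the approximation yields the multi-resolution decomposition:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform.
    Returns (approximation, detail) coefficients, each half the input length."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-frequency component (trend)
    detail = (even - odd) / np.sqrt(2)   # high-frequency component (fluctuations)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: perfectly reconstructs the original series."""
    even = (approx + detail) / np.sqrt(2)
    odd = (approx - detail) / np.sqrt(2)
    out = np.empty(even.size + odd.size)
    out[0::2], out[1::2] = even, odd
    return out
```

Applying `haar_dwt` recursively to the approximation coefficients gives progressively coarser views of the series, which is the joint time-frequency representation a wavelet-based forecaster can feed to its attention layers.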
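The differential attention mechanism is described only as "the difference between two separate softmax attention matrices." A minimal numpy sketch of that computation follows; the projection matrices, the scaling by a weight `lam` on the subtracted map, and the single-head setup are all assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """Score = softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d)).
    Subtracting a second attention map cancels responses shared by both maps
    (e.g. attention spread over irrelevant history), sharpening the result.
    `lam` is a hypothetical mixing weight, not taken from the paper."""
    d = Wq1.shape[1]
    q1, k1 = x @ Wq1, x @ Wk1
    q2, k2 = x @ Wq2, x @ Wk2
    v = x @ Wv
    a1 = softmax(q1 @ k1.T / np.sqrt(d))
    a2 = softmax(q2 @ k2.T / np.sqrt(d))
    return (a1 - lam * a2) @ v
```

In WDformer this score would be computed over the inverted (variable) dimension, so each row of `x` represents one variable's series embedding rather than one time step.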