Filter then Attend: Improving attention-based Time Series Forecasting with Spectral Filtering

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Transformers suffer from low-frequency bias and high computational overhead in long-term time series forecasting (LTSF). To address this, we propose the Spectrally-Aware Transformer (SAT), the first architecture to integrate a lightweight, learnable spectral filtering module *before* self-attention—explicitly modeling and enhancing full-spectrum information utilization with only ~1,000 additional parameters. This design enables reduced embedding dimensions, simultaneously shrinking model size and improving representational capacity. Synthetic experiments confirm the filter’s effectiveness in spectral disentanglement. On multiple LTSF benchmarks, SAT achieves 5–10% improvements in prediction accuracy over the vanilla Transformer baseline, while requiring significantly fewer parameters and lower computational cost.

📝 Abstract
Transformer-based models are at the forefront of long time-series forecasting (LTSF). While these models often achieve state-of-the-art results, they suffer from a bias toward low frequencies in the data and from high computational and memory requirements. Recent work has established that learnable frequency filters can be an integral part of a deep forecasting model by enhancing the model's spectral utilization. However, those works use a multilayer perceptron to process the filtered signals and thus do not resolve the issues found in transformer-based models. In this paper, we establish that adding a filter to the beginning of transformer-based models enhances their performance in long time-series forecasting. We add learnable filters, which contribute only $\approx 1000$ additional parameters, to several transformer-based models and observe in multiple instances a 5–10% relative improvement in forecasting performance. Additionally, we find that with filters added, we are able to decrease the embedding dimension of our models, resulting in transformer-based architectures that are both smaller and more effective than their non-filtering base models. We also conduct synthetic experiments to analyze how the filters enable Transformer-based models to better utilize the full spectrum for forecasting.
Problem

Research questions and friction points this paper is trying to address.

Improving transformer models' spectral utilization in time series forecasting
Reducing computational and memory requirements for transformer architectures
Enhancing forecasting accuracy with learnable spectral filters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral filtering before transformer attention
Learnable filters with minimal parameter increase
Reduced embedding dimension for smaller models
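
The paper's exact filter parameterization is not given here, but the core idea of a lightweight learnable spectral filter applied before attention can be sketched as a per-bin gain in the rFFT domain. This is a minimal NumPy sketch of the forward pass only; the function name, the real-valued-gain assumption, and the example gain values are illustrative, not the authors' implementation (where the gain vector would be a trainable parameter, with roughly L/2 values accounting for the ~1,000 extra parameters):

```python
import numpy as np

def spectral_filter(x, gain):
    """Reweight a time series in the frequency domain.

    x:    (L,) real-valued input sequence
    gain: (L//2 + 1,) real gain per rFFT frequency bin -- in a trainable
          module these would be the learnable filter parameters
    """
    X = np.fft.rfft(x)                          # (L//2 + 1,) complex spectrum
    return np.fft.irfft(X * gain, n=len(x))    # filtered series, back in time domain

# Example: keep only low-frequency content before feeding the attention stack
L = 96
t = np.arange(L)
x = np.sin(2 * np.pi * 2 * t / L) + 0.5 * np.sin(2 * np.pi * 20 * t / L)
gain = np.zeros(L // 2 + 1)
gain[:6] = 1.0          # pass bins 0-5, suppress the rest (here: the 20-cycle term)
y = spectral_filter(x, gain)
```

An all-ones gain is the identity, so a learnable filter can be initialized to pass the signal through unchanged and then learn which frequency bands to amplify or attenuate during training.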
Elisha Dayag
Department of Mathematics, University of California, Irvine
Nhat Thanh Van Tran
Department of Mathematics, University of California, Irvine
Jack Xin
Distinguished Professor of Mathematics, UC Irvine
Research interests: Applied Computational Math, Machine Learning and Applications