Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting

📅 2025-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard Transformers’ fully connected attention mechanism neglects the inherent causality and locality of time series, limiting predictive performance. To address this, we propose Weighted Causal Attention (WCA), a novel attention mechanism that introduces a learnable weight function based on smooth heavy-tailed decay—thereby encoding temporal locality as an end-to-end differentiable inductive bias. WCA integrates strict causal masking with principled power-law decay, yielding a Transformer variant that balances architectural flexibility with interpretability. Evaluated across multiple mainstream time-series forecasting benchmarks, our approach achieves state-of-the-art accuracy. Moreover, the learned attention weights exhibit clear, monotonic temporal decay patterns—empirically confirming that explicit temporal priors enhance both model performance and interpretability.

📝 Abstract
Transformers have recently shown strong performance in time-series forecasting, but their all-to-all attention mechanism overlooks the temporally causal and often local nature of the data. We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay. This simple yet effective modification endows the model with an inductive bias favoring temporally local dependencies, while still allowing sufficient flexibility to learn the unique correlation structure of each dataset. Our empirical results demonstrate that Powerformer not only achieves state-of-the-art accuracy on public time-series benchmarks, but also offers improved interpretability of attention patterns. Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention. These findings highlight the importance of domain-specific modifications to the Transformer architecture for time-series forecasting, and they establish Powerformer as a strong, efficient, and principled baseline for future research and real-world applications.
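The mechanism described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the decay exponent `alpha` is fixed here (Powerformer learns its decay weights end-to-end), and the specific power-law form `(1 + Δ)^(−α)` applied as an additive bias in log-space before the softmax is an assumption, as is the function name.

```python
import numpy as np

def weighted_causal_attention(Q, K, V, alpha=1.0):
    """Sketch of weighted causal attention: scaled dot-product attention
    with a strict causal mask and a power-law decay over the lag.
    alpha is a fixed hyperparameter here; the paper learns the decay."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (T, T) similarities
    # Strict causal mask: position t may only attend to positions s <= t.
    causal = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(causal, scores, -np.inf)
    # Smooth heavy-tailed decay over the lag D = t - s:
    # w(D) = (1 + D)^(-alpha), added in log-space before the softmax.
    lag = np.arange(T)[:, None] - np.arange(T)[None, :]
    scores = scores + np.where(causal,
                               -alpha * np.log1p(np.maximum(lag, 0)),
                               0.0)
    # Numerically stable row-wise softmax.
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ V, attn
```

With equal query-key scores, the decay alone makes each row's weights shrink monotonically with lag, which is the locality bias the paper reports observing in the learned attention maps.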
Problem

Research questions and friction points this paper is trying to address.

All-to-all attention ignores the causal and local structure of time-series data
Needs an attention mechanism that encodes temporal locality without sacrificing flexibility
Seeks improved interpretability alongside forecasting accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Strictly causal attention masking
Learnable smooth heavy-tailed (power-law) decay weights
Locality bias that is amplified during training
Authors
Kareem Hegazy — Postdoc, UC Berkeley (Scientific Machine Learning, Physics, Ultrafast Diffraction Imaging)
Michael W. Mahoney — Department of Statistics, University of California Berkeley; International Computer Science Institute, Berkeley; Lawrence Berkeley National Laboratory
N. Benjamin Erichson — Research Scientist (Linear Algebra, Deep Learning, Dynamical Systems)