CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting

📅 2026-02-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a limitation of conventional Transformer softmax attention: its global normalization entangles multi-scale temporal structures in time series data, such as global trends, local shocks, and seasonality. The authors propose CAPS, a structured attention mechanism that models these distinct patterns within a single layer by combining three additive pathways (Riemann softmax, prefix-product gating, and a Clock baseline) and applying SO(2) rotations for phase alignment. A shared Clock mechanism modulates information flow across the pathways through time-aware importance weights, unifying attention, recurrence, and temporal alignment in one framework. On both short- and long-horizon forecasting benchmarks, CAPS outperforms standard softmax and linear attention and matches or exceeds seven strong baselines while maintaining linear computational complexity.
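The summary describes three additive pathways mixed by time-aware "Clock" weights. The paper's actual layer is more involved; the following is only a minimal numpy sketch of that combination idea, with hypothetical function and variable names (`caps_sketch`, `gate`, `clock_logits`) that are not taken from the paper:

```python
import numpy as np

def caps_sketch(q, k, v, gate, clock_logits):
    """Illustrative mix of three additive pathways (not the real CAPS layer).

    q, k, v      : (T, d) query/key/value sequences
    gate         : (T,) per-step forget gates in (0, 1) for the prefix-product path
    clock_logits : (T, 3) time-aware logits mixing the three pathways
    """
    T, d = v.shape

    # Path 1: causal softmax attention (order-independent selection).
    scores = q @ k.T / np.sqrt(d)
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    softmax_path = w @ v

    # Path 2: prefix-product gating, a linear-time causal recurrence:
    # s_t = gate_t * s_{t-1} + v_t
    recur_path = np.zeros_like(v)
    s = np.zeros(d)
    for t in range(T):
        s = gate[t] * s + v[t]
        recur_path[t] = s

    # Path 3: a simple time-indexed baseline over the values
    # (stand-in for the paper's "Clock baseline").
    clock_path = np.cumsum(v, axis=0) / np.arange(1, T + 1)[:, None]

    # Clock weights: softmax over the three pathway logits at each timestep.
    cw = np.exp(clock_logits - clock_logits.max(axis=-1, keepdims=True))
    cw /= cw.sum(axis=-1, keepdims=True)
    paths = np.stack([softmax_path, recur_path, clock_path], axis=1)  # (T, 3, d)
    return (cw[:, :, None] * paths).sum(axis=1)
```

Note that only the softmax path costs O(T^2) here; the recurrent and baseline paths are linear in T, which is consistent with the linear-complexity claim if the softmax path is replaced by a linear-attention variant.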

📝 Abstract
This paper presents $\textbf{CAPS}$ (Clock-weighted Aggregation with Prefix-products and Softmax), a structured attention mechanism for time series forecasting that decouples three distinct temporal structures: global trends, local shocks, and seasonal patterns. Standard softmax attention entangles these through global normalization, while recent recurrent models sacrifice long-term, order-independent selection for order-dependent causal structure. CAPS combines SO(2) rotations for phase alignment with three additive gating paths -- Riemann softmax, prefix-product gates, and a Clock baseline -- within a single attention layer. We introduce the Clock mechanism, a learned temporal weighting that modulates these paths through a shared notion of temporal importance. Experiments on long- and short-term forecasting benchmarks show that CAPS surpasses vanilla softmax and linear attention mechanisms and achieves competitive performance against seven strong baselines with linear complexity. Our code implementation is available at https://github.com/vireshpati/CAPS-Attention.
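The abstract's SO(2) rotations for phase alignment resemble rotary-style position encoding: pairs of feature channels are rotated by position-dependent angles. A short numpy sketch of that generic operation (the angle parameterization CAPS actually learns is not specified here, so `theta` is an assumption):

```python
import numpy as np

def so2_rotate(x, theta):
    """Apply per-pair SO(2) rotations to feature vectors.

    x     : (T, d) with d even; consecutive channel pairs form 2D planes.
    theta : (T, d // 2) rotation angles, e.g. position * frequency.
    """
    x1, x2 = x[:, 0::2], x[:, 1::2]
    c, s = np.cos(theta), np.sin(theta)
    out = np.empty_like(x)
    out[:, 0::2] = c * x1 - s * x2  # rotated first coordinate of each pair
    out[:, 1::2] = s * x1 + c * x2  # rotated second coordinate of each pair
    return out
```

Because each 2x2 block is an orthogonal rotation, the transform preserves vector norms while shifting phase, which is why such rotations are a natural choice for aligning seasonal components.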
Problem

Research questions and friction points this paper is trying to address.

time series forecasting
attention mechanism
temporal structures
transformer
long-term dependencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

structured attention
time series forecasting
Clock mechanism
SO(2) rotation
linear complexity