Efficient Linear Attention for Multivariate Time Series Modeling via Entropy Equality

📅 2025-11-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Standard attention mechanisms incur O(L²) computational complexity in the sequence length L, because Softmax is applied over all pairwise query-key scores, which hinders scalability for multivariate time series modeling. To address this, we propose a linear-time attention mechanism grounded in entropy equality, motivated by the finding that the effectiveness of attention may stem less from Softmax nonlinearity than from a moderate, well-balanced weight distribution. We show that distributions with aligned probability rankings and similar entropy values are structurally similar, a result that follows from the strict concavity of entropy on the probability simplex, and we use it to design an O(L) algorithm that approximates the entropy of dot-product-derived distributions and estimates attention weights. Evaluated on four spatio-temporal forecasting datasets, our method substantially reduces memory usage and computation time while achieving competitive or superior forecasting accuracy.
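
For readers who want the quantities behind the summary spelled out, the block below writes down Shannon entropy on the probability simplex together with its strict concavity and its bounds. These are standard facts; the notation (p, q, λ, Δ) is chosen here for illustration and is not taken from the paper, and L is assumed to be the number of weights in one attention row.

```latex
\begin{align}
  \Delta^{L-1} &= \Bigl\{\, p \in \mathbb{R}^{L} : p_i \ge 0,\ \sum_{i=1}^{L} p_i = 1 \,\Bigr\}, \\
  H(p) &= -\sum_{i=1}^{L} p_i \log p_i , \qquad p \in \Delta^{L-1}, \\
  H\bigl(\lambda p + (1-\lambda)\, q\bigr) &> \lambda H(p) + (1-\lambda) H(q),
      \qquad p \ne q,\ \lambda \in (0,1), \\
  0 \le H(p) &\le \log L .
\end{align}
```

The lower bound is attained only by one-hot distributions and the upper bound only by the uniform distribution, so entropy can be read as a measure of how evenly an attention row spreads its weight.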

📝 Abstract

Attention mechanisms have been extensively employed in various applications, including time series modeling, owing to their capacity to capture intricate dependencies; however, their utility is often constrained by quadratic computational complexity, which impedes scalability for long sequences. In this work, we propose a novel linear attention mechanism designed to overcome these limitations. Our approach is grounded in a theoretical demonstration that entropy, as a strictly concave function on the probability simplex, implies that distributions with aligned probability rankings and similar entropy values exhibit structural resemblance. Building on this insight, we develop an efficient approximation algorithm that computes the entropy of dot-product-derived distributions with only linear complexity, enabling the implementation of a linear attention mechanism based on entropy equality. Through rigorous analysis, we reveal that the effectiveness of attention in spatio-temporal time series modeling may not primarily stem from the non-linearity of softmax but rather from the attainment of a moderate and well-balanced weight distribution. Extensive experiments on four spatio-temporal datasets validate our method, demonstrating competitive or superior forecasting performance while achieving substantial reductions in both memory usage and computational time.
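
To make the complexity argument concrete, here is a minimal sketch of how a linear attention can avoid the L × L weight matrix. It is not the paper's entropy-equality algorithm, which the abstract does not specify. It assumes a generic kernelized form with the positive feature map elu(x) + 1, and the names `feature_map`, `quadratic_attention`, and `linear_attention` are hypothetical. The sketch also computes the Shannon entropy of each row of attention weights, the quantity the abstract builds on.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def entropy(p):
    """Shannon entropy in nats; terms with zero probability contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def feature_map(x):
    # elu(x) + 1, a positive feature map. This is an assumption borrowed from
    # generic kernelized linear attention, not the paper's method.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def quadratic_attention(Q, K, V):
    """Reference version: builds every row of the L x L weight matrix, O(L^2)."""
    outputs, weights = [], []
    for q in Q:
        scores = [dot(feature_map(q), feature_map(k)) for k in K]
        total = sum(scores)
        row = [s / total for s in scores]  # non-negative, sums to 1
        weights.append(row)
        outputs.append(
            [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        )
    return outputs, weights


def linear_attention(Q, K, V):
    """Same outputs without the L x L matrix: one pass over keys, one over queries."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # running sum of phi(k) v^T
    z = [0.0] * d                       # running sum of phi(k)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for i in range(d):
            z[i] += fk[i]
            for j in range(dv):
                S[i][j] += fk[i] * v[j]
    outputs = []
    for q in Q:
        fq = feature_map(q)
        denom = dot(fq, z)
        outputs.append(
            [sum(fq[i] * S[i][j] for i in range(d)) / denom for j in range(dv)]
        )
    return outputs


# Hypothetical toy inputs: L = 3 positions, d = 2, value dimension 3.
Q = [[0.2, -0.5], [1.0, 0.3], [-0.7, 0.8]]
K = [[0.1, 0.4], [-0.3, 0.9], [0.5, -0.2]]
V = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0], [-2.0, 0.3, 0.7]]

reference, weights = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
row_entropies = [entropy(row) for row in weights]
```

Both functions return the same outputs up to floating-point error. The linear version only reorders the sums, so its cost grows with L rather than L², at the price of never materializing the individual weights.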

Problem

Research questions and friction points this paper is trying to address.

Overcome quadratic complexity limitations in attention mechanisms for time series

Develop linear attention using entropy equality for efficient multivariate modeling

Achieve competitive forecasting performance with reduced computational resources

Innovation

Methods, ideas, or system contributions that make the work stand out.

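Linear attention mechanism reduces attention complexity from quadratic to linear in sequence length

Linear-complexity entropy approximation for dot-product-derived distributions enables attention based on entropy equality

A moderate, well-balanced weight distribution, rather than softmax nonlinearity, is identified as the likely source of attention's effectiveness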
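
One way to read "moderate and well-balanced" is through the entropy of a row of attention weights. The sketch below is an illustration only: the temperature parameter and the scores are hypothetical and do not come from the paper. It normalizes entropy to the range [0, 1], so that 0 corresponds to a one-hot row and 1 to a uniform row, and shows how the same scores can yield sharper or flatter weights.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax; temperature is an illustrative knob only."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(p):
    """Shannon entropy divided by log(len(p)): 0 for one-hot, 1 for uniform."""
    h = -sum(x * math.log(x) for x in p if x > 0.0)
    return h / math.log(len(p))


scores = [2.0, 1.0, 0.5, -1.0]  # hypothetical dot-product scores for one query

for t in (0.1, 1.0, 10.0):
    w = softmax(scores, t)
    # Lower temperature concentrates the weights; higher temperature flattens them.
    print(f"temperature={t}: normalized entropy={normalized_entropy(w):.3f}")
```

The ranking of the weights is the same at every temperature; only the entropy changes, which is why ranking and entropy together say a good deal about the shape of a distribution.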

🔎 Similar Papers

WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting