๐ค AI Summary
This work addresses the computational and memory bottlenecks of conventional attention mechanisms, which exhibit O(Nยฒd) complexity and hinder scalability in large-scale recommendation systems when modeling long user behavior sequences. To overcome this limitation, the authors propose SVD-Attention, the first approach to integrate singular value decomposition (SVD) into the attention mechanism. By leveraging the inherent low-rank structure of user behavior sequences while preserving the softmax formulation, SVD-Attention reduces complexity to O(Ndr), achieving theoretically lossless compression. The method balances expressive power and efficiency, enabling effective modeling of sequences with tens of thousands of interactions and candidate sets in the thousands. Deployed in Kuaishouโs online recommendation system, it yielded a 0.68% increase in video views and significant improvements across multiple core business metrics.
๐ Abstract
Attention mechanism remains the defining operator in Transformers since it provides expressive global credit assignment, yet its $O(N^2 d)$ time and memory cost in sequence length $N$ makes long-context modeling expensive and often forces truncation or other heuristics. Linear attention reduces complexity to $O(N d^2)$ by reordering computation through kernel feature maps, but this reformulation drops the softmax mechanism and shifts the attention score distribution. In recommender systems, low-rank structure in matrices is not a rare case, but rather the default inductive bias in its representation learning, particularly explicit in the user behavior sequence modeling. Leveraging this structure, we introduce SVD-Attention, which is theoretically lossless on low-rank matrices and preserves softmax while reducing attention complexity from $O(N^2 d)$ to $O(Ndr)$. With SVD-Attention, we propose SOLAR, SVD-Optimized Lifelong Attention for Recommendation, a sequence modeling framework that supports behavior sequences of ten-thousand scale and candidate sets of several thousand items in cascading process without any filtering. In Kuaishou's online recommendation scenario, SOLAR delivers a 0.68\% Video Views gain together with additional business metrics improvements.