🤖 AI Summary
To address the high computational cost of Transformer self-attention and the imbalance between long-term and short-term interest modeling in sequential recommendation, this paper proposes BlossomRec, a block-level fusion sparse attention mechanism. BlossomRec partitions user interaction histories into blocks, encodes long-term and short-term interests separately, and dynamically fuses their representations via a learnable gating mechanism. Its dual-pattern sparse attention structure reduces the theoretical attention complexity from O(L²) to O(L√L), substantially decreasing GPU memory consumption. Extensive experiments on four public benchmark datasets demonstrate that BlossomRec achieves state-of-the-art (SOTA) or competitive performance while generalizing robustly across both short and long sequences. By jointly optimizing computational efficiency and recommendation accuracy, BlossomRec offers a scalable and effective solution for practical sequential recommender systems.
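The O(L²) → O(L√L) claim can be seen with a back-of-envelope count. Assuming (this is a sketch, not the paper's exact pattern) each query attends to a local window of size b (short-term) plus one representative position per block, L/b of them (long-term), the per-query cost b + L/b is minimized at b ≈ √L:

```python
import math

def attended_positions(L, b):
    """Approximate positions attended per query under a dual-pattern sketch:
    a local window of size b (short-term branch) plus one representative
    per block of size b, i.e. L/b positions (long-term branch)."""
    return b + L / b

L = 4096
# choosing b = sqrt(L) balances the two branches: b + L/b = 2*sqrt(L)
b = int(math.sqrt(L))            # 64
per_query = attended_positions(L, b)   # 64 + 4096/64 = 128 positions
total = L * per_query            # O(L * sqrt(L)), vs L*L for dense attention
```

So for L = 4096, each query touches about 128 positions instead of 4096, which is where the memory savings come from.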
📝 Abstract
Transformer architectures are widely used in sequential recommender systems (SRS). However, as user interaction histories grow, so do computation time and memory requirements, chiefly because of the standard attention mechanism. Although many methods employ efficient attention or SSM-based models, these approaches struggle to model long sequences effectively and can be unstable on short ones. To address these challenges, we design a sparse attention mechanism, BlossomRec, which models both long-term and short-term user interests through attention computation to achieve stable performance across sequences of varying lengths. Specifically, we categorize user interests in recommendation systems into long-term and short-term interests, compute each with a distinct sparse attention pattern, and combine the results through a learnable gated output. In theory, this significantly reduces the number of interactions participating in attention computation. Extensive experiments on four public datasets demonstrate that BlossomRec, when integrated with state-of-the-art Transformer-based models, achieves comparable or even superior performance while significantly reducing memory usage, providing strong evidence of BlossomRec's efficiency and effectiveness. The code is available at https://github.com/ronineume/BlossomRec.
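The dual-pattern design described above can be sketched minimally in numpy. This is an illustrative assumption, not the paper's implementation: the short-term branch uses a local attention window, the long-term branch attends to one strided representative per block, and a sigmoid gate (with a hypothetical parameter vector `w_gate`) mixes the two outputs per position:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(q, k, v, mask):
    # mask: (L, L) boolean, True = this query may attend to this key
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)   # block disallowed positions
    return softmax(scores) @ v

def dual_pattern_attention(x, block=4, w_gate=None):
    """Sketch of a dual-mode sparse attention: local window (short-term)
    + strided block representatives (long-term), fused by a learnable
    sigmoid gate. `w_gate` is a hypothetical gate parameter vector."""
    L, d = x.shape
    idx = np.arange(L)
    # short-term pattern: attend within a local window of `block` positions
    local = np.abs(idx[:, None] - idx[None, :]) < block
    # long-term pattern: attend to one representative position per block
    strided = (idx[None, :] % block == 0)
    short_out = masked_attention(x, x, x, local)
    long_out = masked_attention(x, x, x, strided)
    if w_gate is None:
        w_gate = np.zeros(d)                 # untrained gate -> 0.5/0.5 mix
    g = 1.0 / (1.0 + np.exp(-(x @ w_gate)))  # per-position gate in (0, 1)
    return g[:, None] * short_out + (1.0 - g[:, None]) * long_out
```

Each query row attends to at most `block + L/block` positions rather than all L, which is the source of the sub-quadratic complexity; in a full model the gate would be trained jointly with the attention projections.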