🤖 AI Summary
To address the high computational cost of Transformer self-attention and the imbalance between long-term and short-term interest modeling in sequential recommendation, this paper proposes BlossomRec, a block-level fusion sparse attention mechanism. BlossomRec partitions user interaction histories into blocks, encodes long-term and short-term interests separately, and dynamically fuses their representations via a learnable gating mechanism. Its dual-pattern sparse attention structure reduces the theoretical attention complexity from O(L²) to O(L√L), substantially decreasing GPU memory consumption. Extensive experiments on four public benchmark datasets demonstrate that BlossomRec achieves state-of-the-art (SOTA) or competitive performance while generalizing robustly across both short and long sequences. By jointly optimizing computational efficiency and recommendation accuracy, BlossomRec offers a scalable and effective solution for practical sequential recommender systems.
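The O(L²) → O(L√L) claim can be seen with a back-of-envelope count. Assuming (this is a sketch, not the paper's exact pattern) each query attends to a local window of size b (short-term) plus one representative position per block, L/b of them (long-term), the per-query cost b + L/b is minimized at b ≈ √L:

```python
import math

def attended_positions(L, b):
    """Approximate positions attended per query under a dual-pattern sketch:
    a local window of size b (short-term branch) plus one representative
    per block of size b, i.e. L/b positions (long-term branch)."""
    return b + L / b

L = 4096
# choosing b = sqrt(L) balances the two branches: b + L/b = 2*sqrt(L)
b = int(math.sqrt(L))            # 64
per_query = attended_positions(L, b)   # 64 + 4096/64 = 128 positions
total = L * per_query            # O(L * sqrt(L)), vs L*L for dense attention
```

So for L = 4096, each query touches about 128 positions instead of 4096, which is where the memory savings come from.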
📝 Abstract
Transformer architectures are widely used in sequential recommender systems (SRS). However, as user interaction histories grow, so do computation time and memory requirements, chiefly because of the standard attention mechanism. Although many methods employ efficient attention or SSM-based models, these approaches struggle to model long sequences effectively and can be unstable on short ones. To address these challenges, we design a sparse attention mechanism, BlossomRec, which models both long-term and short-term user interests through attention computation to achieve stable performance across sequences of varying lengths. Specifically, we categorize user interests in recommendation systems into long-term and short-term interests, compute each with a distinct sparse attention pattern, and combine the results through a learnable gated output. In theory, this significantly reduces the number of interactions participating in attention computation. Extensive experiments on four public datasets demonstrate that BlossomRec, when integrated with state-of-the-art Transformer-based models, achieves comparable or even superior performance while significantly reducing memory usage, providing strong evidence of BlossomRec's efficiency and effectiveness. The code is available at https://github.com/ronineume/BlossomRec.
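The dual-pattern design described above can be sketched minimally in numpy. This is an illustrative assumption, not the paper's implementation: the short-term branch uses a local attention window, the long-term branch attends to one strided representative per block, and a sigmoid gate (with a hypothetical parameter vector `w_gate`) mixes the two outputs per position:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(q, k, v, mask):
    # mask: (L, L) boolean, True = this query may attend to this key
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)   # block disallowed positions
    return softmax(scores) @ v

def dual_pattern_attention(x, block=4, w_gate=None):
    """Sketch of a dual-mode sparse attention: local window (short-term)
    + strided block representatives (long-term), fused by a learnable
    sigmoid gate. `w_gate` is a hypothetical gate parameter vector."""
    L, d = x.shape
    idx = np.arange(L)
    # short-term pattern: attend within a local window of `block` positions
    local = np.abs(idx[:, None] - idx[None, :]) < block
    # long-term pattern: attend to one representative position per block
    strided = (idx[None, :] % block == 0)
    short_out = masked_attention(x, x, x, local)
    long_out = masked_attention(x, x, x, strided)
    if w_gate is None:
        w_gate = np.zeros(d)                 # untrained gate -> 0.5/0.5 mix
    g = 1.0 / (1.0 + np.exp(-(x @ w_gate)))  # per-position gate in (0, 1)
    return g[:, None] * short_out + (1.0 - g[:, None]) * long_out
```

Each query row attends to at most `block + L/block` positions rather than all L, which is the source of the sub-quadratic complexity; in a full model the gate would be trained jointly with the attention projections.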