BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations

📅 2025-12-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the high computational cost of Transformer self-attention and the imbalance between long-term and short-term interest modeling in sequential recommendation, this paper proposes Block-level Fused Sparse Attention (BFSA). BFSA partitions user interaction histories into blocks, encodes long-term and short-term interests separately, and dynamically fuses their representations via a learnable gating mechanism. Its dual-mode sparse attention structure theoretically reduces attention complexity from O(L²) to O(L√L), substantially decreasing GPU memory consumption. Extensive experiments on four public benchmark datasets demonstrate that BFSA achieves state-of-the-art (SOTA) or competitive performance while generalizing robustly across both short and long sequences. By jointly optimizing computational efficiency and recommendation accuracy, BFSA offers a scalable and effective solution for practical sequential recommender systems.
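The learnable gated fusion of the two interest branches can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function `gated_fusion`, its sigmoid gate over the concatenated branch outputs, and all shapes are assumptions for the sketch.

```python
import numpy as np

def gated_fusion(long_repr, short_repr, w_gate, b_gate):
    """Fuse long- and short-term interest representations with a
    learnable sigmoid gate (hypothetical sketch; the paper's exact
    parameterization may differ)."""
    # Gate conditioned on the concatenation of both branch outputs.
    gate_in = np.concatenate([long_repr, short_repr], axis=-1)
    g = 1.0 / (1.0 + np.exp(-(gate_in @ w_gate + b_gate)))  # sigmoid gate in (0, 1)
    # Element-wise convex combination of the two branches.
    return g * long_repr + (1.0 - g) * short_repr

rng = np.random.default_rng(0)
d = 8
long_r = rng.normal(size=(4, d))    # long-term branch output (4 positions)
short_r = rng.normal(size=(4, d))   # short-term branch output
w = rng.normal(size=(2 * d, d)) * 0.1  # "learnable" gate weights (random here)
b = np.zeros(d)
fused = gated_fusion(long_r, short_r, w, b)
```

Because the gate is a sigmoid, each fused element is a convex combination of the corresponding long-term and short-term values, so neither branch can be entirely discarded.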

šŸ“ Abstract
Transformer structures have been widely used in sequential recommender systems (SRS). However, as user interaction histories grow, so do computational time and memory requirements, mainly due to the standard attention mechanism. Although many methods employ efficient attention or SSM-based models, these approaches struggle to model long sequences effectively and may exhibit unstable performance on short sequences. To address these challenges, we design a sparse attention mechanism, BlossomRec, which models both long-term and short-term user interests through attention computation to achieve stable performance across sequences of varying lengths. Specifically, we categorize user interests in recommender systems into long-term and short-term interests, compute them with two distinct sparse attention patterns, and combine the results through a learnable gated output. Theoretically, this significantly reduces the number of interactions participating in attention computation. Extensive experiments on four public datasets demonstrate that BlossomRec, when integrated with state-of-the-art Transformer-based models, achieves comparable or even superior performance while significantly reducing memory usage, providing strong evidence of BlossomRec's efficiency and effectiveness. The code is available at https://github.com/ronineume/BlossomRec.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational time and memory cost in sequential recommendation
Modeling both long-term and short-term user interests effectively
Achieving stable performance across sequences of varying lengths
Innovation

Methods, ideas, or system contributions that make the work stand out.

Block-level fused sparse attention mechanism
Separate long-term and short-term interest modeling
Learnable gated output combines sparse attention patterns
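The block-level sparse pattern behind the complexity claim can be illustrated with a generic block-sparse attention mask. The layout below (a causal local block per query plus one summary position per previous block) and the helper `block_sparse_mask` are hypothetical, chosen only to show how a block size near √L yields O(L√L) attended pairs instead of the O(L²) of dense causal attention; the paper's actual pattern may differ.

```python
import numpy as np

def block_sparse_mask(L, block):
    """Boolean attention mask of shape (L, L): each position attends to
    its own causal block (short-term/local context) and to the last
    position of every previous block (a block-level 'summary' standing
    in for long-term context). Illustrative pattern only."""
    mask = np.zeros((L, L), dtype=bool)
    for i in range(L):
        start = (i // block) * block
        mask[i, start:i + 1] = True            # causal local block
        mask[i, block - 1:start:block] = True  # last position of each earlier block
    return mask

L = 16
B = int(np.sqrt(L))            # block size ~ sqrt(L)
m = block_sparse_mask(L, B)
dense = L * (L + 1) // 2       # attended pairs under dense causal attention
sparse = int(m.sum())          # attended pairs under the block-sparse mask
```

Each query attends to at most B local positions plus about L/B summaries, so the total work is O(L·(B + L/B)), which is minimized at B ≈ √L, giving O(L√L).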
Mengyang Ma
City University of Hong Kong, Hong Kong, China
Xiaopeng Li
City University of Hong Kong, Hong Kong, China
Wanyu Wang
City University of Hong Kong, Hong Kong, China
Zhaocheng Du
Huawei Noah's Ark Lab
Machine Learning, Recommendation System
Jingtong Gao
PhD, City University of Hong Kong
recommender system, deep learning
Pengyue Jia
PhD candidate in Data Science, City University of Hong Kong
Information Retrieval, Large Language Models, GeoAI
Yuyang Ye
Rutgers University, New Jersey, United States
Yiqi Wang
Michigan State University, Michigan, United States
Yunpeng Weng
Tencent, Sun Yat-Sen University
Recommendation, GNN, Marketing, LLM
Weihong Luo
Tencent, Shenzhen, China
Xiao Han
Zhejiang University of Technology, Hangzhou, China
Xiangyu Zhao
City University of Hong Kong, Hong Kong, China