AI Summary
This work addresses the trade-off between representational capacity and computational efficiency in generative recommendation, where short semantic IDs lack expressiveness while long IDs lose fine-grained information under coarse compression. To bridge this granularity gap between fine-grained tokenization and efficient sequence modeling, we propose ACERec, a novel framework that decouples these two aspects for the first time. ACERec compresses long semantic IDs via an attention-based token merging mechanism and introduces intent tokens as dynamic prediction anchors. A dual-granularity learning objective jointly optimizes fine-grained token prediction and global item-level semantic alignment. Extensive experiments on six real-world datasets show that ACERec achieves an average 14.40% improvement in NDCG@10 over state-of-the-art methods, demonstrating its superior effectiveness.
Abstract
Semantic ID-based generative recommendation represents items as sequences of discrete tokens, but it inherently faces a trade-off between representational expressiveness and computational efficiency. Residual Quantization (RQ)-based approaches restrict semantic IDs to be short to enable tractable sequential modeling, while Optimized Product Quantization (OPQ)-based methods compress long semantic IDs through naive, rigid aggregation, inevitably discarding fine-grained semantic information. To resolve this dilemma, we propose ACERec, a novel framework that bridges the granularity gap by decoupling fine-grained tokenization from efficient sequential modeling. It employs an Attentive Token Merger to distill long, expressive semantic token sequences into compact latents and introduces a dedicated Intent Token serving as a dynamic prediction anchor. To capture cohesive user intents, we guide the learning process via a dual-granularity objective, harmonizing fine-grained token prediction with global item-level semantic alignment. Extensive experiments on six real-world benchmarks demonstrate that ACERec consistently outperforms state-of-the-art baselines, achieving an average improvement of 14.40% in NDCG@10, effectively reconciling semantic expressiveness and computational efficiency.
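To make the compression idea concrete, here is a minimal NumPy sketch of an attention-based token merger of the kind the abstract describes: a small set of learnable query latents cross-attends over a long semantic-ID token sequence, and an extra intent token is appended as a prediction anchor. All names, shapes, and the single-head attention form are illustrative assumptions, not ACERec's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_token_merge(tokens, queries):
    """Compress L token embeddings (L, d) into K compact latents (K, d):
    each learnable query attends over all fine-grained tokens (hypothetical
    single-head cross-attention; the real merger may differ)."""
    d = tokens.shape[-1]
    scores = queries @ tokens.T / np.sqrt(d)   # (K, L) scaled dot-product
    weights = softmax(scores, axis=-1)         # attention over the L tokens
    return weights @ tokens                    # (K, d) merged latents

rng = np.random.default_rng(0)
L, K, d = 32, 4, 16                  # long semantic-ID length, latents, dim
tokens = rng.normal(size=(L, d))     # fine-grained semantic-ID embeddings
queries = rng.normal(size=(K, d))    # learnable merger queries (assumed)
intent = rng.normal(size=(1, d))     # intent token used as prediction anchor

latents = attentive_token_merge(tokens, queries)
seq = np.concatenate([latents, intent], axis=0)  # (K+1, d) decoder input
print(seq.shape)  # (5, 16)
```

The sequence model then operates on K+1 vectors instead of L tokens, which is where the efficiency gain over modeling the full-length semantic ID would come from under this sketch.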