🤖 AI Summary
In long-sequence recommendation, a single embedding table jointly serves attention computation and user representation learning; this functional coupling impairs interest modeling and prediction accuracy. This paper is the first to systematically identify and address this embedding coupling issue, proposing DARE, a decoupled attention and representation embedding paradigm. DARE maintains separate embedding spaces for attention and representation, which permits reducing the attention embedding dimension and enables efficient approximate nearest-neighbor search; it further incorporates multi-stage behavior retrieval and attention-mechanism optimizations. On public benchmarks, DARE achieves AUC improvements of up to 0.9%. Deployed on Tencent's advertising platform, it delivers significant online gains; moreover, retrieval speed increases by 50%, substantially improving online serving efficiency.
📝 Abstract
Lifelong user behavior sequences are crucial for capturing user interests and predicting user responses in modern recommendation systems. A two-stage paradigm is typically adopted to handle these long sequences: a subset of relevant behaviors is first searched from the original long sequences via an attention mechanism in the first stage and then aggregated with the target item to construct a discriminative representation for prediction in the second stage. In this work, we identify and characterize, for the first time, a neglected deficiency in existing long-sequence recommendation models: a single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. Initial attempts to address this issue with some common methods (e.g., linear projections -- a technique borrowed from language processing) proved ineffective, shedding light on the unique challenges of recommendation models. To overcome this, we propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are initialized and learned separately to fully decouple attention and representation. Extensive experiments and analysis demonstrate that DARE provides more accurate searches of correlated behaviors and outperforms baselines with AUC gains up to 0.9% on public datasets and notable improvements on Tencent's advertising platform. Furthermore, decoupling embedding spaces allows us to reduce the attention embedding dimension and accelerate the search procedure by 50% without significant performance impact, enabling more efficient, high-performance online serving. Code in PyTorch for experiments, including model analysis, is available at https://github.com/thuml/DARE.
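The core idea of decoupling can be illustrated with a minimal PyTorch sketch: two independently learned embedding tables, where attention scores between the target item and historical behaviors are computed entirely in a small attention space, and the weighted aggregation uses a separate, richer representation space. This is an illustrative simplification, not the paper's actual implementation; the class, dimensions, and dot-product attention form are hypothetical choices for exposition (see the official repository for the real code).

```python
import torch
import torch.nn as nn

class DecoupledTargetAttention(nn.Module):
    """Sketch of decoupled attention/representation embeddings (DARE-style).

    Hypothetical names and dimensions; attn_dim can be much smaller than
    repr_dim, which is what speeds up the first-stage behavior search.
    """
    def __init__(self, num_items: int, attn_dim: int = 16, repr_dim: int = 64):
        super().__init__()
        # Two distinct tables, initialized and learned separately.
        self.attn_emb = nn.Embedding(num_items, attn_dim)  # small: fast search
        self.repr_emb = nn.Embedding(num_items, repr_dim)  # rich: prediction

    def forward(self, behavior_ids: torch.Tensor, target_id: torch.Tensor):
        # behavior_ids: (B, L) historical behaviors; target_id: (B,) target item.
        q = self.attn_emb(target_id).unsqueeze(1)        # (B, 1, attn_dim)
        k = self.attn_emb(behavior_ids)                  # (B, L, attn_dim)
        # Attention weights computed purely in the attention space.
        weights = torch.softmax((q * k).sum(-1), dim=-1)  # (B, L)
        # Aggregation uses only the representation space.
        v = self.repr_emb(behavior_ids)                  # (B, L, repr_dim)
        user_interest = (weights.unsqueeze(-1) * v).sum(dim=1)  # (B, repr_dim)
        # Concatenate with the target's representation for downstream prediction.
        return torch.cat([user_interest, self.repr_emb(target_id)], dim=-1)
```

Because gradients from the attention scores flow only into `attn_emb` and gradients from the prediction loss flow only through `repr_emb`'s aggregation path, the two functions no longer interfere through a shared table, and the attention table can be shrunk to accelerate retrieval.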