🤖 AI Summary
In long-sequence recommendation, a single embedding table jointly serves attention computation and user representation learning; this functional coupling impairs interest modeling and prediction accuracy. This paper is the first to systematically identify and address this embedding coupling issue, proposing DARE, a decoupled attention and representation embedding paradigm. DARE maintains separate embedding spaces for attention and representation, which permits reducing the attention embedding dimension and enables efficient approximate nearest-neighbor search; it further incorporates multi-stage behavior retrieval and attention-mechanism optimizations. On public benchmarks, DARE achieves AUC improvements of up to 0.9%. Deployed on Tencent's advertising platform, it delivers significant online gains; moreover, retrieval speed increases by 50%, substantially improving online serving efficiency.
📝 Abstract
Lifelong user behavior sequences are crucial for capturing user interests and predicting user responses in modern recommendation systems. A two-stage paradigm is typically adopted to handle these long sequences: a subset of relevant behaviors is first searched from the original long sequences via an attention mechanism in the first stage and then aggregated with the target item to construct a discriminative representation for prediction in the second stage. In this work, we identify and characterize, for the first time, a neglected deficiency in existing long-sequence recommendation models: a single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. Initial attempts to address this issue with some common methods (e.g., linear projections -- a technique borrowed from language processing) proved ineffective, shedding light on the unique challenges of recommendation models. To overcome this, we propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are initialized and learned separately to fully decouple attention and representation. Extensive experiments and analysis demonstrate that DARE provides more accurate searches of correlated behaviors and outperforms baselines with AUC gains up to 0.9% on public datasets and notable improvements on Tencent's advertising platform. Furthermore, decoupling embedding spaces allows us to reduce the attention embedding dimension and accelerate the search procedure by 50% without significant performance impact, enabling more efficient, high-performance online serving. Code in PyTorch for experiments, including model analysis, is available at https://github.com/thuml/DARE.
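The core idea of decoupling can be illustrated with a minimal PyTorch sketch: two independently learned embedding tables, where attention scores between the target item and historical behaviors are computed entirely in a small attention space, and the weighted aggregation uses a separate, richer representation space. This is an illustrative simplification, not the paper's actual implementation; the class, dimensions, and dot-product attention form are hypothetical choices for exposition (see the official repository for the real code).

```python
import torch
import torch.nn as nn

class DecoupledTargetAttention(nn.Module):
    """Sketch of decoupled attention/representation embeddings (DARE-style).

    Hypothetical names and dimensions; attn_dim can be much smaller than
    repr_dim, which is what speeds up the first-stage behavior search.
    """
    def __init__(self, num_items: int, attn_dim: int = 16, repr_dim: int = 64):
        super().__init__()
        # Two distinct tables, initialized and learned separately.
        self.attn_emb = nn.Embedding(num_items, attn_dim)  # small: fast search
        self.repr_emb = nn.Embedding(num_items, repr_dim)  # rich: prediction

    def forward(self, behavior_ids: torch.Tensor, target_id: torch.Tensor):
        # behavior_ids: (B, L) historical behaviors; target_id: (B,) target item.
        q = self.attn_emb(target_id).unsqueeze(1)        # (B, 1, attn_dim)
        k = self.attn_emb(behavior_ids)                  # (B, L, attn_dim)
        # Attention weights computed purely in the attention space.
        weights = torch.softmax((q * k).sum(-1), dim=-1)  # (B, L)
        # Aggregation uses only the representation space.
        v = self.repr_emb(behavior_ids)                  # (B, L, repr_dim)
        user_interest = (weights.unsqueeze(-1) * v).sum(dim=1)  # (B, repr_dim)
        # Concatenate with the target's representation for downstream prediction.
        return torch.cat([user_interest, self.repr_emb(target_id)], dim=-1)
```

Because gradients from the attention scores flow only into `attn_emb` and gradients from the prediction loss flow only through `repr_emb`'s aggregation path, the two functions no longer interfere through a shared table, and the attention table can be shrunk to accelerate retrieval.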