Unified Semantic and ID Representation Learning for Deep Recommenders

📅 2025-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address weak generalization and poor cold-start performance in recommender systems, which the paper attributes to redundant ID representations and unstable semantic representations, this paper proposes a joint representation learning framework for ID and semantic tokens. It introduces a dual-channel token modeling mechanism and analyzes two distance metrics, finding that cosine similarity is better suited to disentangling embeddings and Euclidean distance to discriminating between individual items, and it designs a hybrid distance metric architecture accordingly. The method uses deep neural networks to jointly optimize ID tokens and semantic tokens end to end. Experiments on three benchmark datasets show that the approach outperforms state-of-the-art models by 6%–17% in recommendation accuracy, reduces token size by over 80%, and improves cold-start and long-tail item recommendation performance.

📝 Abstract
Effective recommendation is crucial for large-scale online platforms. Traditional recommendation systems primarily rely on ID tokens to uniquely identify items, which can effectively capture specific item relationships but suffer from issues such as redundancy and poor performance in cold-start scenarios. Recent approaches have explored using semantic tokens as an alternative, yet they face challenges, including item duplication and inconsistent performance gains, leaving the potential advantages of semantic tokens inadequately examined. To address these limitations, we propose a Unified Semantic and ID Representation Learning framework that leverages the complementary strengths of both token types. In our framework, ID tokens capture unique item attributes, while semantic tokens represent shared, transferable characteristics. Additionally, we analyze the role of cosine similarity and Euclidean distance in embedding search, revealing that cosine similarity is more effective in decoupling accumulated embeddings, while Euclidean distance excels in distinguishing unique items. Our framework integrates cosine similarity in earlier layers and Euclidean distance in the final layer to optimize representation learning. Experiments on three benchmark datasets show that our method significantly outperforms state-of-the-art baselines, with improvements ranging from 6% to 17% and a reduction in token size by over 80%. These results demonstrate the effectiveness of combining ID and semantic tokenization to enhance the generalization ability of recommender systems.
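The metric split described in the abstract can be illustrated with a small sketch. It assumes that the "layers" are levels of a residual quantizer, where each level picks one codebook entry and passes the remainder to the next level; the abstract does not state this, and the function names, codebook values, and sizes below are hypothetical. It is not the authors' implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either vector is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hybrid_quantize(embedding, codebooks):
    """Assumed residual quantizer: cosine search on every level except the
    last, Euclidean search on the last. Returns (token indices, residual)."""
    residual = list(embedding)
    tokens = []
    last = len(codebooks) - 1
    for level, codebook in enumerate(codebooks):
        candidates = range(len(codebook))
        if level < last:
            # Earlier levels: pick the entry pointing in the most similar direction.
            idx = max(candidates, key=lambda i: cosine_similarity(residual, codebook[i]))
        else:
            # Final level: pick the entry closest in absolute position.
            idx = min(candidates, key=lambda i: euclidean_distance(residual, codebook[i]))
        tokens.append(idx)
        # Subtract the chosen entry so the next level encodes what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual

# Toy codebooks (hypothetical values): three levels, two entries each.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],    # level 0, searched by cosine similarity
    [[0.5, 0.5], [-0.5, 0.5]],   # level 1, searched by cosine similarity
    [[0.1, 0.0], [0.0, -0.4]],   # level 2, searched by Euclidean distance
]
tokens, residual = hybrid_quantize([1.4, 0.2], codebooks)
print(tokens, residual)
```

The two searches can disagree on the same query: cosine similarity ignores vector length, so a long codebook entry that points the same way as the query wins, while Euclidean distance prefers the entry that is nearest in absolute terms.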
Problem

Research questions and friction points this paper is trying to address.

Combining ID and semantic tokens for recommendations
Optimizing embedding search with cosine and Euclidean metrics
Enhancing recommender system generalization and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Semantic and ID Representation Learning
Cosine similarity decouples embeddings effectively
Combines ID and semantic tokens for better recommendations
Authors

Guanyu Lin
University of Illinois at Urbana-Champaign, Meta AI
Zhigang Hua
Meta AI
Tao Feng
University of Illinois at Urbana-Champaign
Shuang Yang
Meta AI
Bo Long
Machine Learning
data mining, machine learning
Jiaxuan You
Assistant Professor, UIUC CS
Foundation Models, GNN, Large Language Models