Enhancing Embedding Representation Stability in Recommendation Systems with Semantic ID

📅 2025-04-02
🤖 AI Summary
Online recommendation systems suffer from embedding instability and representation drift caused by explosive ID cardinality, dynamic ID evolution, and long-tailed distributions. To address these challenges, we propose Semantic ID prefix-ngram, a content-based hierarchical clustering approach that groups IDs into semantically coherent buckets instead of relying on conventional random hashing. Its token parameterization improves robustness across ID lifecycle stages (e.g., cold start and decay) and brings further gains in attention-based models that contextualize user histories. Deployed in Meta's ad ranking system, the method significantly improves prediction stability, yields measurable gains in AUC and CTR, and boosts conversion rates for tail IDs by 12.7%, without incurring additional online latency. The core contribution is a semantic-aware ID tokenization scheme, building on the original Semantic ID, that jointly improves generalization and embedding stability, mitigating long-tail modeling difficulties and embedding drift.
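The summary describes grouping IDs by hierarchically clustering their content embeddings. This page does not specify the clustering model, so the sketch below swaps in residual k-means, a simple form of residual quantization, as an assumption: each level clusters whatever the previous levels left unexplained, and an item's Semantic ID is its tuple of cluster indices, ordered coarse to fine. The function names (`residual_kmeans_semantic_ids`, `_kmeans`, `_nearest`) and all parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the closest centroid (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(point, centroids[j]))


def _kmeans(points: List[Vector], k: int, iters: int, rng: random.Random) -> List[Vector]:
    """Plain Lloyd's k-means, initialized from k randomly sampled input points."""
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def residual_kmeans_semantic_ids(
    embeddings: List[Vector], levels: int, k: int, iters: int = 10, seed: int = 0
) -> List[Tuple[int, ...]]:
    """Assign each content embedding a coarse-to-fine code tuple of length `levels`.

    Hypothetical stand-in for the paper's hierarchical clustering step.
    """
    rng = random.Random(seed)
    residuals = [list(e) for e in embeddings]
    codes: List[List[int]] = [[] for _ in embeddings]
    for _ in range(levels):
        centroids = _kmeans(residuals, k, iters, rng)
        for i, r in enumerate(residuals):
            j = _nearest(r, centroids)
            codes[i].append(j)
            # Pass on only what this level's centroid did not explain.
            residuals[i] = [x - c for x, c in zip(r, centroids[j])]
    return [tuple(c) for c in codes]
```

Items whose content embeddings are close land in the same top-level cluster and therefore share their first code; deeper levels only refine distinctions within that cluster, which is what makes prefixes of the code tuple meaningful groupings.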

📝 Abstract
The exponential growth of online content has posed significant challenges to ID-based models in industrial recommendation systems, ranging from extremely high cardinality and a dynamically growing ID space, to highly skewed engagement distributions, to prediction instability caused by natural ID life cycles (e.g., the birth of new IDs and the retirement of old ones). To address these issues, many systems rely on random hashing to bound the ID space and the corresponding model parameters (i.e., the embedding table). However, this approach introduces data pollution when multiple IDs share the same embedding, degrading model performance and destabilizing embedding representations. This paper examines these challenges and introduces Semantic ID prefix-ngram, a novel token parameterization technique that significantly improves the performance of the original Semantic ID. Semantic ID prefix-ngram creates semantically meaningful collisions by hierarchically clustering items based on their content embeddings, as opposed to random assignments. Through extensive experimentation, we demonstrate that Semantic ID prefix-ngram not only addresses embedding instability but also significantly improves tail ID modeling, reduces overfitting, and mitigates representation shifts. We further highlight the advantages of Semantic ID prefix-ngram in attention-based models that contextualize user histories, showing substantial performance improvements. We also report our experience integrating Semantic ID into Meta's production Ads Ranking system, which led to notable performance gains and enhanced prediction stability in live deployments.
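To make the data pollution from random hashing concrete: if N IDs are hashed uniformly into an embedding table with M rows, a given ID shares its row with at least one other ID with probability 1 - (1 - 1/M)^(N-1). The sketch below checks this standard balls-into-bins formula against a simulation; it assumes an ideal uniform hash (modeled by a seeded random draw), and the function names are hypothetical.

```python
import random
from collections import Counter


def expected_shared_fraction(num_ids: int, table_size: int) -> float:
    """Probability that a given ID shares its embedding row with >= 1 other ID,
    assuming each ID is hashed independently and uniformly into table_size rows."""
    return 1.0 - (1.0 - 1.0 / table_size) ** (num_ids - 1)


def simulated_shared_fraction(num_ids: int, table_size: int, seed: int = 0) -> float:
    """Empirical fraction of IDs whose embedding row is also used by another ID."""
    rng = random.Random(seed)
    # A seeded uniform draw stands in for an ideal hash(id) % table_size.
    rows = [rng.randrange(table_size) for _ in range(num_ids)]
    load = Counter(rows)
    return sum(1 for r in rows if load[r] > 1) / num_ids


if __name__ == "__main__":
    n = m = 100_000
    print(f"closed form: {expected_shared_fraction(n, m):.3f}")
    print(f"simulated:   {simulated_shared_fraction(n, m):.3f}")
```

With as many IDs as rows, roughly 63% of IDs (about 1 - 1/e) share a row with some other ID, and which IDs collide has nothing to do with their content. Semantic ID replaces these arbitrary collisions with collisions among content-similar items.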
Problem

Addresses embedding instability in recommendation systems
Improves tail ID modeling and reduces overfitting
Enhances prediction stability in dynamic ID spaces
Innovation

Semantic ID prefix-ngram token parameterization
Hierarchical clustering for semantic collisions
Improved tail ID modeling and stability
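The prefix-ngram parameterization in the list above can be sketched under stated assumptions: an item's Semantic ID is a tuple of codes (c1, c2, ..., cL), each drawn from a codebook of size K, and every prefix (c1), (c1, c2), ..., (c1, ..., cL) gets its own embedding row, laid out level by level so that distinct prefixes never collide. Pooling the prefix rows by summation, and the helper names (`prefix_ngram_rows`, `num_prefix_rows`, `make_table`, `item_embedding`), are hypothetical choices for illustration, not the paper's confirmed design.

```python
import random
from typing import List, Sequence


def prefix_ngram_rows(semantic_id: Sequence[int], codebook_size: int) -> List[int]:
    """Map each prefix (c1), (c1, c2), ... of a Semantic ID to its own table row."""
    rows = []
    offset = 0  # rows reserved for all shorter prefixes
    value = 0   # the current prefix read as a base-K number
    for depth, code in enumerate(semantic_id, start=1):
        if not 0 <= code < codebook_size:
            raise ValueError(f"code {code} out of range for codebook size {codebook_size}")
        value = value * codebook_size + code
        rows.append(offset + value)
        offset += codebook_size ** depth
    return rows


def num_prefix_rows(codebook_size: int, depth: int) -> int:
    """Table size needed for every prefix of length 1..depth: K + K^2 + ... + K^depth."""
    return sum(codebook_size ** d for d in range(1, depth + 1))


def make_table(num_rows: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Small random embedding table (stand-in for a trained one)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_rows)]


def item_embedding(semantic_id: Sequence[int], table: List[List[float]], codebook_size: int) -> List[float]:
    """Sum-pool the rows of all prefixes (pooling choice is an assumption)."""
    rows = prefix_ngram_rows(semantic_id, codebook_size)
    dim = len(table[0])
    return [sum(table[r][d] for r in rows) for d in range(dim)]
```

With K = 4, the Semantic ID (1, 2, 3) maps to rows 1, 10, and 47. A new or tail item with Semantic ID (1, 2, 0) reuses rows 1 and 10, which its content-similar siblings have already trained, and only its deepest row starts cold; an item starting with a different first code shares no rows at all.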
Authors

Carolina Zheng, Columbia University (Machine Learning, Natural Language Processing)
Minhui Huang, Research Scientist (machine learning, optimization)
Dmitrii Pedchenko, AI at Meta
Kaushik Rangadurai, Researcher at Meta (Machine Learning, Artificial Intelligence, Search)
Siyu Wang, AI at Meta
Gaby Nahum, AI at Meta
Jie Lei, Universitat Politècnica de València (Computer Engineering, Electronic Engineering)
Yang Yang, AI at Meta
Tao Liu, AI at Meta
Zutian Luo, AI at Meta
Xiaohan Wei, AI at Meta
Dinesh Ramasamy, Meta (Recommendation systems, Machine learning, Sequence modeling)
Jiyan Yang, Stanford University
Yiping Han, AI at Meta
Lin Yang, AI at Meta
Hangjun Xu, AI at Meta
Rong Jin, AI at Meta
Shuang Yang, AI at Meta