Generating Long Semantic IDs in Parallel for Recommendation

📅 2025-06-06
🤖 AI Summary
Traditional semantic ID recommendation models are constrained by autoregressive generation paradigms, yielding short semantic IDs (typically 4 tokens), which suffer from limited expressiveness and low inference efficiency. To address this, we propose the RPG framework—the first to enable unordered, long semantic ID generation in parallel. RPG introduces a multi-token independent prediction loss to directly model semantic associations, incorporates graph-structured decoding guidance to avoid invalid IDs, and integrates lightweight sequence modeling with semantic-aware tokenization. This paradigm shift extends semantic ID length to 64 tokens while preserving coherence and validity. Evaluated on multiple public benchmarks, RPG achieves an average 12.6% improvement in NDCG@10 over strong baselines. The approach significantly enhances representation capacity, recommendation accuracy, and inference throughput—demonstrating superior scalability and effectiveness for large-scale semantic ID learning.

📝 Abstract
Semantic ID-based recommendation models tokenize each item into a small number of discrete tokens that preserve specific semantics, leading to better performance, scalability, and memory efficiency. While recent models adopt a generative approach, they often suffer from inefficient inference due to the reliance on resource-intensive beam search and multiple forward passes through the neural sequence model. As a result, the length of semantic IDs is typically restricted (e.g., to just 4 tokens), limiting their expressiveness. To address these challenges, we propose RPG, a lightweight framework for semantic ID-based recommendation. The key idea is to produce unordered, long semantic IDs, allowing the model to predict all tokens in parallel. We train the model to predict each token independently using a multi-token prediction loss, directly integrating semantics into the learning objective. During inference, we construct a graph connecting similar semantic IDs and guide decoding to avoid generating invalid IDs. Experiments show that scaling up semantic ID length to 64 enables RPG to outperform generative baselines by an average of 12.6% in NDCG@10, while also improving inference efficiency. Code is available at: https://github.com/facebookresearch/RPG_KDD2025.
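The multi-token prediction objective described in the abstract — predicting every token of a long, unordered semantic ID independently and in parallel, rather than autoregressively — can be sketched as a sum of per-position cross-entropy terms. This is a minimal numpy illustration of that idea, not the released implementation; the function name and shapes are our own assumptions.

```python
import numpy as np

def multi_token_prediction_loss(logits, target_ids):
    """Independent cross-entropy over all semantic-ID positions at once.

    logits:     (L, V) array, one unnormalized score vector per token position
                (L = semantic ID length, e.g. 64 in RPG; V = codebook size).
    target_ids: (L,) array of ground-truth token indices, one per position.

    Because positions are treated as independent classification problems,
    all L tokens can be predicted in a single forward pass.
    """
    # numerically stable softmax per position
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # negative log-likelihood of the target token at each position
    nll = -np.log(probs[np.arange(len(target_ids)), target_ids])
    return nll.mean()
```

With uniform logits the loss equals log V at every position; as the model concentrates mass on the correct token at each position, the loss drops toward zero — no beam search or sequential decoding is involved in training.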
Problem

Research questions and friction points this paper is trying to address.

Inefficient inference in generative semantic ID models
Limited expressiveness due to short semantic IDs
Need for parallel prediction of long semantic IDs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel generation of long semantic IDs
Multi-token prediction loss integration
Graph-guided decoding for valid IDs
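The graph-guided decoding idea — restrict search to valid semantic IDs and expand along a graph connecting similar IDs so that invalid token combinations are never emitted — can be sketched roughly as follows. This is a simplified illustration under our own assumptions (greedy seed-and-expand search, log-likelihood scoring); the paper's actual decoding procedure may differ.

```python
import numpy as np

def score_semantic_id(log_probs, semantic_id):
    """Log-likelihood of a full semantic ID under per-position distributions.

    log_probs:   (L, V) per-position log-probabilities from the parallel model.
    semantic_id: length-L tuple of token indices.
    """
    return sum(log_probs[pos, tok] for pos, tok in enumerate(semantic_id))

def graph_guided_decode(log_probs, item_ids, neighbors, seed_k=2, hops=2):
    """Greedy search over the graph of valid semantic IDs.

    item_ids:  list of valid semantic IDs (tuples); restricting candidates to
               this set guarantees no invalid ID is ever generated.
    neighbors: dict mapping an item index to indices of items whose semantic
               IDs are similar (the decoding graph).
    """
    scores = [score_semantic_id(log_probs, sid) for sid in item_ids]
    # seed the search with the top-k valid IDs by model score
    frontier = set(np.argsort(scores)[-seed_k:])
    visited = set(frontier)
    for _ in range(hops):  # expand along the similarity graph
        frontier = {n for i in frontier
                    for n in neighbors.get(i, []) if n not in visited}
        visited |= frontier
    best = max(visited, key=lambda i: scores[i])
    return item_ids[best]
```

Because every candidate is drawn from the catalog of valid IDs and scored with one parallel forward pass, this avoids both the invalid-ID problem and the repeated sequential passes of beam search.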
Authors

Yupeng Hou, University of California, San Diego (Recommender Systems, Large Language Models)
Jiacheng Li, Meta AI
Ashley Shin, University of California, San Diego
Jinsung Jeon, University of California, San Diego
Abhishek Santhanam, University of California, San Diego
Wei Shao, Meta AI
Kaveh Hassani, Research Scientist, Meta Superintelligence Labs (Deep Learning)
Ning Yao, Meta AI
Julian McAuley, Professor, UC San Diego (Recommender Systems, Natural Language Processing, Personalization, Computer Music)