Efficient Item ID Generation for Large-Scale LLM-based Recommendation

📅 2025-09-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high latency caused by multi-token tokenization of item IDs in large language model (LLM)-based recommender systems, this paper treats item IDs as first-class citizens by directly modeling them as single-token embeddings within an end-to-end, single-step decoding architecture—enabling native, compact representation of million-scale discrete IDs in LLMs for the first time. The method integrates customized vocabulary expansion, item ID–aware pretraining, and lightweight inference optimization, eliminating conventional tokenization and post-processing overhead. Evaluated on Amazon benchmarks, our approach achieves significant improvements in Recall and NDCG over state-of-the-art baselines, while accelerating inference by 5–14×. This work establishes a scalable paradigm for efficiently incorporating discrete ID-based features into LLMs.

📝 Abstract
Integrating product catalogs and user behavior into LLMs can enhance recommendations with broad world knowledge, but the scale of real-world item catalogs, often containing millions of discrete item identifiers (item IDs), poses a significant challenge. This contrasts with the smaller, tokenized text vocabularies typically used in LLMs. The predominant view in the LLM-based recommendation literature is that treating item IDs as first-class citizens in the LLM is infeasible, and that each item must instead be tokenized into multiple tokens. However, this creates a key practical bottleneck when serving these models in real-time, low-latency applications. Our paper challenges this predominant practice and integrates item IDs as first-class citizens into the LLM. We provide simple yet highly effective novel training and inference modifications that enable single-token representations of items and single-step decoding. Our method improves recommendation quality (Recall and NDCG) over existing techniques on the Amazon shopping datasets while improving inference efficiency by 5x-14x. Our work offers an efficiency perspective distinct from other popular approaches in LLM-based recommendation, potentially inspiring further research and opening a new direction for integrating IDs into LLMs. Our code is available at https://drive.google.com/file/d/1cUMj37rV0Z1bCWMdhQ6i4q4eTRQLURtC
Problem

Research questions and friction points this paper is trying to address.

Integrating large-scale item IDs into LLMs for recommendation
Addressing tokenization bottlenecks in real-time recommendation systems
Enabling single-token item representations in LLM-based recommendations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-token item ID representations in LLMs
Novel training and inference modifications
5x-14x inference efficiency improvement
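The core idea behind these contributions can be sketched in a few lines: append one embedding row per catalog item to the model's vocabulary, then decode a recommendation in a single step by scoring only the item-ID slice of the output head. The sketch below is an illustration under assumed toy sizes (the paper targets million-scale catalogs) and hypothetical names (`item_token_id`, `recommend`); it is not the authors' implementation.

```python
import numpy as np

# Toy sizes for illustration only; real catalogs are million-scale.
VOCAB_TEXT = 32_000   # original text-token vocabulary
NUM_ITEMS = 50_000    # one dedicated single-token ID per catalog item
DIM = 64              # hidden size (toy value)

rng = np.random.default_rng(0)

# Vocabulary expansion: append one embedding row per item ID so each
# item is a first-class token rather than a multi-token string.
text_emb = rng.standard_normal((VOCAB_TEXT, DIM)).astype(np.float32)
item_emb = rng.standard_normal((NUM_ITEMS, DIM)).astype(np.float32)
embeddings = np.vstack([text_emb, item_emb])  # (VOCAB_TEXT + NUM_ITEMS, DIM)

def item_token_id(item_index: int) -> int:
    """Map a catalog item index to its dedicated token ID."""
    return VOCAB_TEXT + item_index

def recommend(hidden_state: np.ndarray, k: int = 10) -> np.ndarray:
    """Single-step decoding: score only the item-ID slice of the output
    head and return the top-k catalog indices, descending by score.
    No multi-token beam search or post-processing is needed."""
    scores = item_emb @ hidden_state           # (NUM_ITEMS,)
    topk = np.argpartition(scores, -k)[-k:]    # unordered top-k
    return topk[np.argsort(scores[topk])[::-1]]

# Usage: the final hidden state would come from the LLM; random here.
h = rng.standard_normal(DIM).astype(np.float32)
top_items = recommend(h, k=5)
```

Because decoding reduces to one matrix-vector product over the item slice, latency no longer scales with the number of tokens per item ID, which is one plausible reading of the reported 5x-14x speedup.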
Anushya Subbiah
Google Research, Mountain View, USA
Vikram Aggarwal
Google Research, Mountain View, USA
James Pine
Google Research, Mountain View, USA
Steffen Rendle
Google
Krishna Sayana
Google Research, Mountain View, USA
Kun Su
Google Research
Multimodal Learning · Audio/Music Generation · Recommendation System