PLUM: Adapting Pre-trained Language Models for Industrial-scale Generative Recommendations

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key challenges in industrial-scale generative recommendation: the difficulty of adapting large language models (LLMs) to the domain, low inference efficiency, and weak semantic item representations. To this end, the authors propose PLUM, a framework featuring (1) Semantic ID tokenization for interpretable and generalizable item encoding; (2) a two-stage adaptation paradigm that combines continued pre-training with generative-retrieval fine-tuning to integrate world knowledge with domain-specific semantics; and (3) scalable training and serving methods that reach billions of users at low latency. PLUM marks the first successful deployment of LLM-based generative retrieval for recommendations at YouTube scale. Empirical evaluation on large-scale video recommendation shows that PLUM substantially outperforms a heavily optimized production model, supporting the effectiveness, scalability, and industrial practicality of the generative paradigm in real-world recommender systems.
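
This page does not say how Semantic IDs are produced; in prior Semantic ID work (e.g., TIGER-style generative retrieval), they come from residual quantization of item content embeddings. The sketch below illustrates that general idea only; the embedding dimension, number of levels, and codebook size are toy stand-ins, not details from PLUM.

```python
# A minimal sketch of residual-quantization-style Semantic ID tokenization.
# All sizes and the random codebooks are illustrative stand-ins; PLUM's
# actual tokenizer is not described on this page.
import numpy as np

def semantic_id(item_embedding: np.ndarray, codebooks: list[np.ndarray]) -> tuple[int, ...]:
    """Greedily quantize an embedding into a short tuple of code indices."""
    residual = item_embedding.astype(np.float64)
    codes = []
    for codebook in codebooks:                    # codebook shape: (K, d)
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))               # nearest code at this level
        codes.append(idx)
        residual = residual - codebook[idx]       # pass the leftover down a level
    return tuple(codes)

rng = np.random.default_rng(0)
d, levels, K = 16, 3, 8                           # toy sizes
codebooks = [rng.normal(size=(K, d)) for _ in range(levels)]
item_embedding = rng.normal(size=d)
print(semantic_id(item_embedding, codebooks))     # e.g. (1, 4, 6): the item's Semantic ID
```

Because codes are assigned coarse-to-fine, semantically close items share leading codes, which is what makes the encoding interpretable and generalizable to new items.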

📝 Abstract
Large Language Models (LLMs) pose a new paradigm of modeling and computation for information tasks. Recommendation systems are a critical application domain poised to benefit significantly from the sequence modeling capabilities and world knowledge inherent in these large models. In this paper, we introduce PLUM, a framework designed to adapt pre-trained LLMs for industry-scale recommendation tasks. PLUM consists of item tokenization using Semantic IDs, continued pre-training (CPT) on domain-specific data, and task-specific fine-tuning for recommendation objectives. For fine-tuning, we focus particularly on generative retrieval, where the model is directly trained to generate Semantic IDs of recommended items based on user context. We conduct comprehensive experiments on large-scale internal video recommendation datasets. Our results demonstrate that PLUM achieves substantial improvements for retrieval compared to a heavily optimized production model built with large embedding tables. We also present a scaling study of the model's retrieval performance, our learnings about CPT, a few enhancements to Semantic IDs, and an overview of the training and inference methods that enable launching this framework to billions of users on YouTube.
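
The abstract says the model is trained to generate the Semantic IDs of recommended items from user context. Below is a minimal sketch of how such training examples might be framed, assuming each item is a fixed-length tuple of code indices; the token naming is purely illustrative, not the paper's vocabulary.

```python
# A sketch of framing generative-retrieval training data: the model reads a
# user's watch history as Semantic ID tokens and learns to emit the tokens
# of the next watched item. Token strings here are hypothetical.

def make_example(history: list[tuple[int, ...]],
                 next_item: tuple[int, ...]) -> tuple[list[str], list[str]]:
    """Flatten a Semantic ID history into input tokens and next-item targets."""
    def to_tokens(sid: tuple[int, ...]) -> list[str]:
        # One token per quantization level, so a 3-level ID costs 3 tokens.
        return [f"<sid_l{level}_{code}>" for level, code in enumerate(sid)]

    inputs = [tok for sid in history for tok in to_tokens(sid)]
    targets = to_tokens(next_item)
    return inputs, targets

history = [(5, 2, 7), (1, 0, 3)]          # two previously watched items
inputs, targets = make_example(history, next_item=(5, 2, 4))
print(inputs)   # ['<sid_l0_5>', '<sid_l1_2>', '<sid_l2_7>', '<sid_l0_1>', ...]
print(targets)  # ['<sid_l0_5>', '<sid_l1_2>', '<sid_l2_4>']
```

Framed this way, recommendation becomes ordinary next-token prediction over a small Semantic ID vocabulary, which is what lets a pre-trained LLM be fine-tuned for it directly.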
Problem

Research questions and friction points this paper is trying to address.

Adapting pre-trained language models to industrial-scale generative recommendation systems
Developing a framework that generates Semantic IDs of recommended items from user context
Improving retrieval performance over traditional large-embedding-table approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts pre-trained LLMs to industrial-scale recommendation tasks
Uses Semantic IDs for item tokenization and generative retrieval (a constrained-decoding sketch follows this list)
Applies continued pre-training on domain data followed by task-specific fine-tuning
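
At serving time, generative retrieval amounts to decoding Semantic ID tokens for each user. A common companion technique, assumed here rather than confirmed by this page, is to constrain beam search to a prefix trie of the Semantic IDs that actually exist in the corpus, so every decoded sequence maps to a real item. A toy sketch with random stand-in scores in place of model log-probabilities:

```python
# Prefix-trie-constrained beam decoding: only Semantic IDs of real corpus
# items can be generated. Scores are random stand-ins for an LLM's token
# log-probs; the corpus, beam width, and ID length are toy values.
import math
import random

corpus = {(5, 2, 7): "video_a", (5, 2, 4): "video_b", (1, 0, 3): "video_c"}

def children(prefix: tuple[int, ...]) -> set[int]:
    """Codes that extend `prefix` toward at least one real item."""
    return {sid[len(prefix)] for sid in corpus if sid[:len(prefix)] == prefix}

def decode(beam_width: int = 2, length: int = 3) -> list[tuple[tuple[int, ...], float]]:
    random.seed(0)
    beams = [((), 0.0)]                            # (prefix, cumulative log-prob)
    for _ in range(length):
        candidates = []
        for prefix, score in beams:
            for code in children(prefix):          # trie constraint
                logp = math.log(random.random())   # stand-in for a model log-prob
                candidates.append((prefix + (code,), score + logp))
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_width]
    return beams

for sid, score in decode():
    print(corpus[sid], sid, round(score, 2))       # retrieved items, best first
```

The trie keeps every decoded ID valid, and the number of model forward passes grows with beam width and ID length rather than with catalog size, which is one plausible ingredient of the low-latency, billion-user serving the summary mentions.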
👥 Authors

Ruining He (Google DeepMind)
Lukasz Heldt (YouTube)
Lichan Hong (Google DeepMind)
Raghunandan Keshavan (YouTube)
Shifan Mao (YouTube)
Nikhil Mehta (Google DeepMind)
Zhengyang Su (YouTube)
Alicia Tsai (Google DeepMind)
Yueqi Wang (YouTube)
Shao-Chuan Wang (Google DeepMind)
Xinyang Yi (Google DeepMind)
Lexi Baugher (YouTube)
Baykal Cakici (YouTube)
Ed Chi (Google DeepMind)
Cristos Goodrow (YouTube)
Ningren Han (YouTube)
He Ma (YouTube)
Romer Rosales (LinkedIn)
Abby Van Soest (YouTube)
Devansh Tandon (YouTube)
Su-Lin Wu (YouTube)
Weilong Yang (YouTube)
Yilin Zheng (YouTube)