🤖 AI Summary
This work addresses key challenges in industrial-scale generative recommendation, namely the difficulty of adapting large language models (LLMs), low inference efficiency, and weak semantic representation. To this end, we propose PLUM, a framework featuring: (1) Semantic ID tokenization for interpretable and generalizable item encoding; (2) a two-stage adaptation paradigm combining continued pre-training with generative-retrieval fine-tuning to integrate world knowledge and domain-specific semantics; and (3) scalable serving that supports billions of users at low latency. PLUM marks the first successful deployment of LLMs for generative retrieval recommendation at YouTube scale. Empirical evaluation on large-scale video recommendation shows that PLUM significantly outperforms heavily optimized production models, validating the effectiveness, scalability, and industrial practicality of the generative paradigm in real-world recommender systems.
📝 Abstract
Large Language Models (LLMs) offer a new paradigm of modeling and computation for information tasks. Recommendation systems are a critical application domain poised to benefit significantly from the sequence modeling capabilities and world knowledge inherent in these large models. In this paper, we introduce PLUM, a framework designed to adapt pre-trained LLMs to industry-scale recommendation tasks. PLUM consists of item tokenization using Semantic IDs, continued pre-training (CPT) on domain-specific data, and task-specific fine-tuning for recommendation objectives. For fine-tuning, we focus particularly on generative retrieval, where the model is trained to directly generate the Semantic IDs of recommended items from user context. We conduct comprehensive experiments on large-scale internal video recommendation datasets. Our results demonstrate that PLUM achieves substantial retrieval improvements over a heavily optimized production model built with large embedding tables. We also present a scaling study of the model's retrieval performance, our findings from CPT, several enhancements to Semantic IDs, and an overview of the training and inference methods that enable launching this framework to billions of users on YouTube.
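The abstract does not spell out how Semantic IDs are constructed, but in the generative-retrieval literature they are typically produced by residual quantization of an item's content embedding: each level picks the nearest codeword, and the sequence of chosen indices becomes the item's discrete, hierarchical token. The following is a minimal illustrative sketch of that idea only; the function name, toy codebooks, and embedding values are invented for demonstration and are not PLUM's actual tokenizer.

```python
def quantize(embedding, codebooks):
    """Greedy residual quantization (illustrative, not PLUM's tokenizer).

    At each level, pick the codeword nearest to the current residual,
    subtract it, and continue with the remainder. The tuple of chosen
    indices is the item's Semantic ID, usable as LLM tokens.
    """
    residual = list(embedding)
    semantic_id = []
    for codebook in codebooks:
        # nearest codeword by squared L2 distance to the residual
        best = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        semantic_id.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return tuple(semantic_id)

# Toy 2-level codebooks over 2-d item embeddings (made-up values).
codebooks = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],   # level 1: coarse codewords
    [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],   # level 2: residual codewords
]
print(quantize((1.08, 0.02), codebooks))  # → (1, 1)
```

Because the codes are hierarchical, semantically similar items share ID prefixes, which is what lets a fine-tuned LLM decode a Semantic ID token by token during generative retrieval.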