🤖 AI Summary
This work addresses key challenges in industrial-scale generative recommendation, namely the difficulty of adapting large language models (LLMs), low inference efficiency, and weak semantic representation. To this end, we propose PLUM, a framework featuring: (1) Semantic ID tokenization for interpretable and generalizable item encoding; (2) a two-stage adaptation paradigm combining continued pre-training with generative-retrieval fine-tuning to integrate world knowledge and domain-specific semantics; and (3) scalable serving that supports billions of users at low latency. PLUM marks the first successful deployment of LLMs for generative retrieval recommendation at YouTube scale. Empirical evaluation on large-scale video recommendation shows that PLUM significantly outperforms heavily optimized production models, validating the effectiveness, scalability, and industrial practicality of the generative paradigm in real-world recommender systems.
📝 Abstract
Large Language Models (LLMs) offer a new paradigm of modeling and computation for information tasks. Recommendation systems are a critical application domain poised to benefit significantly from the sequence modeling capabilities and world knowledge inherent in these large models. In this paper, we introduce PLUM, a framework designed to adapt pre-trained LLMs to industry-scale recommendation tasks. PLUM consists of item tokenization using Semantic IDs, continued pre-training (CPT) on domain-specific data, and task-specific fine-tuning for recommendation objectives. For fine-tuning, we focus particularly on generative retrieval, where the model is trained to directly generate the Semantic IDs of recommended items from user context. We conduct comprehensive experiments on large-scale internal video recommendation datasets. Our results demonstrate that PLUM achieves substantial retrieval improvements over a heavily optimized production model built with large embedding tables. We also present a scaling study of the model's retrieval performance, our findings from CPT, several enhancements to Semantic IDs, and an overview of the training and inference methods that enable launching this framework to billions of users on YouTube.
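The abstract does not spell out how Semantic IDs are constructed, but in the generative-retrieval literature they are typically produced by residual quantization of an item's content embedding: each level picks the nearest codeword, and the sequence of chosen indices becomes the item's discrete, hierarchical token. The following is a minimal illustrative sketch of that idea only; the function name, toy codebooks, and embedding values are invented for demonstration and are not PLUM's actual tokenizer.

```python
def quantize(embedding, codebooks):
    """Greedy residual quantization (illustrative, not PLUM's tokenizer).

    At each level, pick the codeword nearest to the current residual,
    subtract it, and continue with the remainder. The tuple of chosen
    indices is the item's Semantic ID, usable as LLM tokens.
    """
    residual = list(embedding)
    semantic_id = []
    for codebook in codebooks:
        # nearest codeword by squared L2 distance to the residual
        best = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        semantic_id.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return tuple(semantic_id)

# Toy 2-level codebooks over 2-d item embeddings (made-up values).
codebooks = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],   # level 1: coarse codewords
    [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],   # level 2: residual codewords
]
print(quantize((1.08, 0.02), codebooks))  # → (1, 1)
```

Because the codes are hierarchical, semantically similar items share ID prefixes, which is what lets a fine-tuned LLM decode a Semantic ID token by token during generative retrieval.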