🤖 AI Summary
This work addresses the effectiveness and deployability of generative recommendation at the ranking stage in a large-scale industrial setting: the Explore Feed of Xiaohongshu, which serves hundreds of millions of daily active users. We propose RankGPT, a lightweight, production-ready generative ranking architecture. Contrary to prior studies that attribute performance gains to the training paradigm, we theoretically analyze and empirically validate, for the first time, that the generative architecture itself is the primary driver of improvement. Methodologically, RankGPT integrates LLM-informed architecture design, low-overhead inference optimization, and an online A/B testing framework to enable efficient industrial deployment. Live experiments demonstrate statistically significant improvements in user satisfaction metrics while keeping computational overhead comparable to that of the incumbent system. This study establishes a practical pathway and an empirical benchmark for industrializing generative recommender systems.
📝 Abstract
Generative recommendation has recently emerged as a promising paradigm in information retrieval. However, generative ranking systems remain understudied, particularly with respect to their effectiveness and feasibility in large-scale industrial settings. This paper investigates these questions at the ranking stage of Xiaohongshu's Explore Feed, a recommender system serving hundreds of millions of users. Specifically, we first examine why generative ranking outperforms current industrial recommenders. Through theoretical and empirical analyses, we find that the effectiveness gains stem primarily from the generative architecture rather than the training paradigm. To enable efficient deployment of generative ranking, we introduce RankGPT, a novel generative architecture for ranking, and validate its effectiveness and efficiency through online A/B experiments. The results show that RankGPT achieves significant improvements in user satisfaction with nearly the same computational resources as the existing production system.