🤖 AI Summary
Recommendation systems have long been constrained by multi-stage cascaded architectures, resulting in fragmented computation, misaligned optimization objectives, and difficulty incorporating state-of-the-art AI advances. This paper proposes OneRec—the first end-to-end generative architecture tailored for industrial recommendation, unifying recall, ranking, and generation into a single trainable framework for full-pipeline joint optimization. Key contributions include: (1) establishing a recommendation-specific end-to-end generative paradigm; (2) the first successful deployment of reinforcement learning for optimization in production-scale recommendation; (3) discovery and empirical validation of scaling laws for recommendation models; and (4) FLOPs-aware model scaling coupled with deep GPU optimization, achieving 23.7% MFU in training and 28.8% in inference—comparable to large language models. Experiments demonstrate an operating expense that is only 10.6% of conventional pipelines, support for 25% of the total QPS of the Kuaishou/Kuaishou Lite APPs, 0.54%/1.24% increases in average App Stay Time, and significant growth in 7-day user Lifetime.
📝 Abstract
Recommender systems have been widely used in various large-scale user-oriented platforms for many years. However, compared to the rapid developments in the AI community, recommendation systems have not achieved a comparable breakthrough in recent years. For instance, they still rely on a multi-stage cascaded architecture rather than an end-to-end approach, leading to computational fragmentation and optimization inconsistencies, and hindering the effective application of key breakthrough technologies from the AI community in recommendation scenarios. To address these issues, we propose OneRec, which reshapes the recommendation system through an end-to-end generative approach and achieves promising results. Firstly, we have increased the computational FLOPs of the current recommendation model by $10\times$ and have identified the scaling laws for recommendations within certain boundaries. Secondly, reinforcement learning techniques, previously difficult to apply for optimizing recommendations, show significant potential in this framework. Lastly, through infrastructure optimizations, we have achieved 23.7% and 28.8% Model FLOPs Utilization (MFU) on flagship GPUs during training and inference, respectively, aligning closely with the LLM community. This architecture significantly reduces communication and storage overhead, resulting in an operating expense that is only 10.6% of that of traditional recommendation pipelines. Deployed in the Kuaishou and Kuaishou Lite APPs, it handles 25% of total queries per second, enhancing overall App Stay Time by 0.54% and 1.24%, respectively. Additionally, we have observed significant increases in metrics such as 7-day Lifetime, which is a crucial indicator of recommendation experience. We also provide practical lessons and insights derived from developing, optimizing, and maintaining a production-scale recommendation system with significant real-world impact.
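The MFU figures above follow the standard definition used in the LLM community: the FLOPs the model actually executes per second, divided by the hardware's theoretical peak throughput. A minimal sketch of that calculation is below; the numbers are illustrative placeholders, not Kuaishou's actual measurements.

```python
def mfu(model_flops_per_step: float, steps_per_second: float,
        peak_hardware_flops_per_second: float) -> float:
    """Model FLOPs Utilization: achieved FLOPs throughput over hardware peak.

    model_flops_per_step: FLOPs the model performs in one training/inference step.
    steps_per_second: measured step throughput.
    peak_hardware_flops_per_second: the GPU's theoretical peak (datasheet value).
    """
    achieved_flops_per_second = model_flops_per_step * steps_per_second
    return achieved_flops_per_second / peak_hardware_flops_per_second

# Hypothetical example: 6e12 FLOPs per step at 40 steps/s on a GPU with a
# 1e15 FLOP/s peak gives 2.4e14 / 1e15 = 24% MFU.
print(f"{mfu(6e12, 40, 1e15):.1%}")  # → 24.0%
```

Because MFU normalizes by model FLOPs rather than wall-clock speed alone, it lets a recommendation system's hardware efficiency be compared directly against LLM training runs, which is the comparison the abstract draws.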