OneRec Technical Report

📅 2025-06-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Recommendation systems have long been constrained by multi-stage cascaded architectures, resulting in fragmented computation, misaligned optimization objectives, and difficulty incorporating state-of-the-art AI advances. This paper proposes OneRec—the first end-to-end generative architecture tailored for industrial recommendation, unifying recall, ranking, and generation into a single trainable framework for full-pipeline joint optimization. Key contributions include: (1) establishing a recommendation-specific end-to-end generative paradigm; (2) the first successful deployment of reinforcement learning for optimization in production-scale recommendation; (3) discovery and empirical validation of scaling laws for recommendation models; and (4) FLOPs-aware model scaling coupled with deep GPU optimization, achieving 23.7%/28.8% MFU—comparable to large language models. Experiments demonstrate a 10.6% reduction in operational cost versus conventional pipelines, support for 25% of Kuaishou APP’s QPS, 0.54%–1.24% increase in average user session duration, and significant growth in 7-day user lifetime value.
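To make the paradigm shift concrete: instead of a recall stage producing candidates that a separate ranking stage re-scores, an end-to-end generative recommender decodes the slate autoregressively, item by item, from one model. The toy sketch below is not the paper's architecture (OneRec's model, tokenization, and training are not reproduced here); it only illustrates greedy autoregressive slate decoding over a hand-made next-item score table.

```python
# Toy illustration (NOT the paper's model): a generative recommender
# decodes the result slate autoregressively instead of running a
# recall -> ranking cascade. All items and scores below are made up.
from typing import Dict, List

# next-item scores conditioned on the most recent item; purely illustrative
NEXT: Dict[str, Dict[str, float]] = {
    "<bos>":   {"video_a": 0.6, "video_b": 0.3, "video_c": 0.1},
    "video_a": {"video_b": 0.7, "video_c": 0.3},
    "video_b": {"video_c": 0.9, "video_a": 0.1},
    "video_c": {"video_a": 0.5, "video_b": 0.5},
}

def generate_slate(history_end: str, k: int) -> List[str]:
    """Greedy-decode a k-item slate, skipping items already in the slate."""
    slate: List[str] = []
    cur = history_end
    for _ in range(k):
        candidates = {i: s for i, s in NEXT[cur].items() if i not in slate}
        if not candidates:
            break
        cur = max(candidates, key=candidates.get)  # pick highest-scoring item
        slate.append(cur)
    return slate

print(generate_slate("<bos>", 3))  # -> ['video_a', 'video_b', 'video_c']
```

Because the whole slate comes from one model, a single training signal (e.g. a reinforcement-learning reward on user feedback) can optimize the entire pipeline jointly, which is the property the cascaded design lacks.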

📝 Abstract
Recommender systems have been widely used in various large-scale user-oriented platforms for many years. However, compared to the rapid developments in the AI community, recommendation systems have not achieved a breakthrough in recent years. For instance, they still rely on a multi-stage cascaded architecture rather than an end-to-end approach, leading to computational fragmentation and optimization inconsistencies, and hindering the effective application of key breakthrough technologies from the AI community in recommendation scenarios. To address these issues, we propose OneRec, which reshapes the recommendation system through an end-to-end generative approach and achieves promising results. Firstly, we have enhanced the computational FLOPs of the current recommendation model by 10× and have identified the scaling laws for recommendations within certain boundaries. Secondly, reinforcement learning techniques, previously difficult to apply for optimizing recommendations, show significant potential in this framework. Lastly, through infrastructure optimizations, we have achieved 23.7% and 28.8% Model FLOPs Utilization (MFU) on flagship GPUs during training and inference, respectively, aligning closely with the LLM community. This architecture significantly reduces communication and storage overhead, resulting in operating expense that is only 10.6% of traditional recommendation pipelines. Deployed in Kuaishou/Kuaishou Lite APP, it handles 25% of total queries per second, enhancing overall App Stay Time by 0.54% and 1.24%, respectively. Additionally, we have observed significant increases in metrics such as 7-day Lifetime, which is a crucial indicator of recommendation experience. We also provide practical lessons and insights derived from developing, optimizing, and maintaining a production-scale recommendation system with significant real-world impact.
Problem

Research questions and friction points this paper is trying to address.

Recommender systems lack end-to-end architecture causing inefficiencies
Existing systems struggle to apply AI breakthroughs effectively
Multi-stage designs lead to optimization inconsistencies and high costs
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end generative recommendation approach
Reinforcement learning for optimization
Infrastructure optimizations for high MFU
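Model FLOPs Utilization (MFU), the efficiency metric the paper reports, is simply the model's achieved FLOP/s divided by the hardware's peak FLOP/s. The sketch below shows the arithmetic; the 312 TFLOP/s figure is the A100 BF16 peak, and the 74 TFLOP/s achieved rate is a back-computed illustrative number (the paper does not report it), chosen only so the ratio matches the 23.7% training MFU cited above.

```python
def mfu(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs Utilization: fraction of peak hardware FLOP/s the
    model's useful math actually sustains."""
    return achieved_flops_per_s / peak_flops_per_s

# Illustrative numbers: sustaining ~74 TFLOP/s of model math on a GPU
# with a 312 TFLOP/s peak (A100 BF16) gives ~23.7% MFU, matching the
# training-time figure reported for OneRec.
print(f"{mfu(74e12, 312e12):.1%}")  # -> 23.7%
```

MFU in the low-to-mid 20s is typical of well-optimized large-language-model training, which is the sense in which the paper claims parity with the LLM community.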
👥 Authors
Guorui Zhou · Recommender Systems, Advertising, Artificial Intelligence, Machine Learning, NLP
Jiaxin Deng
Jinghao Zhang · Kuaishou Tech · Recommender Systems, Multimedia, Large Language Models
Kuo Cai
Lejian Ren
Qiang Luo · Principal Investigator, ISTBI (Institute of Science and Technology for Brain-Inspired Intelligence), Fudan University · Computational Psychiatry, Neuroimaging, Complex Causal Models
Qianqian Wang
Qigen Hu
Rui Huang
Shiyao Wang
Weifeng Ding
Wuchao Li
Xinchen Luo · Kuaishou
Xingmei Wang
Zexuan Cheng
Zixing Zhang · Professor, Hunan University · Artificial Intelligence, Speech Processing, Affective Computing, Digital Health, Automatic Speech Recognition
Bin Zhang
Boxuan Wang
Chaoyi Ma · University of Florida · Data Science, Big Data, Network Traffic Measurement, Data Stream Summaries
Chengru Song
Chenhui Wang · PhD Candidate, Fudan University · AI for Neuroscience, Computer Vision
Di Wang
Dongxue Meng
Fan Yang
Fangyu Zhang
Feng Jiang
Fuxing Zhang
Gang Wang
Guowang Zhang
Han Li
Hengrui Hu · Fudan University · Segmentation
Hezheng Lin
Hongtao Cheng
Hongyang Cao
Huanjie Wang
Jiaming Huang
Jiapeng Chen
Jiaqiang Liu
Jinghui Jia
Kun Gai · Senior Director & Researcher, Alibaba Group · Machine Learning, Computational Advertising
Lantao Hu · Kuaishou Inc. · Data Mining, Recommender Systems
Liang Zeng
Liao Yu
Qiang Wang
Qidong Zhou
Shengzhe Wang
Shihui He
Shuang Yang
Shujie Yang · Tsinghua University · Large Language Models, AIGC, AI for Science
Sui Huang
Tao Wu
Tiantian He · PhD Student, University College London · AI Agents, Probabilistic Modelling, Graph Learning, Spatio-temporal Modelling, AI for Neuroscience
Tingting Gao
Wei Yuan
Xiao Liang
Xiaoxiao Xu
Xugang Liu
Yan Wang
Yi Wang
Yiwu Liu
Yue Song · Caltech · Machine Learning, Geometric Deep Learning, AI4Science
Yufei Zhang
Yunfan Wu · Institute of Computing Technology, Chinese Academy of Sciences · Recommender Systems, Collaborative Filtering, Adversarial Attacks
Yunfeng Zhao · Tianjin University · Edge Computing
Zhanyu Liu · Shanghai Jiao Tong University · Recommendation Systems, Large Language Models, Data Mining, Time Series Analysis