🤖 AI Summary
OneRec-V1 faces two critical bottlenecks in production deployment: a severe computational imbalance, with 97.66% of computation consumed by sequence encoding rather than generation, and limited generalization because its reinforcement learning (RL) relies solely on reward models. OneRec-V2 keeps OneRec's formulation of recommendation as an autoregressive generation task and introduces a Lazy Decoder-Only architecture that removes the encoder bottleneck and reduces total computation by 94%. It also adds Duration-Aware Reward Shaping, grounded in real user dwell time, and Adaptive Ratio Clipping to improve preference alignment. The approach combines generative modeling, a decoder-only architecture, and RL on real user feedback. In A/B tests on Kuaishou’s production system, it increases App Stay Time by 0.467% and 0.741% across two major traffic segments, achieves more balanced multi-objective recommendations, and scales to an 8B-parameter model.
📝 Abstract
Recent breakthroughs in generative AI have transformed recommender systems through end-to-end generation. OneRec reformulates recommendation as an autoregressive generation task, achieving high Model FLOPs Utilization. While OneRec-V1 has shown significant empirical success in real-world deployment, two critical challenges hinder its scalability and performance: (1) inefficient computational allocation where 97.66% of resources are consumed by sequence encoding rather than generation, and (2) limitations in reinforcement learning relying solely on reward models.
To address these challenges, we propose OneRec-V2, which has two components. (1) Lazy Decoder-Only Architecture: eliminates the encoder bottleneck, reducing total computation by 94% and training resources by 90% and enabling successful scaling to 8B parameters. (2) Preference Alignment with Real-World User Interactions: incorporates Duration-Aware Reward Shaping and Adaptive Ratio Clipping to better align with user preferences using real-world feedback.
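For readers unfamiliar with ratio clipping, the sketch below shows the standard PPO-style clipped surrogate that the term usually refers to. It is not the paper's Adaptive Ratio Clipping: the abstract does not specify how the bounds are adapted, so the bounds here are fixed, and the names `clipped_surrogate`, `eps_low`, and `eps_high` are hypothetical, chosen only for illustration.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantage, eps_low=0.2, eps_high=0.2):
    """Generic PPO-style clipped objective for one sampled item.

    Assumption: fixed clip bounds. The adaptive rule used by OneRec-V2
    is not described in the abstract and is not modelled here.
    """
    # Importance ratio between the current policy and the sampling policy.
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to the interval [1 - eps_low, 1 + eps_high].
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Taking the smaller of the two terms removes any incentive to push
    # the ratio beyond the bounds in the direction the advantage favours.
    return min(ratio * advantage, clipped * advantage)


def batch_objective(samples):
    """Mean clipped surrogate over (logp_new, logp_old, advantage) tuples."""
    return sum(clipped_surrogate(*s) for s in samples) / len(samples)
```

With a positive advantage, the objective stops growing once the ratio exceeds `1 + eps_high`. With a negative advantage, it stops improving once the ratio falls below `1 - eps_low`. Either way, a single update cannot move the policy arbitrarily far from the one that generated the samples.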
Extensive A/B tests on Kuaishou demonstrate OneRec-V2's effectiveness, improving App Stay Time by 0.467%/0.741% while balancing multi-objective recommendations. This work improves the scalability of generative recommendation and its alignment with real-world feedback, a step toward fully end-to-end recommender systems.