OneRec-V2 Technical Report

📅 2025-08-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

OneRec-V1 faces two critical bottlenecks in production deployment. First, computational resources are severely imbalanced: 97.66% of computation is consumed by sequence encoding rather than generation. Second, generalization is limited because its reinforcement learning (RL) is driven purely by a reward model. Building on OneRec's formulation of recommendation as an autoregressive generation task, we propose a Lazy Decoder-Only architecture that eliminates the encoder bottleneck and reduces total computation by 94%. We further design duration-aware reward shaping grounded in real user dwell time and incorporate adaptive ratio clipping to improve preference alignment. The approach unifies generative AI, decoder-only modeling, and online RL. In online A/B tests on Kuaishou's production system, it increases average app dwell time by 0.467% and 0.741% across two major traffic segments, yields more balanced multi-objective recommendations, and scales successfully to an 8B-parameter model, substantially improving scalability and online effectiveness.
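The 97.66% figure follows from where tokens flow: an encoder runs its full stack over a long user-history context, while the decoder generates only a few tokens. The sketch below estimates this with the common rough rule of about 2 × parameters FLOPs per token for a dense forward pass, ignoring the attention-score term. All model sizes and sequence lengths, and the helper names, are hypothetical and are not meant to reproduce the report's numbers.

```python
def dense_flops(params: float, tokens: int) -> float:
    """Rough forward-pass FLOPs of a dense stack: about 2 * params per token.

    Ignores the attention score term, which grows quadratically with length.
    """
    return 2.0 * params * tokens


def encoding_share(enc_params: float, dec_params: float, ctx_len: int, tgt_len: int) -> float:
    """Fraction of compute spent encoding the context in an encoder-decoder model."""
    enc = dense_flops(enc_params, ctx_len)
    dec = dense_flops(dec_params, tgt_len)
    return enc / (enc + dec)


def lazy_saving(enc_params: float, dec_params: float, ctx_params: float,
                ctx_len: int, tgt_len: int) -> float:
    """Fractional compute reduction when the full encoder over the context is
    replaced by a single lightweight context projection of ctx_params parameters
    (a hypothetical simplification)."""
    baseline = dense_flops(enc_params, ctx_len) + dense_flops(dec_params, tgt_len)
    lazy = dense_flops(ctx_params, ctx_len) + dense_flops(dec_params, tgt_len)
    return 1.0 - lazy / baseline


if __name__ == "__main__":
    # Hypothetical sizes: 100M-parameter encoder and decoder, 1,000 context tokens,
    # 10 target tokens, and a 1M-parameter context projection.
    print(f"{encoding_share(1e8, 1e8, 1000, 10):.2%}")    # 99.01%
    print(f"{lazy_saving(1e8, 1e8, 1e6, 1000, 10):.2%}")  # 98.02%
```

Because the context is far longer than the generated sequence, cutting the per-token cost of context processing dominates any savings on the decoder side.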

📝 Abstract

Recent breakthroughs in generative AI have transformed recommender systems through end-to-end generation. OneRec reformulates recommendation as an autoregressive generation task, achieving high Model FLOPs Utilization. While OneRec-V1 has shown significant empirical success in real-world deployment, two critical challenges hinder its scalability and performance: (1) inefficient computational allocation, where 97.66% of resources are consumed by sequence encoding rather than generation, and (2) limitations of reinforcement learning that relies solely on reward models. To address these challenges, we propose OneRec-V2, featuring: (1) Lazy Decoder-Only Architecture: eliminates encoder bottlenecks, reducing total computation by 94% and training resources by 90%, enabling successful scaling to 8B parameters. (2) Preference Alignment with Real-World User Interactions: incorporates Duration-Aware Reward Shaping and Adaptive Ratio Clipping to better align with user preferences using real-world feedback. Extensive A/B tests on Kuaishou demonstrate OneRec-V2's effectiveness, improving App Stay Time by 0.467% and 0.741% in two traffic scenarios while balancing multi-objective recommendations. This work advances generative recommendation scalability and alignment with real-world feedback, representing a step forward in the development of end-to-end recommender systems.
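The abstract states that the Lazy Decoder-Only architecture eliminates the encoder bottleneck but does not spell out the mechanism. The sketch below shows one way such a design can work, assuming the user-history context is projected to keys and values once and every decoder layer cross-attends to that shared cache, so per-layer cost scales with the short target sequence rather than the long context. The function names (project_context, lazy_decode) and the single shared K/V projection are illustrative assumptions, not the report's code.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project_context(context: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical lazy context step: one K/V projection, computed once and cached."""
    keys = [matvec(w_k, c) for c in context]
    values = [matvec(w_v, c) for c in context]
    return keys, values


def cross_attend(query: Vector, keys: Matrix, values: Matrix) -> Vector:
    """Scaled dot-product attention of one query over the cached context."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def lazy_decode(target: Matrix, context: Matrix, w_k: Matrix, w_v: Matrix,
                w_q_layers: List[Matrix]) -> Matrix:
    """Run a stack of cross-attention layers that all reuse the same context K/V.

    Self-attention, feed-forward blocks and normalization are omitted for brevity.
    """
    keys, values = project_context(context, w_k, w_v)  # context cost paid once
    hidden = [list(t) for t in target]
    for w_q in w_q_layers:
        # Each layer computes queries only for the short target sequence,
        # then adds the attention output back as a residual.
        hidden = [
            [h + a for h, a in zip(h_vec, cross_attend(matvec(w_q, h_vec), keys, values))]
            for h_vec in hidden
        ]
    return hidden


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    ctx = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]  # three identical context tokens
    print(lazy_decode([[0.0, 0.0]], ctx, eye, eye, [eye, eye]))
```

The design choice being illustrated is that the long context never passes through a per-layer stack; only the handful of target positions do.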

Problem

Research questions and friction points this paper is trying to address.

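Improve computational efficiency in generative recommender systems, where sequence encoding consumed 97.66% of OneRec-V1's compute

Overcome the limitations of reinforcement learning that relies solely on reward models

Better align recommendations with user preferences revealed by real-world interactions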
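Duration-aware reward shaping is described in the summary only as being grounded in real user dwell time. One natural reading is that raw dwell time is normalized by video length so that long videos are not rewarded simply for being long. The sketch below assumes that reading: it buckets videos by duration and scores each view by the percentile rank of its dwell time within its bucket. The bucket edges, the percentile rule, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def duration_bucket(duration_s: float, edges: List[float]) -> int:
    """Index of the duration bucket; edges are sorted bucket boundaries in seconds."""
    return bisect_right(edges, duration_s)


def duration_aware_rewards(samples: List[Tuple[float, float]], edges: List[float]) -> List[float]:
    """Map (video_duration_s, dwell_time_s) pairs to rewards in (0, 1).

    Each reward is the midpoint percentile rank of the dwell time among all
    samples in the same duration bucket, so a view is compared only with
    views of similarly long videos rather than by raw seconds watched.
    """
    by_bucket: Dict[int, List[float]] = defaultdict(list)
    for duration, dwell in samples:
        by_bucket[duration_bucket(duration, edges)].append(dwell)
    for dwells in by_bucket.values():
        dwells.sort()

    rewards = []
    for duration, dwell in samples:
        ref = by_bucket[duration_bucket(duration, edges)]
        below = bisect_left(ref, dwell)          # strictly shorter dwell times
        at_or_below = bisect_right(ref, dwell)   # also counts ties
        rewards.append((below + at_or_below) / (2 * len(ref)))
    return rewards


if __name__ == "__main__":
    samples = [(10, 2), (10, 8), (100, 20), (100, 80)]
    print(duration_aware_rewards(samples, edges=[30.0]))  # [0.25, 0.75, 0.25, 0.75]
```

In this toy batch, 20 seconds on a 100-second video earns the same reward as 2 seconds on a 10-second video, even though raw dwell time differs tenfold.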

Innovation

Methods, ideas, or system contributions that make the work stand out.

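Lazy Decoder-Only Architecture that eliminates the encoder bottleneck, cutting total computation by 94%

Duration-Aware Reward Shaping based on real user dwell time for preference alignment

Adaptive Ratio Clipping for learning from real-world user feedback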
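Ratio clipping refers to bounding the importance ratio between the updated policy and the policy that collected the data, as in PPO's clipped surrogate objective. The summary does not say how the clip range is adapted, so the sketch below pairs the standard PPO clipped objective with a hypothetical rule (adapt_eps, with made-up thresholds target_dev and factor) that tightens the range when the batch's mean ratio deviation is large and loosens it when updates are small. It illustrates the general idea only and is not the report's method.

```python
from typing import List


def clipped_surrogate(ratios: List[float], advantages: List[float], eps: float) -> float:
    """PPO-style clipped objective (to be maximized), averaged over samples.

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i).
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        total += min(r * a, clipped * a)  # pessimistic of clipped and unclipped terms
    return total / len(ratios)


def adapt_eps(eps: float, ratios: List[float], target_dev: float = 0.1,
              factor: float = 1.5, eps_min: float = 0.05, eps_max: float = 0.3) -> float:
    """Hypothetical schedule: tighten the clip range when the policy drifts far
    from the data-collecting policy, loosen it when updates are small."""
    dev = sum(abs(r - 1.0) for r in ratios) / len(ratios)
    if dev > factor * target_dev:
        eps /= factor
    elif dev < target_dev / factor:
        eps *= factor
    return min(max(eps, eps_min), eps_max)


if __name__ == "__main__":
    ratios, advantages = [1.5, 0.5, 1.05], [1.0, -1.0, 0.5]
    eps = 0.2
    print(clipped_surrogate(ratios, advantages, eps))
    eps = adapt_eps(eps, ratios)  # large drift in this batch, so eps shrinks
    print(eps)
```

Tightening the range when ratios drift caps how far a single batch of noisy feedback can move the policy; the specific thresholds here are arbitrary.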

🔎 Similar Papers

On-Device Recommender Systems: A Comprehensive Survey