🤖 AI Summary
Live-streaming recommendation is challenging because content evolves rapidly, has a short lifecycle, must be served under strict real-time constraints, and is optimized for heterogeneous objectives, which makes conventional generative approaches with static tokenization inadequate. This work proposes OneLive, a dynamically unified generative recommendation framework tailored to live-streaming scenarios. It combines four components: a residual-quantization-based dynamic tokenizer, a time-aware gated attention mechanism, a decoder-only Transformer architecture enhanced with Sequential MTP and Query-Key Normalization (QK Norm), and a unified multi-objective alignment framework. Together, these components target dynamic tokenization of evolving content, explicit temporal modeling, stable training with accelerated inference, and alignment with personalized preferences under multiple objectives.
📝 Abstract
Live-streaming recommender systems serve as critical infrastructure, mediating real-time interactions between users and authors. Like traditional industrial recommender systems, live-streaming recommendation relies on cascade architectures to support large-scale concurrency. Recent advances in generative recommendation unify the multi-stage recommendation process with Transformer-based architectures, offering improved scalability and higher computational efficiency. However, these methods cannot be transferred directly to live-streaming scenarios, where continuously evolving content, limited lifecycles, strict real-time constraints, and heterogeneous multi-objectives introduce unique challenges that invalidate static tokenization and conventional model frameworks. To address these issues, we propose OneLive, a dynamically unified generative recommendation framework tailored for live-streaming scenarios. OneLive integrates four key components: (i) a Dynamic Tokenizer that continuously encodes evolving real-time live content fused with behavior signals through residual quantization; (ii) a Time-Aware Gated Attention mechanism that explicitly models temporal dynamics for timely decision making; (iii) an efficient decoder-only generative architecture enhanced with Sequential MTP and QK Norm for stable training and accelerated inference; and (iv) a Unified Multi-Objective Alignment Framework that reinforces policy optimization for personalized preferences.
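
To make the residual quantization step in component (i) concrete, the following is a minimal sketch of generic residual quantization, not OneLive's actual tokenizer. The function names, the two-level setup, and the toy codebook values are all hypothetical, and the abstract does not say how the codebooks are learned, how they are refreshed as content evolves, or how behavior signals are fused into the embedding, so none of that is shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding: Vector,
                      codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Map an embedding to one code index per codebook level.

    Each level picks the code nearest to the current residual, then
    subtracts that code so the next level quantizes what is left over.
    """
    residual = list(embedding)
    tokens: List[int] = []
    for codebook in codebooks:
        index = min(range(len(codebook)),
                    key=lambda i: squared_distance(residual, codebook[i]))
        tokens.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tokens


def reconstruct(tokens: Sequence[int],
                codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Approximate the original embedding by summing the selected codes."""
    total = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(tokens, codebooks):
        total = [t + c for t, c in zip(total, codebook[index])]
    return total


# Hypothetical toy codebooks (assumption): a coarse level and a finer level.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

# Hypothetical 2-D embedding of one live stream at some moment in time.
tokens = residual_quantize([0.9, 1.3], CODEBOOKS)
approximation = reconstruct(tokens, CODEBOOKS)
```

In this sketch the token sequence is the discrete identifier a generative model would predict, and each added level shrinks the reconstruction error. A dynamic tokenizer in the sense of the abstract would presumably re-run such an encoding as the live content changes, but that behavior is an assumption here.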