Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the tension between structural efficiency and sequence-dependency modeling in industrial-scale generative reranking, where existing approaches struggle to simultaneously optimize page-level utility and maintain online stability. To resolve this, we propose Dual-Rerank, a unified framework that integrates the strengths of autoregressive and non-autoregressive models via sequence-level knowledge distillation. We further introduce a List-wise Decoupled Reranking Optimization (LDRO) mechanism, which, for the first time in an industrial setting, jointly mitigates structural latency and the misalignment between training objectives and deployment goals. The resulting method enables efficient and stable online reinforcement learning, significantly improving user satisfaction and watch time in live A/B tests while substantially reducing inference latency compared to autoregressive baselines.
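The sequence-level distillation idea in the summary can be sketched in a few lines. Everything below is a hypothetical toy stand-in, not the paper's architecture: an AR "teacher" decodes a permutation step by step, and an NAR "student" is trained against that whole decoded sequence (rather than per-step soft labels), so at serving time it can reproduce a sequence-aware ordering in a single parallel pass.

```python
def teacher_decode(items, score):
    """AR teacher: greedily emit one item per step, conditioning on the prefix."""
    remaining, seq = list(items), []
    while remaining:
        best = max(remaining, key=lambda it: score(seq, it))
        seq.append(best)
        remaining.remove(best)
    return seq

def distill_targets(teacher_seq):
    """Sequence-level KD target: the teacher's entire decoded permutation."""
    return {item: pos for pos, item in enumerate(teacher_seq)}

def student_decode(items, targets):
    """NAR student: one parallel scoring pass, then a sort; no sequential steps."""
    return sorted(items, key=lambda it: targets.get(it, len(items)))

# Toy prefix-aware score: prefer high values, discourage repeating the
# previous item's "category" (parity here).
def toy_score(prefix, item):
    penalty = 2 if prefix and prefix[-1] % 2 == item % 2 else 0
    return item - penalty

teacher_seq = teacher_decode([3, 1, 4, 5, 9, 2], toy_score)
student_seq = student_decode([3, 1, 4, 5, 9, 2], distill_targets(teacher_seq))
assert student_seq == teacher_seq  # student reproduces the teacher's ordering
```

The key point of the sketch is that the student's training signal is the full sequence the teacher actually produced, which is what lets a parallel model inherit sequential behavior.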
📝 Abstract
Kuaishou serves over 400 million daily active users, processing hundreds of millions of search queries daily against a repository of tens of billions of short videos. As the final decision layer, the reranking stage determines user experience by optimizing whole-page utility. While traditional score-and-sort methods fail to capture combinatorial dependencies, generative reranking offers a superior paradigm by directly modeling the permutation probability. However, deploying generative reranking in such a high-stakes environment faces a fundamental dual dilemma: 1) the structural trade-off, where autoregressive (AR) models offer superior sequential modeling but suffer from prohibitive latency, versus non-autoregressive (NAR) models that enable efficiency but lack dependency capture; and 2) the optimization gap, where supervised learning struggles to directly optimize whole-page utility, while reinforcement learning (RL) suffers from instability in high-throughput data streams. To resolve this, we propose Dual-Rerank, a unified framework designed for industrial reranking that bridges the structural gap via Sequential Knowledge Distillation and closes the optimization gap using List-wise Decoupled Reranking Optimization (LDRO) for stable online RL. Extensive A/B testing on production traffic demonstrates that Dual-Rerank achieves state-of-the-art performance, significantly improving user satisfaction and watch time while drastically reducing inference latency compared to AR baselines.
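The abstract's claim that score-and-sort misses combinatorial dependencies can be seen in a toy example (the utility function and redundancy penalty below are illustrative assumptions, not the paper's model): ranking items by individual value alone ignores interactions between placed items, which a list-wise search over permutations can exploit.

```python
from itertools import permutations

def page_utility(seq):
    """Toy whole-page utility: sum of item values minus a redundancy
    penalty whenever adjacent items share a 'category' (parity here)."""
    utility = float(sum(seq))
    for prev, cur in zip(seq, seq[1:]):
        if prev % 2 == cur % 2:
            utility -= 2.0  # adjacent same-category items hurt the page
    return utility

items = [6, 5, 3, 2]

# Pointwise score-and-sort: rank by individual value only.
score_and_sort = sorted(items, reverse=True)        # [6, 5, 3, 2]

# List-wise: choose the permutation maximizing whole-page utility.
best_page = max(permutations(items), key=page_utility)

print(page_utility(score_and_sort), page_utility(best_page))  # 14.0 16.0
```

Exhaustive permutation search is obviously infeasible at production scale; this is exactly the gap that generative reranking models, which score whole sequences, are meant to close.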
Problem

Research questions and friction points this paper is trying to address.

Generative Reranking
Autoregressive Models
Non-Autoregressive Models
Reinforcement Learning
Whole-page Utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Reranking
Sequential Knowledge Distillation
List-wise Decoupled Reranking Optimization
Autoregressive vs Non-Autoregressive
Online Reinforcement Learning
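The AR-vs-NAR trade-off listed above comes down to the number of sequential model calls per page. A minimal sketch, with a stand-in scorer that just counts forward passes (all names here are hypothetical):

```python
class ToyModel:
    """Counts forward passes, the main driver of inference latency."""
    def __init__(self):
        self.calls = 0

    def score(self, prefix, candidates):
        self.calls += 1
        return {c: c for c in candidates}  # dummy scores

def ar_rerank(model, items):
    """Autoregressive: one forward pass per emitted position."""
    remaining, seq = set(items), []
    while remaining:
        scores = model.score(tuple(seq), remaining)
        best = max(remaining, key=scores.get)
        seq.append(best)
        remaining.remove(best)
    return seq

def nar_rerank(model, items):
    """Non-autoregressive: a single parallel pass over all positions."""
    scores = model.score((), set(items))
    return sorted(items, key=scores.get, reverse=True)

ar_model, nar_model = ToyModel(), ToyModel()
ar_rerank(ar_model, range(8))   # 8 sequential calls for an 8-item page
nar_rerank(nar_model, range(8)) # 1 call, regardless of page length
```

With per-page latency proportional to the call count, the AR model pays O(n) sequential passes for n slots while the NAR model pays O(1), which is the structural latency the distillation step is meant to avoid without giving up the teacher's dependency modeling.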
👥 Authors
Chao Zhang (Alibaba)
Shuai Lin (Kuaishou) · Machine Learning, Graph Data Mining, NLP
ChengLei Dai (Kuaishou Technology)
Ye Qian (Kuaishou Technology)
Fan Mingyang (Kuaishou Technology)
Yi Zhang (Huawei Co., Ltd) · CV, AI, Trustworthy AI
Yi Wang (Kuaishou Technology)
Jingwei Zhuo (JD Inc) · Machine Learning