Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the tension between structural efficiency and sequence-dependency modeling in industrial-scale generative reranking, where existing approaches struggle to simultaneously optimize page-level utility and maintain online stability. To resolve this, we propose Dual-Rerank, a unified framework that integrates the strengths of autoregressive and non-autoregressive models via sequence-level knowledge distillation. We further introduce a List-wise Decoupled Reranking Optimization (LDRO) mechanism, which, for the first time in an industrial setting, jointly mitigates structural latency and the misalignment between training objectives and deployment goals. The resulting method enables efficient and stable online reinforcement learning, significantly improving user satisfaction and watch time in live A/B tests while substantially reducing inference latency compared to autoregressive baselines.
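The sequence-level distillation idea in the summary can be sketched in a few lines. Everything below is a hypothetical toy stand-in, not the paper's architecture: an AR "teacher" decodes a permutation step by step, and an NAR "student" is trained against that whole decoded sequence (rather than per-step soft labels), so at serving time it can reproduce a sequence-aware ordering in a single parallel pass.

```python
def teacher_decode(items, score):
    """AR teacher: greedily emit one item per step, conditioning on the prefix."""
    remaining, seq = list(items), []
    while remaining:
        best = max(remaining, key=lambda it: score(seq, it))
        seq.append(best)
        remaining.remove(best)
    return seq

def distill_targets(teacher_seq):
    """Sequence-level KD target: the teacher's entire decoded permutation."""
    return {item: pos for pos, item in enumerate(teacher_seq)}

def student_decode(items, targets):
    """NAR student: one parallel scoring pass, then a sort; no sequential steps."""
    return sorted(items, key=lambda it: targets.get(it, len(items)))

# Toy prefix-aware score: prefer high values, discourage repeating the
# previous item's "category" (parity here).
def toy_score(prefix, item):
    penalty = 2 if prefix and prefix[-1] % 2 == item % 2 else 0
    return item - penalty

teacher_seq = teacher_decode([3, 1, 4, 5, 9, 2], toy_score)
student_seq = student_decode([3, 1, 4, 5, 9, 2], distill_targets(teacher_seq))
assert student_seq == teacher_seq  # student reproduces the teacher's ordering
```

The key point of the sketch is that the student's training signal is the full sequence the teacher actually produced, which is what lets a parallel model inherit sequential behavior.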
📝 Abstract
Kuaishou serves over 400 million daily active users, processing hundreds of millions of search queries daily against a repository of tens of billions of short videos. As the final decision layer, the reranking stage determines user experience by optimizing whole-page utility. While traditional score-and-sort methods fail to capture combinatorial dependencies, generative reranking offers a superior paradigm by directly modeling the permutation probability. However, deploying generative reranking in such a high-stakes environment faces a fundamental dual dilemma: 1) the structural trade-off, where autoregressive (AR) models offer superior sequential modeling but suffer from prohibitive latency, versus non-autoregressive (NAR) models that enable efficiency but lack dependency capture; and 2) the optimization gap, where supervised learning struggles to directly optimize whole-page utility, while reinforcement learning (RL) suffers from instability in high-throughput data streams. To resolve this, we propose Dual-Rerank, a unified framework designed for industrial reranking that bridges the structural gap via Sequential Knowledge Distillation and closes the optimization gap using List-wise Decoupled Reranking Optimization (LDRO) for stable online RL. Extensive A/B testing on production traffic demonstrates that Dual-Rerank achieves state-of-the-art performance, significantly improving user satisfaction and watch time while drastically reducing inference latency compared to AR baselines.
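The abstract's claim that score-and-sort misses combinatorial dependencies can be seen in a toy example (the utility function and redundancy penalty below are illustrative assumptions, not the paper's model): ranking items by individual value alone ignores interactions between placed items, which a list-wise search over permutations can exploit.

```python
from itertools import permutations

def page_utility(seq):
    """Toy whole-page utility: sum of item values minus a redundancy
    penalty whenever adjacent items share a 'category' (parity here)."""
    utility = float(sum(seq))
    for prev, cur in zip(seq, seq[1:]):
        if prev % 2 == cur % 2:
            utility -= 2.0  # adjacent same-category items hurt the page
    return utility

items = [6, 5, 3, 2]

# Pointwise score-and-sort: rank by individual value only.
score_and_sort = sorted(items, reverse=True)        # [6, 5, 3, 2]

# List-wise: choose the permutation maximizing whole-page utility.
best_page = max(permutations(items), key=page_utility)

print(page_utility(score_and_sort), page_utility(best_page))  # 14.0 16.0
```

Exhaustive permutation search is obviously infeasible at production scale; this is exactly the gap that generative reranking models, which score whole sequences, are meant to close.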
Problem

Research questions and friction points this paper is trying to address.

Generative Reranking
Autoregressive Models
Non-Autoregressive Models
Reinforcement Learning
Whole-page Utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Reranking
Sequential Knowledge Distillation
List-wise Decoupled Reranking Optimization
Autoregressive vs Non-Autoregressive
Online Reinforcement Learning
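The AR-vs-NAR trade-off listed above comes down to the number of sequential model calls per page. A minimal sketch, with a stand-in scorer that just counts forward passes (all names here are hypothetical):

```python
class ToyModel:
    """Counts forward passes, the main driver of inference latency."""
    def __init__(self):
        self.calls = 0

    def score(self, prefix, candidates):
        self.calls += 1
        return {c: c for c in candidates}  # dummy scores

def ar_rerank(model, items):
    """Autoregressive: one forward pass per emitted position."""
    remaining, seq = set(items), []
    while remaining:
        scores = model.score(tuple(seq), remaining)
        best = max(remaining, key=scores.get)
        seq.append(best)
        remaining.remove(best)
    return seq

def nar_rerank(model, items):
    """Non-autoregressive: a single parallel pass over all positions."""
    scores = model.score((), set(items))
    return sorted(items, key=scores.get, reverse=True)

ar_model, nar_model = ToyModel(), ToyModel()
ar_rerank(ar_model, range(8))   # 8 sequential calls for an 8-item page
nar_rerank(nar_model, range(8)) # 1 call, regardless of page length
```

With per-page latency proportional to the call count, the AR model pays O(n) sequential passes for n slots while the NAR model pays O(1), which is the structural latency the distillation step is meant to avoid without giving up the teacher's dependency modeling.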
👥 Authors
Chao Zhang (Alibaba)
Shuai Lin (Kuaishou) · Machine Learning, Graph Data Mining, NLP
ChengLei Dai (Kuaishou Technology)
Ye Qian (Kuaishou Technology)
Fan Mingyang (Kuaishou Technology)
Yi Zhang (Huawei Co., Ltd) · CV, AI, Trustworthy AI
Yi Wang (Kuaishou Technology)
Jingwei Zhuo (JD Inc) · Machine Learning