GReF: A Unified Generative Framework for Efficient Reranking via Ordered Multi-token Prediction

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-stage recommendation, re-ranking must model correlations among the items in a list, yet existing two-stage (generator-evaluator) approaches cannot be trained end to end and rely on slow autoregressive inference. This paper proposes GReF, a unified generative re-ranking framework that integrates generation and evaluation into a single model. Architecturally, GReF pairs a bidirectional encoder with a dynamic autoregressive decoder and is pre-trained on item exposure order. Post-training with Rerank-DPO brings sequence-level feedback into end-to-end optimization, and ordered multi-token prediction lets the model decode several items in parallel while preserving their order, balancing quality and latency. Offline experiments show GReF outperforming state-of-the-art re-ranking methods, with inference latency approaching that of non-autoregressive models. Deployed at Kuaishou (more than 300 million daily active users), GReF also delivers significant online gains in recommendation quality.

📝 Abstract
In a multi-stage recommendation system, reranking plays a crucial role in modeling intra-list correlations among items. A key challenge lies in exploring optimal sequences within the combinatorial space of permutations. Recent research follows a two-stage (generator-evaluator) paradigm, where a generator produces multiple feasible sequences and an evaluator selects the best one. In practice, the generator is typically implemented as an autoregressive model. However, these two-stage methods face two main challenges. First, the separation of the generator and evaluator hinders end-to-end training. Second, autoregressive generators suffer from low inference efficiency. In this work, we propose a Unified Generative Efficient Reranking Framework (GReF) to address both challenges. Specifically, we introduce Gen-Reranker, an autoregressive generator featuring a bidirectional encoder and a dynamic autoregressive decoder to generate causal reranking sequences. We then pre-train Gen-Reranker on the item exposure order for high-quality parameter initialization. To eliminate the need for the evaluator while still integrating sequence-level evaluation into training for end-to-end optimization, we propose post-training the model through Rerank-DPO. Moreover, for efficient autoregressive inference, we introduce ordered multi-token prediction (OMTP), which trains Gen-Reranker to simultaneously generate multiple future items while preserving their order, making deployment in real-time recommender systems practical. Extensive offline experiments demonstrate that GReF outperforms state-of-the-art reranking methods while achieving latency nearly comparable to that of non-autoregressive models. GReF has also been deployed in Kuaishou, a real-world video app with over 300 million daily active users, where it significantly improves online recommendation quality.
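The abstract does not spell out the Rerank-DPO objective. As a rough illustration only, the sketch below applies the standard DPO pairwise loss to two whole reranked lists, assuming the preferred list is the one with better sequence-level feedback. The function names, the per-position probabilities, and the `beta` value are illustrative assumptions, not taken from the paper.

```python
import math

def sequence_log_prob(step_probs):
    """Log-probability of a full ranked list: sum of per-position log-probs."""
    return sum(math.log(p) for p in step_probs)

def dpo_pair_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) pair of lists.

    Assumption: this is the generic DPO form; the paper's Rerank-DPO
    may build or weight its pairs differently.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)) equals softplus(-margin), computed stably here
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical per-position probabilities each model assigns to each list.
policy_win = sequence_log_prob([0.6, 0.5, 0.7])
policy_lose = sequence_log_prob([0.3, 0.4, 0.5])
ref_win = sequence_log_prob([0.4, 0.4, 0.5])
ref_lose = sequence_log_prob([0.4, 0.4, 0.5])

loss = dpo_pair_loss(policy_win, policy_lose, ref_win, ref_lose)
# When the policy equals the reference, the margin is zero.
neutral = dpo_pair_loss(ref_win, ref_lose, ref_win, ref_lose)
print(f"loss={loss:.4f}, neutral={neutral:.4f}")
```

The loss falls below the neutral value once the policy favors the preferred list more strongly than the reference model does. In this generic form, that is how sequence-level preference can shape the generator without a separate evaluator at inference time.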
Problem

Research questions and friction points this paper is trying to address.

Reducing the cost of autoregressive inference in recommendation reranking systems
Integrating sequence-level evaluation into end-to-end training for reranking optimization
Removing the separation between generator and evaluator in two-stage reranking paradigms
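To make these friction points concrete, here is a minimal toy sketch of the two-stage generator-evaluator paradigm. It assumes greedy autoregressive generators and a hand-written evaluator; the item schema, the scoring functions, and all names are hypothetical stand-ins rather than the paper's models.

```python
from typing import Callable, List, Sequence, Tuple

Item = Tuple[str, float, str]  # (item id, relevance, category), toy schema

def autoregressive_generate(candidates: Sequence[Item],
                            step_score: Callable[[List[Item], Item], float]) -> List[Item]:
    """Greedy decoding: one sequential step per output position."""
    prefix: List[Item] = []
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda item: step_score(prefix, item))
        prefix.append(best)
        remaining.remove(best)
    return prefix

def two_stage_rerank(candidates, step_scorers, evaluator):
    """Stage 1: each generator proposes a list. Stage 2: the evaluator picks one."""
    sequences = [autoregressive_generate(candidates, s) for s in step_scorers]
    return max(sequences, key=evaluator)

def relevance_only(prefix, item):
    return item[1]

def diversity_aware(prefix, item):
    # Penalize repeating the category of the previously placed item.
    repeat = bool(prefix) and prefix[-1][2] == item[2]
    return item[1] - (0.5 if repeat else 0.0)

def list_value(seq):
    """Toy evaluator: position-discounted relevance plus a category-switch bonus."""
    discounted = sum(rel / (pos + 1) for pos, (_, rel, _) in enumerate(seq))
    switches = sum(a[2] != b[2] for a, b in zip(seq, seq[1:]))
    return discounted + 0.3 * switches

items = [("a", 0.9, "music"), ("b", 0.8, "music"), ("c", 0.7, "sports")]
best = two_stage_rerank(items, [relevance_only, diversity_aware], list_value)
best_ids = [item_id for item_id, _, _ in best]
print(best_ids)
```

Two things stand out even in this toy version. The generators never see the evaluator's score while producing a list, which is the separation that blocks end-to-end training. Each list also costs one sequential step per position, which is the inference bottleneck.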
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional encoder with dynamic autoregressive decoder
Post-training via Rerank-DPO for end-to-end optimization
Ordered multi-token prediction for efficient inference
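The latency effect of ordered multi-token prediction can be sketched with a toy decoder. In the paper OMTP is a training objective; the code below only illustrates the inference-time idea, assuming a model with `k` output heads where head `offset` scores items for the position `offset` steps ahead. `head_score` and the relevance table are hypothetical stand-ins for a trained model.

```python
def multi_token_decode(candidates, head_score, k):
    """Fill up to k ordered positions per forward pass instead of one."""
    prefix, remaining, passes = [], list(candidates), 0
    while remaining:
        passes += 1  # one forward pass scores all k heads at once
        context = tuple(prefix)  # every head conditions on the same prefix
        for offset in range(min(k, len(remaining))):
            # Head `offset` fills the position `offset` steps ahead, in order,
            # skipping items already placed by earlier heads.
            best = max(remaining, key=lambda item: head_score(context, item, offset))
            prefix.append(best)
            remaining.remove(best)
    return prefix, passes

# Hypothetical relevance table standing in for a trained model.
relevance = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.2, "e": 0.6}

def toy_head_score(context, item, offset):
    # This toy head ignores context and offset; a real head would not.
    return relevance[item]

order_k1, passes_k1 = multi_token_decode(relevance, toy_head_score, k=1)
order_k2, passes_k2 = multi_token_decode(relevance, toy_head_score, k=2)
print(order_k1, passes_k1)
print(order_k2, passes_k2)
```

With five candidates, `k=2` needs three forward passes where `k=1` needs five. The two orderings match here only because the toy heads ignore context. A real model would have to be trained so that its later heads stay accurate, which is the role the abstract assigns to OMTP.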
Authors

Zhijie Lin, ByteDance Inc. (Machine learning)
Zhuofeng Li, Shanghai University, Shanghai, China
Chenglei Dai, Kuaishou Technology, Beijing, China
Wentian Bao, Alibaba Group (Recommender System, Information Retrieval)
Shuai Lin, Kuaishou (Machine Learning, Graph Data Mining, NLP)
Enyun Yu, Independent, Beijing, China
Haoxiang Zhang, Queen’s University (Software Engineering, Empirical Software Engineering, Mining Software Repositories)
Liang Zhao, Emory University, Atlanta, GA, USA