DeGRe: Dense-supervised Generative Reranking for Recommendation

📅 2026-05-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses credit assignment in reranking for multi-stage recommender systems. There, heuristic label bias and sparse list-level rewards hinder efficient exploration of the exponentially large permutation space when searching for optimal sequences. To overcome this, the authors propose DeGRe, a decoupled offline-online generative reranking framework. In the offline phase, a cumulative regression-based lookahead evaluator uses beam search to identify high-value sequences, and its step-wise value estimates are converted into dense supervision signals. In the online phase, a lightweight generator distilled from these signals approximates the globally optimal sequence with a single greedy decoding pass, which keeps inference efficient. The approach removes the reliance on sparse rewards and heuristic labels found in prior generative reranking methods. It outperforms state-of-the-art models on both public benchmarks and industrial datasets. It has been deployed in Taobao Flash Shopping, where it substantially improved online metrics.
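The offline search step can be illustrated with a toy beam search over permutations. This is a minimal sketch under stated assumptions, not DeGRe's evaluator. The hypothetical `prefix_value` function stands in for a learned cumulative-regression evaluator. It scores a prefix as the cumulative sum of position-discounted item utilities, minus a penalty whenever adjacent items share a category, which is a crude proxy for intra-list context. The function name `beam_search`, the utilities, the categories, the penalty and the beam width are all made-up illustration values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def prefix_value(prefix: Sequence[int], utility: Dict[int, float],
                 category: Dict[int, str], penalty: float = 0.5) -> float:
    """Hypothetical evaluator: cumulative value of a (partial) ranked list."""
    total = 0.0
    for pos, item in enumerate(prefix):
        # Position-discounted utility (DCG-style 1 / log2(pos + 2) discount).
        gain = utility[item] / math.log2(pos + 2)
        # Toy context effect: penalize two adjacent items of the same category.
        if pos > 0 and category[prefix[pos - 1]] == category[item]:
            gain -= penalty
        total += gain
    return total


def beam_search(items: List[int], utility: Dict[int, float],
                category: Dict[int, str], list_len: int,
                beam_width: int = 3) -> Tuple[List[int], List[float]]:
    """Search the permutation space slot by slot, keeping the best prefixes.

    Returns the best complete sequence and its step-wise prefix values,
    i.e. per-step signals that could later serve as dense supervision.
    """
    beams: List[List[int]] = [[]]
    for _ in range(list_len):
        # Expand every kept prefix with every item not yet placed.
        candidates = [prefix + [item]
                      for prefix in beams
                      for item in items if item not in prefix]
        # Keep only the beam_width highest-value prefixes.
        candidates.sort(key=lambda p: prefix_value(p, utility, category),
                        reverse=True)
        beams = candidates[:beam_width]
    best = beams[0]
    step_values = [prefix_value(best[:t + 1], utility, category)
                   for t in range(len(best))]
    return best, step_values


if __name__ == "__main__":
    utility = {0: 1.0, 1: 0.9, 2: 0.8, 3: 0.3}
    category = {0: "a", 1: "a", 2: "b", 3: "b"}
    seq, values = beam_search([0, 1, 2, 3], utility, category, list_len=3)
    print(seq, [round(v, 3) for v in values])
```

Consider utilities {0: 1.0, 1: 0.9, 2: 0.8, 3: 0.3} with categories a, a, b, b. Here the search returns [0, 2, 1], which scores higher under the toy evaluator than ranking by utility alone ([0, 1, 2]). The category penalty makes the value of the second slot depend on what precedes it.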
πŸ“ Abstract
In multi-stage recommender systems, reranking optimizes overall utility by capturing intra-list contextual dependencies, yet its central challenge lies in exploring optimal sequences within an exponentially large permutation space. Recent studies have shifted towards end-to-end generative frameworks, which typically leverage list-wise rewards or preference alignment to guide generator training. However, these methods still face two critical issues. First is the heuristic label bias. Existing methods often construct training targets based on simple rules, such as promoting clicked items to the top, while ignoring causal dependencies within the list context. Second is the credit assignment problem. Sparse list-level posterior rewards fail to directly guide intermediate steps in sequence generation, leading to ambiguous optimization directions. To address these issues, we propose DeGRe (Dense-supervised Generative Reranking), a generative reranking framework that bridges the gap between offline exploration and online efficiency through dense supervision. The core of DeGRe lies in its offline-online decoupled design. During the offline phase, we introduce a Lookahead Evaluator based on cumulative regression, which leverages beam search to actively mine high-value lookahead sequences in the unexposed space. During training, we transform the step-wise value estimations from the evaluator into dense supervision signals and distill them into a lightweight Online Generator. This mechanism enables the generator to internalize lookahead planning capabilities, requiring only a single efficient greedy decoding pass during online inference to approximate the global optimum. Experiments demonstrate that DeGRe outperforms baseline models on public benchmarks and industrial datasets. We have successfully deployed DeGRe on Taobao Flash Shopping, significantly improving online recommendations.
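The dense-supervision idea can be illustrated end to end on a toy problem. Every name in this sketch is hypothetical, and the details are assumptions rather than DeGRe's published design:

- `step_value` plays the role of the evaluator's step-wise value estimate.
- `dense_targets` turns the values of all remaining candidates at each step of a teacher sequence into a temperature-scaled softmax target. The teacher sequence is assumed to come from offline search.
- `TinyGenerator` is a two-feature log-linear policy distilled with per-step cross-entropy. After distillation, it replaces search with one greedy decoding pass.

```python
import math
from typing import Dict, List, Sequence


def softmax(xs: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def step_value(prefix: Sequence[int], item: int, utility: Dict[int, float],
               category: Dict[int, str], penalty: float = 0.5) -> float:
    """Hypothetical evaluator output: marginal value of appending `item`."""
    gain = utility[item] / math.log2(len(prefix) + 2)  # position discount
    if prefix and category[prefix[-1]] == category[item]:
        gain -= penalty  # toy context effect: same-category neighbors
    return gain


def dense_targets(teacher: List[int], items: List[int],
                  utility: Dict[int, float], category: Dict[int, str],
                  temperature: float = 0.1):
    """One soft target per step: softmax over evaluator values of all
    candidates still available at that step of the teacher sequence."""
    targets = []
    for t in range(len(teacher)):
        prefix = teacher[:t]
        cands = [i for i in items if i not in prefix]
        values = [step_value(prefix, c, utility, category) for c in cands]
        targets.append((prefix, cands, softmax(values, temperature)))
    return targets


class TinyGenerator:
    """Hypothetical lightweight generator (log-linear policy):
    logit(c) = bias[c] + w_same * [c shares the previous item's category]."""

    def __init__(self, items: List[int], category: Dict[int, str]):
        self.bias = {i: 0.0 for i in items}
        self.w_same = 0.0
        self.category = category

    def _same(self, prefix: Sequence[int], c: int) -> float:
        if prefix and self.category[prefix[-1]] == self.category[c]:
            return 1.0
        return 0.0

    def logits(self, prefix: Sequence[int], cands: List[int]) -> List[float]:
        return [self.bias[c] + self.w_same * self._same(prefix, c)
                for c in cands]

    def loss(self, targets) -> float:
        """Summed per-step cross-entropy between targets and the policy."""
        total = 0.0
        for prefix, cands, q in targets:
            p = softmax(self.logits(prefix, cands))
            total -= sum(qi * math.log(pi) for qi, pi in zip(q, p))
        return total

    def distill(self, targets, epochs: int = 2000, lr: float = 0.5) -> None:
        """Plain SGD on the per-step cross-entropy (dense supervision)."""
        for _ in range(epochs):
            for prefix, cands, q in targets:
                p = softmax(self.logits(prefix, cands))
                for c, pc, qc in zip(cands, p, q):
                    g = pc - qc  # d(cross-entropy) / d(logit of c)
                    self.bias[c] -= lr * g
                    self.w_same -= lr * g * self._same(prefix, c)

    def greedy_decode(self, items: List[int], list_len: int) -> List[int]:
        """Online inference: one greedy pass, no search."""
        seq: List[int] = []
        for _ in range(list_len):
            cands = [i for i in items if i not in seq]
            scores = self.logits(seq, cands)
            seq.append(cands[scores.index(max(scores))])
        return seq
```

Take utilities {0: 1.0, 1: 0.9, 2: 0.8, 3: 0.3}, categories a, a, b, b and teacher [0, 2, 1]. Distilling on `dense_targets([0, 2, 1], [0, 1, 2, 3], utility, category)` and then calling `greedy_decode([0, 1, 2, 3], list_len=3)` should recover [0, 2, 1]. Each step contributes a full distribution over candidates, so the generator receives a gradient at every position rather than a single list-level reward.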
Problem

Research questions and friction points this paper is trying to address.

reranking
label bias
credit assignment
generative recommendation
list-wise context
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative reranking
dense supervision
lookahead evaluator
credit assignment
multi-stage recommendation
🔎 Similar Papers
2024-02-10 · Knowledge Discovery and Data Mining · Citations: 3
Chaotian Song
College of Software, Zhejiang University
Jingyao Zhang
University of California, Riverside
Computer Architecture, Computer Security, Computer System
Chenghao Chen
Rajax Network Technology, Taobao Shangou of Alibaba
Zisen Sang
Rajax Network Technology, Taobao Shangou of Alibaba
Dehai Zhao
CSIRO
Software engineering
Guodong Cao
Rajax Network Technology, Taobao Shangou of Alibaba
Boxi Wu
College of Software, Zhejiang University
Deng Cai
Professor of Computer Science, Zhejiang University
Machine learning, Computer vision, Data mining, Information retrieval
Jia Jia
Rajax Network Technology, Taobao Shangou of Alibaba