RankGR: Rank-Enhanced Generative Retrieval with Listwise Direct Preference Optimization in Recommendation

📅 2026-02-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing generative retrieval methods in recommendation systems rely solely on next-token prediction, which struggles to capture the hierarchical structure of user preferences and the deep interactions between items and behavioral sequences. This work proposes RankGR, the first approach to integrate listwise direct preference optimization (Listwise DPO) into generative retrieval. RankGR employs a two-stage collaborative mechanism: an Initial Assessment Phase (IAP) for coarse candidate filtering, followed by a Refined Scoring Phase (RSP) that leverages a lightweight scoring module for fine-grained ranking. This design balances modeling depth with inference efficiency, enabling high-concurrency real-time deployment. Experiments demonstrate that RankGR significantly improves offline metrics across multiple academic and industrial datasets and delivers substantial online gains in Taobao's "Guess You Like" scenario, stably supporting nearly 10,000 queries per second.
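The two-stage mechanism described above can be sketched as a simple cascade. This is a hypothetical illustration, not the paper's implementation: the function and parameter names (`gen_score`, `refine_score`, `lam`, `k`) are assumptions, standing in for the generative model's sequence likelihood (IAP) and the lightweight sequence-aware scoring module (RSP).

```python
def two_stage_retrieve(candidates, gen_score, refine_score, lam=50, k=10):
    """Hypothetical sketch of a two-stage cascade in the spirit of RankGR.

    IAP: rank all candidate identifiers by the generative model's score
    and keep the top-lam as a coarse filter.
    RSP: re-score only those survivors with a lightweight module that
    interacts with the user's behavior sequence, then return the top-k.
    """
    # Coarse filtering: cheap scores over the full candidate set.
    iap_survivors = sorted(candidates, key=gen_score, reverse=True)[:lam]
    # Fine-grained ranking: richer scores over only lam candidates,
    # which keeps per-request latency bounded at high concurrency.
    return sorted(iap_survivors, key=refine_score, reverse=True)[:k]
```

The efficiency argument is visible in the structure: the expensive refinement score is evaluated on only λ candidates rather than the whole corpus.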

๐Ÿ“ Abstract
Generative retrieval (GR) has emerged as a promising paradigm in recommendation systems by autoregressively decoding identifiers of target items. Despite its potential, current approaches typically rely on the next-token prediction schema, which treats each token of the next interacted items as the sole target. This narrow focus 1) limits their ability to capture the nuanced structure of user preferences, and 2) overlooks the deep interaction between decoded identifiers and user behavior sequences. In response to these challenges, we propose RankGR, a Rank-enhanced Generative Retrieval method that incorporates listwise direct preference optimization for recommendation. RankGR decomposes the retrieval process into two complementary stages: the Initial Assessment Phase (IAP) and the Refined Scoring Phase (RSP). In IAP, we incorporate a novel listwise direct preference optimization strategy into GR, thus facilitating a more comprehensive understanding of the hierarchical user preferences and more effective partial-order modeling. The RSP then refines the top-λ candidates generated by IAP with interactions towards input sequences using a lightweight scoring module, leading to more precise candidate evaluation. Both phases are jointly optimized under a unified GR model, ensuring consistency and efficiency. Additionally, we implement several practical improvements in training and deployment, ultimately achieving a real-time system capable of handling nearly ten thousand requests per second. Extensive offline performance on both research and industrial datasets, as well as the online gains on the "Guess You Like" section of Taobao, validate the effectiveness and scalability of RankGR.
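The abstract does not give the listwise DPO objective in closed form, but a common way to extend pairwise DPO to a ranked list is a Plackett-Luce likelihood over implicit rewards β·(log π − log π_ref). The sketch below is an assumption in that spirit, not RankGR's stated loss; all names (`policy_logps`, `ref_logps`, `beta`) are illustrative.

```python
import math

def listwise_dpo_loss(policy_logps, ref_logps, beta=0.1):
    """Plackett-Luce style listwise DPO loss over a ranked candidate list.

    policy_logps / ref_logps: sequence log-probabilities of each candidate
    identifier under the policy and a frozen reference model, ordered from
    most- to least-preferred. Returns the negative log-likelihood of the
    full ranking under implicit rewards beta * (log_pi - log_pi_ref).
    """
    rewards = [beta * (p, r)[0] - beta * r for p, r in zip(policy_logps, ref_logps)]
    loss = 0.0
    for i in range(len(rewards)):
        # Plackett-Luce: at each step, the i-th ranked item competes
        # against all not-yet-ranked items via a softmax.
        tail = rewards[i:]
        m = max(tail)  # stabilize the log-sum-exp
        lse = m + math.log(sum(math.exp(x - m) for x in tail))
        loss -= rewards[i] - lse
    return loss
```

When the list has length two this reduces to the standard pairwise DPO logistic loss, which is one reason this generalization is a natural candidate for partial-order modeling over hierarchical preferences.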
Problem

Research questions and friction points this paper is trying to address.

Generative Retrieval
User Preferences
Next-Token Prediction
Recommendation Systems
Listwise Preference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Retrieval
Listwise Direct Preference Optimization
Rank-enhanced Recommendation
Two-stage Retrieval
Real-time Recommendation System
Kairui Fu
Zhejiang University
Changfa Wu
Alibaba Group
Kun Yuan
Alibaba Group
Binbin Cao
Alibaba Group
Dunxian Huang
Alibaba Group
Yuliang Yan
Alibaba Group
Junjun Zheng
Alibaba Group
Jianning Zhang
Alibaba Group
Silu Zhou
Alibaba Group
Jian Wu
Unknown affiliation
Music Generation
Kun Kuang
Zhejiang University
Causal Inference · Data Mining · Machine Learning