GoalRank: Group-Relative Optimization for a Large Ranking Model

2025-09-26
Citations: 0
Influential citations: 0
AI Summary
Existing ranking methods rely on a two-stage generator-evaluator paradigm, but enlarging the candidate set cannot overcome the combinatorial search bottleneck, so performance saturates. This paper proposes a purely generative, single-stage large ranking model that eliminates the evaluator and produces high-quality ranked lists end to end. Its key contributions are: (1) a theoretical proof that generator-only models incur smaller approximation error than their two-stage counterparts; (2) group-relative optimization, which uses a reward model to construct a reference policy within each group of candidate lists, improving list-level ranking quality; and (3) a scalable generative architecture coupled with reward modeling driven by user feedback. Extensive experiments on public benchmarks and large-scale online A/B tests show significant improvements over state-of-the-art methods, confirming the model's robustness and effectiveness in both offline evaluation and live production.
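The point about combinatorial search can be made concrete with a little arithmetic. An ordered list of k items drawn without replacement from n candidates can be formed in n!/(n-k)! ways, so an evaluator that scores a fixed number of generated lists only ever inspects a tiny fraction of that space. The sketch below is illustrative only: the sizes (n = 50 candidates, lists of length 6) and the helper names are hypothetical, not taken from the paper, and the code shows the size of the search space rather than the paper's approximation-error argument.

```python
import math


def num_ranked_lists(n_candidates: int, list_len: int) -> int:
    """Count distinct ordered lists of length list_len drawn without
    replacement from n_candidates items, i.e. n! / (n - k)!."""
    return math.perm(n_candidates, list_len)


def evaluator_coverage(num_generated: int, n_candidates: int, list_len: int) -> float:
    """Fraction of the list space an evaluator sees if it scores
    num_generated distinct candidate lists (hypothetical setup)."""
    return num_generated / num_ranked_lists(n_candidates, list_len)


if __name__ == "__main__":
    n, k = 50, 6  # illustrative sizes, not from the paper
    print(num_ranked_lists(n, k))  # 11441304000
    for num_lists in (10, 100, 1000):
        # Coverage grows only linearly in the number of generated lists.
        print(num_lists, evaluator_coverage(num_lists, n, k))
```

With these assumed sizes, even 1,000 candidate lists cover less than one ten-millionth of the space, which is one intuitive reason why adding generators or candidates yields diminishing returns.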

πŸ“ Abstract
Mainstream ranking approaches typically follow a two-stage Generator-Evaluator paradigm, in which a generator produces candidate lists and an evaluator selects the best one. Recent work has tried to improve performance by expanding the number of candidate lists, for example through multi-generator settings. However, ranking involves selecting a recommendation list from a combinatorially large space, so simply enlarging the candidate set remains ineffective and performance gains quickly saturate. At the same time, recent advances in large recommendation models have shown that end-to-end one-stage models can achieve promising performance and are expected to follow scaling laws. Motivated by this, we revisit ranking from a generator-only, one-stage perspective. We theoretically prove that, for any (finite Multi-)Generator-Evaluator model, there always exists a generator-only model that achieves strictly smaller approximation error to the optimal ranking policy, while also enjoying scaling laws as its size increases. Building on this result, we derive an evidence upper bound of the one-stage optimization objective, which shows that a reward model trained on real user feedback can be used to construct a reference policy in a group-relative manner. This reference policy serves as a practical surrogate for the optimal policy, enabling effective training of a large generator-only ranker. Based on these insights, we propose GoalRank, a generator-only ranking framework. Extensive offline experiments on public benchmarks and large-scale online A/B tests demonstrate that GoalRank consistently outperforms state-of-the-art methods.
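The abstract says only that a reward model trained on user feedback is used to build a reference policy "in a group-relative manner", and that the generator is trained toward this surrogate. The sketch below is one hypothetical reading, not GoalRank's actual procedure: for a group of candidate lists, reward-model scores are standardized within the group (a normalization borrowed from group-relative policy optimization methods) and passed through a temperature softmax to form a reference distribution; the generator is then trained by cross-entropy toward it, using its own list probabilities renormalized over the same group. The function names, the `temperature` parameter, and the standardization step are all assumptions.

```python
import math
from typing import List, Sequence


def group_relative_reference(rewards: Sequence[float], temperature: float = 1.0) -> List[float]:
    """ASSUMED construction: map reward-model scores for one group of
    candidate lists to a reference distribution over that group.

    Scores are standardized within the group, so only their relative
    values matter, then converted to probabilities by a softmax.
    """
    g = len(rewards)
    mean = sum(rewards) / g
    var = sum((r - mean) ** 2 for r in rewards) / g
    std = math.sqrt(var) or 1.0  # equal rewards -> uniform reference
    z = [(r - mean) / std / temperature for r in rewards]
    m = max(z)
    exps = [math.exp(v - m) for v in z]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def group_cross_entropy(ref: Sequence[float], log_probs: Sequence[float]) -> float:
    """Cross-entropy between the reference distribution and the generator's
    probabilities renormalized over the same group.

    log_probs[i] is the generator's log-probability of the i-th list, e.g.
    the sum of per-position log-probs of an autoregressive ranker (assumed).
    """
    m = max(log_probs)
    log_z = m + math.log(sum(math.exp(lp - m) for lp in log_probs))
    return -sum(p * (lp - log_z) for p, lp in zip(ref, log_probs))


if __name__ == "__main__":
    rewards = [0.2, 0.5, 0.9, 0.1]  # hypothetical reward-model scores
    ref = group_relative_reference(rewards, temperature=0.5)
    loss = group_cross_entropy(ref, [-3.0, -2.5, -2.0, -4.0])  # hypothetical log-probs
    print(ref, loss)
```

Because both the standardization and the renormalization are done per group, the loss depends only on how lists compare within a group, not on the absolute scale of the reward model or of the generator's log-probabilities; the loss is minimized when the generator's in-group distribution matches the reference.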
Problem

Research questions and friction points this paper is trying to address.

Optimizing ranking models beyond candidate list expansion
Developing generator-only ranking with reduced approximation error
Leveraging user feedback for effective one-stage ranking training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generator-only one-stage ranking model
Group-relative reference policy optimization
Evidence upper bound objective training
Authors

Kaike Zhang
Institute of Computing Technology, Chinese Academy of Sciences
Interests: Trustworthy Graph Data Mining & Representation Learning; Robust Recommender System

Xiaobei Wang
Kuaishou Technology

Shuchang Liu
Kuaishou Technology, Beijing, China

Hailan Yang
Kuaishou Technology, Beijing, China

Xiang Li
Kuaishou Technology, Beijing, China

Lantao Hu
Kuaishou Inc.
Interests: data mining; recommender system

Han Li
Kuaishou Technology, Beijing, China

Qi Cao
University of Chinese Academy of Sciences, Beijing, China

Fei Sun
University of Chinese Academy of Sciences, Beijing, China

Kun Gai
Senior Director & Researcher, Alibaba Group
Interests: Machine Learning; Computational Advertising