GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing re-ranking methods face a fundamental trade-off: pointwise approaches suffer from “ranking myopia” due to independent document scoring, neglecting inter-document relevance relationships; listwise methods, while globally aware, are constrained by “list rigidity,” limiting scalability to large candidate sets. To address this, we propose the groupwise re-ranking paradigm, which performs fine-grained contrastive learning over document groups—preserving flexibility while explicitly modeling relative relevance. We introduce a novel heterogeneous reward function that jointly optimizes ranking metrics and distribution alignment objectives. Built upon the GRPO framework, our method establishes an end-to-end retrieval–re-ranking joint training pipeline and synthesizes high-quality training data. Evaluated on two reasoning-intensive benchmarks—BRIGHT and R2MED—our approach achieves significant improvements over state-of-the-art re-rankers.

📝 Abstract
Large Language Models (LLMs) have shown strong potential as rerankers that improve the overall performance of retrieval-augmented generation (RAG) systems. However, existing reranking paradigms face a core theoretical and practical dilemma. Pointwise methods, while simple and highly flexible, evaluate documents independently, which makes them prone to the Ranking Myopia Trap: they overlook the relative importance between documents. In contrast, Listwise methods can perceive the global ranking context but suffer from inherent List Rigidity, leading to severe scalability and flexibility issues when handling large candidate sets. To address these challenges, we propose Groupwise, a novel reranking paradigm. In this approach, the query and a group of candidate documents are jointly fed into the model, which performs within-group comparisons to assign an individual relevance score to each document. This design retains the flexibility of Pointwise methods while enabling the comparative capability of Listwise methods. We further adopt GRPO for model training, equipped with a heterogeneous reward function that integrates ranking metrics with a distributional reward aimed at aligning score distributions across groups. To overcome the bottleneck caused by the scarcity of high-quality labeled data, we also propose a pipeline for synthesizing high-quality retrieval and ranking data. The resulting data can be used to train not only the reranker but also the retriever. Extensive experiments on two reasoning-intensive retrieval benchmarks, BRIGHT and R2MED, validate the effectiveness of our approach.
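
The groupwise inference flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `score_group` callable, the `group_size` default, and the word-overlap `toy_overlap_scorer` are all hypothetical stand-ins. A real groupwise reranker would be an LLM that reads the query and the whole group together, so each score could depend on the other documents in the group.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer signature: takes a query and a group of documents and
# returns one relevance score per document, in the same order.
GroupScorer = Callable[[str, Sequence[str]], List[float]]


def groupwise_rerank(query: str, docs: Sequence[str], score_group: GroupScorer,
                     group_size: int = 4) -> List[str]:
    """Score candidates group by group, then sort all of them by score."""
    scores: Dict[int, float] = {}
    for start in range(0, len(docs), group_size):
        group = docs[start:start + group_size]
        group_scores = score_group(query, group)
        if len(group_scores) != len(group):
            raise ValueError("scorer must return one score per document")
        for offset, score in enumerate(group_scores):
            scores[start + offset] = score
    # Because every document has its own score, groups can be merged by a
    # plain sort; ties keep their original retrieval order.
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order]


def toy_overlap_scorer(query: str, group: Sequence[str]) -> List[float]:
    """Stand-in for the LLM: counts query words that appear in each document."""
    query_words = set(query.lower().split())
    return [float(len(query_words & set(doc.lower().split()))) for doc in group]


candidates = [
    "cats sleep a lot",
    "groupwise reranking with llm",
    "llm reranking",
    "weather today",
    "reranking",
]
reranked = groupwise_rerank("llm reranking", candidates, toy_overlap_scorer,
                            group_size=2)
```

Merging groups with a single sort only works if scores from different groups are comparable, which is presumably why the abstract pairs the paradigm with a reward that aligns score distributions across groups.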
Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of pointwise and listwise document reranking methods
Proposes groupwise paradigm for flexible yet comparative relevance scoring
Solves data scarcity through synthetic retrieval and ranking data generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Groupwise reranking paradigm using within-group comparisons
GRPO training with heterogeneous ranking and distribution rewards
Pipeline for synthesizing high-quality retrieval and ranking data
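
To make the second point more concrete, here is a minimal sketch of how a heterogeneous reward and GRPO-style advantages might fit together. The summary does not give the paper's exact reward definition, so everything specific here is an assumption: NDCG is used as the ranking metric, the distributional term is a simple mean absolute gap between normalized scores and labels, and `alpha`, `max_score`, and all function names are hypothetical. Only the last function, which normalizes rewards within a group of sampled outputs, reflects the standard GRPO advantage computation.

```python
import math
from typing import List, Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain of gains listed in ranked order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_reward(scores: Sequence[float], labels: Sequence[float], k: int = 10) -> float:
    """NDCG@k of the ranking induced by the predicted scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order][:k]
    ideal_dcg = dcg(sorted(labels, reverse=True)[:k])
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def distribution_reward(scores: Sequence[float], labels: Sequence[float],
                        max_score: float = 10.0) -> float:
    """Assumed form: 1 minus the mean absolute gap between scores scaled to
    [0, 1] and labels that are already in [0, 1]."""
    gaps = [abs(min(max(s / max_score, 0.0), 1.0) - y) for s, y in zip(scores, labels)]
    return 1.0 - sum(gaps) / len(gaps)


def heterogeneous_reward(scores: Sequence[float], labels: Sequence[float],
                         alpha: float = 0.5) -> float:
    """Weighted mix of the ranking term and the distributional term (assumed)."""
    return (alpha * ndcg_reward(scores, labels)
            + (1.0 - alpha) * distribution_reward(scores, labels))


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: standardize rewards across the outputs sampled
    for the same prompt."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


labels = [1.0, 0.0, 0.0]
good = heterogeneous_reward([9.0, 1.0, 5.0], labels)  # relevant doc ranked first
bad = heterogeneous_reward([1.0, 9.0, 5.0], labels)   # relevant doc ranked last
advantages = group_relative_advantages([good, bad])
```

In this toy setup the output that ranks the relevant document first receives the higher reward and therefore a positive advantage, while the other sampled output receives a negative one.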
Authors

Duolin Sun
Ant Group

Meixiu Long
Sun Yat-sen University
Graph representation learning, Social Network Mining, Information fusion

Dan Yang
Ant Group

Yihan Jiao
Ant Group

Yue Shen
Ant Group

Zhehao Tan
Ant Group

Jie Feng
Ant Group

Junjie Wang
Ant Group

Peng Wei
Ant Group

Jian Wang
Ant Group

Jinjie Gu
Ant Group
Machine learning, recommendation