🤖 AI Summary
Existing LLM-based re-ranking models underperform in complex ranking scenarios due to insufficient reasoning-intensive training data.
Method: We propose ReasonRank—a framework integrating automated reasoning data synthesis, self-consistency filtering, and reinforcement learning with a multi-view ranking reward. Specifically, DeepSeek-R1 is leveraged to generate high-quality reasoning-aware ranking labels; the model is then trained in two stages—supervised fine-tuning followed by reinforcement learning with a multi-view ranking reward that jointly optimizes ranking quality and logical coherence.
Contribution/Results: ReasonRank achieves a new state-of-the-art score of 40.6 on the BRIGHT leaderboard, significantly outperforming prior methods while maintaining lower inference latency than the pointwise reranker Rank1. Its core contribution is a reasoning-enhanced training paradigm designed for passage re-ranking that systematically improves both logical reasoning capability and ranking accuracy.
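The self-consistency filtering step can be illustrated with a minimal sketch. The paper's exact agreement measure and threshold are not specified in this summary, so the Kendall-tau-based criterion, the `threshold=0.7` value, and the function names below are illustrative assumptions: the idea is to sample several independent rankings for the same query from the labeling model and keep the example only if they largely agree.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall tau correlation between two rankings of the same items."""
    pos_b = {item: i for i, item in enumerate(b)}
    concordant = discordant = 0
    for x, y in combinations(a, 2):
        # x precedes y in ranking a; check whether b agrees
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / pairs

def self_consistent(rankings, threshold=0.7):
    """Keep a training example only if independently sampled rankings
    agree with each other on average (hypothetical criterion)."""
    taus = [kendall_tau(r1, r2) for r1, r2 in combinations(rankings, 2)]
    return sum(taus) / len(taus) >= threshold
```

Examples whose sampled rankings disagree (low average tau) would be discarded, filtering out queries where the labeler's reasoning is unstable.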
📝 Abstract
Large Language Model (LLM) based listwise ranking has shown superior performance in many passage ranking tasks. With the development of Large Reasoning Models, many studies have demonstrated that step-by-step reasoning at test time helps improve listwise ranking performance. However, due to the scarcity of reasoning-intensive training data, existing rerankers perform poorly in many complex ranking scenarios, and the ranking ability of reasoning-intensive rerankers remains largely underdeveloped. In this paper, we first propose an automated reasoning-intensive training data synthesis framework, which sources training queries and passages from diverse domains and applies DeepSeek-R1 to generate high-quality training labels. A self-consistency data filtering mechanism is designed to ensure data quality. To empower the listwise reranker with strong reasoning ability, we further propose a two-stage post-training approach, which includes a cold-start supervised fine-tuning (SFT) stage for reasoning pattern learning and a reinforcement learning (RL) stage for further ranking ability enhancement. During the RL stage, based on the nature of listwise ranking, we design a multi-view ranking reward, which is more effective than a ranking metric-based reward. Extensive experiments demonstrate that our trained reasoning-intensive reranker **ReasonRank** outperforms existing baselines significantly and also achieves much lower latency than the pointwise reranker Rank1. **Through further experiments, our ReasonRank has achieved state-of-the-art (SOTA) performance of 40.6 on the BRIGHT leaderboard (https://brightbenchmark.github.io/).** Our codes are available at https://github.com/8421BCD/ReasonRank.
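The abstract contrasts the multi-view ranking reward with a single ranking metric-based reward but does not spell out its formula. A minimal sketch of the general idea, assuming (hypothetically) that one view is a listwise metric such as NDCG and another is pairwise ordering accuracy, blended by an illustrative weight `alpha`:

```python
import math

def ndcg_at_k(ranking, relevance, k=10):
    """NDCG@k for a predicted ranking given graded relevance labels."""
    dcg = sum(relevance[doc] / math.log2(i + 2)
              for i, doc in enumerate(ranking[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def pairwise_accuracy(ranking, relevance):
    """Fraction of document pairs ordered consistently with the labels."""
    correct = total = 0
    for i in range(len(ranking)):
        for j in range(i + 1, len(ranking)):
            if relevance[ranking[i]] != relevance[ranking[j]]:
                total += 1
                if relevance[ranking[i]] > relevance[ranking[j]]:
                    correct += 1
    return correct / total if total else 1.0

def multi_view_reward(ranking, relevance, alpha=0.5):
    """Blend a listwise metric view with a pairwise ordering view
    (illustrative combination, not the paper's exact reward)."""
    return (alpha * ndcg_at_k(ranking, relevance)
            + (1 - alpha) * pairwise_accuracy(ranking, relevance))
```

The intuition behind combining views is that a pure metric reward can be sparse and insensitive to partial improvements, while an additional pairwise view gives denser credit during RL.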