🤖 AI Summary
This work investigates whether explicit stepwise reasoning enhances the performance of LLM-based passage re-ranking. Method: Under identical training conditions, we compare ReasonRR—a pointwise re-ranker incorporating explicit chain-of-thought reasoning—with StandardRR, a non-reasoning baseline. We further conduct an ablation by disabling reasoning in ReasonRR (yielding ReasonRR-NoReason). Contribution/Results: Contrary to expectations, StandardRR significantly outperforms ReasonRR; remarkably, ReasonRR-NoReason even surpasses ReasonRR. Analysis reveals that explicit reasoning induces score polarization in relevance estimation, impairing the model's ability to capture partial relevance. To our knowledge, this is the first systematic study exposing the detrimental effect of explicit reasoning in re-ranking. We propose a "reasoning-free" paradigm and empirically validate its superiority, challenging the prevailing assumption that chain-of-thought reasoning is inherently beneficial for LLM-based re-ranking. Our findings open new avenues for designing lightweight, robust re-rankers that do not rely on explicit reasoning.
📝 Abstract
With the growing success of reasoning models across complex natural language tasks, researchers in the Information Retrieval (IR) community have begun exploring how similar reasoning capabilities can be integrated into passage rerankers built on Large Language Models (LLMs). These methods typically employ an LLM to produce an explicit, step-by-step reasoning process before arriving at a final relevance prediction. But does reasoning actually improve reranking accuracy? In this paper, we dive deeper into this question, studying the impact of the reasoning process by comparing reasoning-based pointwise rerankers (ReasonRR) to standard, non-reasoning pointwise rerankers (StandardRR) under identical training conditions, and observe that StandardRR generally outperforms ReasonRR. Building on this observation, we then study the importance of reasoning to ReasonRR by disabling its reasoning process (ReasonRR-NoReason), and find that ReasonRR-NoReason is surprisingly more effective than ReasonRR. Examining the cause of this result, we find that reasoning-based rerankers are limited by the LLM's reasoning process, which pushes them toward polarized relevance scores and thus fails to account for the partial relevance of passages, a key factor in the accuracy of pointwise rerankers.
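The partial-relevance point above can be made concrete with a minimal sketch. A common way for a pointwise reranker to produce a graded score is to take the softmax probability of a "true" token over a "true"/"false" pair; the logits and passage IDs below are purely hypothetical stand-ins for an LLM's output, not the paper's actual models or data.

```python
import math

def pointwise_relevance(logit_true: float, logit_false: float) -> float:
    """Graded relevance: softmax probability of the 'true' token.

    Because the score is a continuous probability, a partially relevant
    passage can land between 0 and 1 instead of being forced to an extreme,
    which is the behavior the abstract argues explicit reasoning disrupts.
    """
    return math.exp(logit_true) / (math.exp(logit_true) + math.exp(logit_false))

# Hypothetical logits for three passages: clearly relevant (p1),
# partially relevant (p2), and irrelevant (p3).
logits = {"p1": (3.0, -1.0), "p2": (0.5, 0.0), "p3": (-2.0, 2.0)}
scores = {pid: pointwise_relevance(t, f) for pid, (t, f) in logits.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(ranking)                  # ids ordered by graded relevance
print(round(scores["p2"], 3))   # partial relevance stays mid-range
```

A reasoning-based reranker that emits a verdict only after a chain of thought tends, per the paper's analysis, to collapse such mid-range cases toward 0 or 1, which is exactly the polarization that hurts ranking quality.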