Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of retrieving implicitly relevant documents for complex tasks—where task-document semantic relationships are indirect and require fine-grained reasoning—this paper proposes Retro*. Its core contributions are threefold: (1) a rubric-based, interpretable relevance assessment mechanism that grounds relevance judgments in explicit evaluation criteria; (2) a multi-reasoning-path fusion strategy leveraging test-time trajectory ensembling to enhance retrieval robustness; and (3) a dual-composite reward reinforcement learning algorithm that jointly optimizes both reasoning processes and final retrieval outcomes. Evaluated on the BRIGHT benchmark, Retro* substantially outperforms state-of-the-art information retrieval (IR) and retrieval-augmented generation (RAG) methods, achieving new SOTA performance. It simultaneously delivers high accuracy, strong interpretability, computational efficiency, and favorable scalability.

📝 Abstract
With the growing popularity of LLM agents and RAG, it has become increasingly important to retrieve documents that are essential for solving a task, even when their connection to the task is indirect or implicit. Addressing this problem requires fine-grained reasoning to accurately assess the relevance between the task and each candidate document. This capability, however, poses a significant challenge for existing IR techniques. Despite recent progress in reasoning-enhanced IR, existing approaches still face significant challenges in applicability, scalability, and efficiency. In this work, we propose Retro*, a novel approach for reasoning-intensive document retrieval. Our method introduces a rubric-based relevance scoring mechanism, enabling the model to reason about the relationship between a task and a document based on explicitly defined criteria, thereby producing a fine-grained, interpretable relevance score. Retro* also supports test-time scaling by combining multiple reasoning trajectories via score integration, which produces more reliable relevance estimates. To optimize Retro*'s reasoning capabilities, we introduce a novel reinforcement learning algorithm tailored to its relevance scoring mechanism, which employs two composite rewards to fully exploit the trajectories of each training sample. Our experiments show that Retro* outperforms existing document retrieval methods by notable margins, achieving state-of-the-art performance on the BRIGHT benchmark.
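The rubric-based scoring described above can be pictured as combining per-criterion judgments into a single interpretable relevance score. The sketch below is illustrative only: the criteria names and weights are hypothetical, and in Retro* the rubric and per-criterion reasoning are produced by the LLM itself rather than hand-coded.

```python
# Illustrative sketch only; criterion names and weights are invented for
# demonstration and are not the paper's actual rubric.

def rubric_score(criterion_scores: dict[str, float],
                 weights: dict[str, float]) -> float:
    """Combine per-criterion judgments (each in [0, 1]) into one
    fine-grained relevance score via a weighted average."""
    total_weight = sum(weights.values())
    return sum(criterion_scores[c] * weights[c] for c in weights) / total_weight

# Hypothetical rubric for a (task, document) pair:
scores = {"addresses_core_need": 0.9, "evidence_quality": 0.6, "specificity": 0.7}
weights = {"addresses_core_need": 0.5, "evidence_quality": 0.3, "specificity": 0.2}
relevance = rubric_score(scores, weights)  # weighted average in [0, 1]
```

Because each criterion's score is explicit, the final number can be traced back to the individual judgments, which is the interpretability property the paper emphasizes.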
Problem

Research questions and friction points this paper is trying to address.

Optimizing reasoning-intensive document retrieval for LLM agents
Addressing scalability and efficiency challenges in relevance assessment
Improving fine-grained interpretable scoring for indirect document relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rubric-based scoring for fine-grained relevance reasoning
Test-time scaling via multi-trajectory score integration
Reinforcement learning with composite rewards for optimization
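The test-time scaling idea above can be sketched as sampling several independent reasoning trajectories for the same (task, document) pair and fusing their relevance scores. A minimal sketch, assuming simple mean integration (the paper's exact integration rule may differ, and the function name is hypothetical):

```python
from statistics import mean

def integrate_scores(trajectory_scores: list[float]) -> float:
    """Fuse relevance scores from independently sampled reasoning
    trajectories into a single, more reliable estimate."""
    return mean(trajectory_scores)

# Three sampled trajectories disagree; averaging steadies the estimate.
fused = integrate_scores([0.8, 0.6, 0.7])
```

Averaging reduces the variance introduced by any single noisy reasoning path, which is why more sampled trajectories yield more reliable relevance estimates at the cost of extra inference compute.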
Junwei Lan
University of Science and Technology of China
Jianlyu Chen
University of Science and Technology of China
Natural Language Processing · Information Retrieval
Zheng Liu
Hong Kong Polytechnic University
Chaofan Li
Beijing University of Posts and Telecommunications
NLP
Siqi Bao
Baidu
Natural Language Processing · Medical Image Analysis
Defu Lian
University of Science and Technology of China