🤖 AI Summary
Existing retrieval-augmented generation methods perform well on knowledge-intensive tasks but struggle with complex queries requiring abstract reasoning, analogy, or long-range logical inference. This paper introduces DIVER, a multi-stage framework tailored for reasoning-intensive information retrieval. The approach comprises four core components: (1) document processing to improve input quality; (2) LLM-driven iterative query expansion that explicitly models reasoning intent; (3) a reasoning-enhanced retriever fine-tuned on synthetically generated multi-domain data augmented with hard negative samples; and (4) a pointwise reranking mechanism that integrates LLM-based utility scores with retrieval scores. This end-to-end, reasoning-aware pipeline significantly improves retrieval relevance. On the BRIGHT benchmark, it achieves nDCG@10 scores of 41.6 and 28.9 on original queries, setting new state-of-the-art results and demonstrating robust effectiveness in realistic, complex querying scenarios.
📝 Abstract
Retrieval-augmented generation has achieved strong performance on knowledge-intensive tasks where query-document relevance can be identified through direct lexical or semantic matches. However, many real-world queries involve abstract reasoning, analogical thinking, or multi-step inference, which existing retrievers often struggle to capture. To address this challenge, we present **DIVER**, a retrieval pipeline tailored for reasoning-intensive information retrieval. DIVER consists of four components: document processing to improve input quality, LLM-driven query expansion via iterative document interaction, a reasoning-enhanced retriever fine-tuned on synthetic multi-domain data with hard negatives, and a pointwise reranker that combines LLM-assigned helpfulness scores with retrieval scores. On the BRIGHT benchmark, DIVER achieves state-of-the-art nDCG@10 scores of 41.6 and 28.9 on original queries, consistently outperforming competitive reasoning-aware models. These results demonstrate the effectiveness of reasoning-aware retrieval strategies in complex real-world tasks. Our code and retrieval model will be released soon.
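The final reranking stage described above combines an LLM-assigned helpfulness score with the retriever's own score. A minimal sketch of such a pointwise score fusion is shown below; the linear interpolation, the `alpha` weight, and the score scales are illustrative assumptions, not the paper's actual formulation.

```python
def rerank(candidates, llm_helpfulness, alpha=0.5):
    """Pointwise reranking sketch: blend the retrieval score with an
    LLM-judged helpfulness score for each candidate document.

    candidates: list of (doc_id, retrieval_score) pairs
    llm_helpfulness: dict mapping doc_id -> helpfulness in [0, 1],
        as produced by an LLM judge (hypothetical interface)
    alpha: interpolation weight between the two signals (assumed;
        the paper's exact combination rule may differ)
    """
    scored = [
        (doc_id, alpha * llm_helpfulness.get(doc_id, 0.0) + (1 - alpha) * ret_score)
        for doc_id, ret_score in candidates
    ]
    # Higher combined score ranks first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = [("d1", 0.7), ("d2", 0.9), ("d3", 0.4)]
    helpful = {"d1": 1.0, "d2": 0.2, "d3": 0.8}
    # d2 retrieves best, but the LLM judge demotes it below d1 and d3.
    print(rerank(docs, helpful))
```

The point of the sketch is that an LLM utility signal can reorder documents that lexical or embedding similarity alone ranks highly, which is exactly where reasoning-intensive queries diverge from surface matching.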