DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing retrieval-augmented generation methods perform well on knowledge-intensive tasks but struggle with complex queries that require abstract reasoning, analogy, or long-range logical inference. This paper introduces DIVER, a multi-stage framework tailored to reasoning-intensive information retrieval. The pipeline comprises four components: (1) document processing to improve input quality; (2) LLM-driven query expansion that iteratively interacts with retrieved documents; (3) a reasoning-enhanced retriever fine-tuned on synthetically generated multi-domain data augmented with hard negative samples; and (4) a pointwise reranker that combines LLM-assigned helpfulness scores with retrieval scores. On the BRIGHT benchmark, this reasoning-aware pipeline achieves state-of-the-art nDCG@10 scores of 41.6 and 28.9 on original queries and consistently outperforms competitive reasoning-aware models.
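The scores above are nDCG@10 values. As a reference point, here is a minimal sketch of the standard per-query nDCG@10 computation, assuming linear gains and the usual log2 rank discount. BRIGHT's official evaluation tooling may differ in details, and the function names and toy document ids are illustrative only.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k positions (rank 1 is discounted by log2(2) = 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: List[str], qrels: Dict[str, float], k: int = 10) -> float:
    """nDCG@k for a single query.

    ranked_doc_ids: retriever output, best first.
    qrels: relevance label per relevant document id (unlisted ids count as 0).
    """
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_doc_ids]
    # Ideal DCG: the same labels placed in the best possible order.
    ideal = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    qrels = {"d1": 1.0, "d4": 1.0}       # two gold documents
    ranking = ["d1", "d2", "d4", "d3"]   # gold documents land at ranks 1 and 3
    print(round(ndcg_at_k(ranking, qrels), 4))  # prints 0.9197
```

Averaging this per-query value over all queries in a task gives the task-level nDCG@10.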

📝 Abstract
Retrieval-augmented generation has achieved strong performance on knowledge-intensive tasks where query-document relevance can be identified through direct lexical or semantic matches. However, many real-world queries involve abstract reasoning, analogical thinking, or multi-step inference, which existing retrievers often struggle to capture. To address this challenge, we present DIVER, a retrieval pipeline tailored for reasoning-intensive information retrieval. DIVER consists of four components: document processing to improve input quality, LLM-driven query expansion via iterative document interaction, a reasoning-enhanced retriever fine-tuned on synthetic multi-domain data with hard negatives, and a pointwise reranker that combines LLM-assigned helpfulness scores with retrieval scores. On the BRIGHT benchmark, DIVER achieves state-of-the-art nDCG@10 scores of 41.6 and 28.9 on original queries, consistently outperforming competitive reasoning-aware models. These results demonstrate the effectiveness of reasoning-aware retrieval strategies in complex real-world tasks. Our code and retrieval model will be released soon.
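The query-expansion stage alternates between retrieving documents and asking an LLM to rewrite the query in light of them. The sketch below shows only that control flow under stated assumptions: `retrieve` and `expand` are hypothetical callables standing in for a retriever and an LLM prompt, and the default number of rounds and `k` are illustrative, not values reported for DIVER.

```python
from typing import Callable, List

# Hypothetical interfaces (not DIVER's released API):
#   retrieve(query, k) -> list of document texts, best first
#   expand(original_query, current_query, docs) -> rewritten query (an LLM call in practice)
Retriever = Callable[[str, int], List[str]]
Expander = Callable[[str, str, List[str]], str]


def iterative_query_expansion(
    query: str,
    retrieve: Retriever,
    expand: Expander,
    rounds: int = 2,
    k: int = 5,
) -> str:
    """Alternate retrieval and rewriting so each expansion sees fresh evidence."""
    current = query
    for _ in range(rounds):
        docs = retrieve(current, k)              # interact with the corpus
        current = expand(query, current, docs)   # rewrite, grounded in retrieved docs
    return current


if __name__ == "__main__":
    corpus = [
        "photosynthesis converts light energy into chemical energy",
        "chlorophyll absorbs red and blue light",
        "mitochondria produce atp through respiration",
    ]

    def toy_retrieve(q: str, k: int) -> List[str]:
        # Rank by word overlap with the query (a stand-in for a dense retriever).
        words = set(q.lower().split())
        return sorted(corpus, key=lambda d: -len(words & set(d.split())))[:k]

    def toy_expand(original: str, current: str, docs: List[str]) -> str:
        # Stand-in for an LLM: append the top document's text to the original query.
        return f"{original} {docs[0]}" if docs else current

    print(iterative_query_expansion("how do plants use light", toy_retrieve, toy_expand))
```

Passing the original query to every `expand` call is a design choice in this sketch: it keeps later rewrites anchored to the user's intent instead of drifting with whatever was retrieved in the previous round.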
Problem

Research questions and friction points this paper is trying to address.

Addresses retrieval challenges in abstract reasoning queries
Enhances reasoning-intensive information retrieval performance
Improves multi-step inference and analogical thinking in retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven query expansion via document interaction
Reasoning-enhanced retriever with synthetic data
Pointwise reranker combining LLM helpfulness scores with retrieval scores
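The abstract states only that the reranker combines LLM-assigned helpfulness scores with retrieval scores; the exact combination is not given here. One plausible form, assumed purely for illustration, is a weighted sum of per-query min-max normalized scores. `helpfulness`, `alpha`, and the normalization step are hypothetical choices, not the paper's published formula.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: helpfulness(query, doc) -> float, e.g. an LLM asked to rate
# how helpful `doc` is for answering `query`. Not DIVER's actual prompt or API.
Helpfulness = Callable[[str, str], float]


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1] within one query's candidate list."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores tie, so this signal cannot separate the candidates.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def pointwise_rerank(
    query: str,
    candidates: List[Tuple[str, float]],
    helpfulness: Helpfulness,
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Fuse LLM helpfulness with first-stage retrieval scores.

    candidates: (doc, retrieval_score) pairs from the retriever.
    alpha: assumed weight on the LLM score; 1 - alpha goes to the retrieval score.
    """
    if not candidates:
        return []
    docs = [doc for doc, _ in candidates]
    retrieval = min_max([score for _, score in candidates])
    # Pointwise: each document is rated on its own, independent of the other candidates.
    llm = min_max([helpfulness(query, doc) for doc in docs])
    fused = [
        (doc, alpha * llm_s + (1 - alpha) * ret_s)
        for doc, llm_s, ret_s in zip(docs, llm, retrieval)
    ]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)
```

Setting `alpha=0` reproduces the first-stage order and `alpha=1` trusts the LLM alone. Normalizing first keeps the two signals on a comparable scale when, for example, the LLM rates on a 0 to 10 scale while the retriever emits cosine similarities.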
Meixiu Long
Sun Yat-sen University
Graph representation learning, Social network mining, Information fusion
Duolin Sun
Ant Group, Hangzhou, China
Dan Yang
Ant Group, Hangzhou, China
Junjie Wang
Ant Group, Hangzhou, China
Yue Shen
Ant Group, Hangzhou, China
Jian Wang
Ant Group, Hangzhou, China
Peng Wei
Ant Group, Hangzhou, China
Jinjie Gu
Ant Group
Machine learning, Recommendation
Jiahai Wang
Sun Yat-sen University