RECOR: Reasoning-focused Multi-turn Conversational Retrieval Benchmark

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the gap in existing benchmarks that treat multi-turn dialogue and reasoning-intensive retrieval as disjoint tasks, failing to reflect real-world scenarios where both are intertwined. We propose the first reasoning-oriented multi-turn conversational retrieval benchmark, comprising 707 dialogues (2,971 turns) across 11 domains. Complex queries are decomposed into fact-based multi-turn interactions through a decomposition-and-verification framework, with explicit reasoning annotations generated for each turn. The benchmark introduces a novel multi-level verification mechanism and fact provenance to enable fine-grained evaluation. Experiments show that incorporating dialogue history and explicit reasoning doubles retrieval effectiveness (nDCG@10 improves from 0.236 to 0.479), and reasoning-specialized models substantially outperform dense encoders, though implicit reasoning remains a key challenge.

📝 Abstract
Existing benchmarks treat multi-turn conversation and reasoning-intensive retrieval separately, yet real-world information seeking requires both. To bridge this gap, we present a benchmark for reasoning-based conversational information retrieval comprising 707 conversations (2,971 turns) across eleven domains. To ensure quality, our Decomposition-and-Verification framework transforms complex queries into fact-grounded multi-turn dialogues through multi-level validation, where atomic facts are verified against sources and explicit retrieval reasoning is generated for each turn. Comprehensive evaluation reveals that combining conversation history with reasoning doubles retrieval performance (nDCG@10: Baseline 0.236 $\rightarrow$ History+Reasoning 0.479), while reasoning-specialized models substantially outperform dense encoders. Despite these gains, further analysis highlights that implicit reasoning remains challenging, particularly when logical connections are not explicitly stated in the text.
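For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how nDCG@10 is commonly computed for a single query. It assumes the linear-gain formulation and graded relevance labels supplied as plain numbers; the function names `dcg_at_k` and `ndcg_at_k` are illustrative and are not taken from the paper, whose exact evaluation tooling is not described here.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # Rank positions are 1-based, so the discount for index i is log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    all_relevances: Optional[Sequence[float]] = None,
    k: int = 10,
) -> float:
    """nDCG@k for one query.

    ranked_relevances: relevance labels of retrieved documents, in ranked order.
    all_relevances: labels of every judged document for the query (assumption:
        if omitted, the ideal ranking is built from the retrieved list only).
    """
    pool = all_relevances if all_relevances is not None else ranked_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical ranking: the only relevant document appears at rank 2.
    print(round(ndcg_at_k([0, 1, 0]), 4))
    # Ratio of the two scores reported in the abstract.
    print(round(0.479 / 0.236, 2))
```

A benchmark-level score is typically the mean of this value over all evaluated turns. The ratio of the two reported scores, 0.479 / 0.236, is about 2.03, which is consistent with the "doubles" claim.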
Problem

Research questions and friction points this paper is trying to address.

conversational retrieval
reasoning
multi-turn dialogue
information seeking
retrieval benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning-focused retrieval
multi-turn conversational retrieval
Decomposition-and-Verification framework
explicit retrieval reasoning
conversational benchmark