🤖 AI Summary
This work addresses the limitation of existing agent-based retrieval systems, which predominantly rely on similarity-based retrieval and often fail to ensure the utility of retrieved passages for multi-hop reasoning. The authors propose a utility-aware retriever training framework that, for the first time, incorporates global answer correctness into the training objective, jointly optimizing local query-passage relevance and global reasoning utility. They further introduce a bidirectional iterative mechanism between the agent and the retriever to enable continuous improvement of retrieval capability. By transcending the constraints of conventional single-turn retrieval-augmented generation (RAG), the method achieves significant performance gains over strong baselines across seven single-hop and multi-hop question answering benchmarks, demonstrating its effectiveness and generalizability across diverse agent architectures.
📝 Abstract
Agentic search has recently emerged as a powerful paradigm, where an agent interleaves multi-step reasoning with on-demand retrieval to solve complex questions. Despite its success, how to design a retriever for agentic search remains largely underexplored. Existing search agents typically rely on similarity-based retrievers, yet similar passages are not always useful for final answer generation. In this paper, we propose a novel retriever training framework tailored for agentic search. Unlike retrievers designed for single-turn retrieval-augmented generation (RAG), which rely only on local passage utility, we propose to use both local query-passage relevance and global answer correctness to measure passage utility in multi-turn agentic search. We further introduce an iterative training strategy in which the search agent and the retriever are optimized bidirectionally and iteratively. Different from RAG retrievers, which are trained once on fixed questions, our retriever is continuously improved using evolving, higher-quality queries from the agent. Extensive experiments on seven single-hop and multi-hop QA benchmarks demonstrate that our retriever consistently outperforms strong baselines across different search agents. Our code is available at: https://github.com/8421BCD/Agentic-R.
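The core idea of the abstract — scoring a passage by both its local relevance to the query and the global correctness of the final answer it helped produce — can be sketched as a simple blended training label. This is a minimal illustration, not the paper's actual formulation: the function name `passage_utility`, the linear weighting, and the value of `alpha` are all assumptions.

```python
def passage_utility(local_relevance: float,
                    answer_correct: bool,
                    alpha: float = 0.5) -> float:
    """Hypothetical blend of local and global signals for a passage.

    local_relevance: query-passage similarity score, assumed in [0, 1].
    answer_correct:  whether the multi-turn trajectory that used this
                     passage reached the correct final answer.
    alpha:           assumed weighting between the two signals.
    """
    global_utility = 1.0 if answer_correct else 0.0
    return alpha * local_relevance + (1.0 - alpha) * global_utility


# A moderately similar passage that led to a correct answer can outscore
# a highly similar passage that did not help answer the question.
label_useful = passage_utility(local_relevance=0.4, answer_correct=True)
label_similar_only = passage_utility(local_relevance=0.9, answer_correct=False)
```

Under this sketch, `label_useful` (0.7) exceeds `label_similar_only` (0.45), which captures the abstract's point that similarity alone is an imperfect proxy for usefulness in multi-hop reasoning.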