🤖 AI Summary
Current retrieval-augmented generation (RAG) systems struggle to leverage the growing reasoning and tool-use capabilities of large language models (LLMs) due to the absence of model involvement in the retrieval process. To address this limitation, this work proposes A-RAG, a novel framework that delegates retrieval decisions back to the LLM. A-RAG introduces a hierarchical interface offering three composable tools—keyword search, semantic search, and fine-grained text reading—enabling the model to autonomously plan and dynamically invoke retrieval strategies in an agent-driven, adaptive manner. Experimental results demonstrate that A-RAG significantly outperforms existing methods on multiple open-domain question answering benchmarks, achieving higher accuracy with the same or fewer retrieval tokens. Moreover, the framework exhibits strong scalability with respect to both model size and test-time computational resources.
📝 Abstract
Frontier language models have demonstrated strong reasoning and long-horizon tool-use capabilities. However, existing RAG systems fail to leverage these capabilities. Instead, they rely on one of two paradigms: (1) designing an algorithm that retrieves passages in a single shot and concatenates them into the model's input, or (2) predefining a workflow and prompting the model to execute it step by step. Neither paradigm allows the model to participate in retrieval decisions, preventing efficient scaling with model improvements. In this paper, we introduce A-RAG, an Agentic RAG framework that exposes hierarchical retrieval interfaces directly to the model. A-RAG provides three retrieval tools: keyword search, semantic search, and chunk read, enabling the agent to adaptively search and retrieve information across multiple granularities. Experiments on multiple open-domain QA benchmarks show that A-RAG consistently outperforms existing approaches while retrieving a comparable or smaller number of tokens, demonstrating that A-RAG effectively leverages model capabilities and dynamically adapts to different RAG tasks. We further systematically study how A-RAG scales with model size and test-time compute. To facilitate future research, our code and evaluation suite are available at https://github.com/Ayanami0730/arag.
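The hierarchical interface described above can be illustrated with a minimal sketch. The three tool functions, the toy corpus, the chunk-id scheme, and the simple scoring heuristics below are all illustrative assumptions for exposition, not the paper's actual implementation; in A-RAG, the agent decides when and how to invoke each tool.

```python
# Minimal sketch of three hierarchical retrieval tools exposed to an agent:
# coarse keyword search, semantic search, and fine-grained chunk reading.
from collections import Counter
import math

# Toy corpus of text chunks keyed by chunk id (assumption for illustration).
CORPUS = {
    "c1": "The Eiffel Tower is located in Paris, France.",
    "c2": "Paris is the capital of France and a major European city.",
    "c3": "The Great Wall of China stretches thousands of kilometers.",
}

def keyword_search(query: str, k: int = 2) -> list[str]:
    """Rank chunk ids by the number of exact query-term occurrences."""
    terms = query.lower().split()
    scores = {
        cid: sum(text.lower().count(t) for t in terms)
        for cid, text in CORPUS.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [cid for cid in ranked if scores[cid] > 0][:k]

def semantic_search(query: str, k: int = 2) -> list[str]:
    """Rank chunk ids by bag-of-words cosine similarity
    (a crude stand-in for a real embedding model)."""
    def vec(s: str) -> Counter:
        return Counter(s.lower().split())

    def cos(a: Counter, b: Counter) -> float:
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    ranked = sorted(CORPUS, key=lambda cid: cos(q, vec(CORPUS[cid])),
                    reverse=True)
    return ranked[:k]

def chunk_read(chunk_id: str) -> str:
    """Return the full text of a chunk for fine-grained reading."""
    return CORPUS[chunk_id]

# An agent would interleave these calls: search coarsely first,
# then read only the chunks it judges relevant.
hits = keyword_search("Eiffel Tower")
context = [chunk_read(cid) for cid in hits]
```

The key design point the sketch reflects is composability: each tool returns chunk ids or text the model can feed into subsequent tool calls, so retrieval strategy is decided by the agent at inference time rather than fixed in advance.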