🤖 AI Summary
Real-world question answering involves queries of widely varying complexity, and fixed retrieval-augmented generation (RAG) pipelines struggle to balance answer quality against computational cost. To address this, the paper proposes MAO-ARAG, a multi-agent collaborative adaptive RAG framework. A planner agent dynamically orchestrates specialized executor agents (e.g., for query reformulation, document selection, and answer generation) to construct a customized, multi-step reasoning pipeline for each query. Crucially, the planner is trained with reinforcement learning using a reward that maximizes F1 score (quality) while penalizing API call count and latency (cost), enabling joint optimization of accuracy and efficiency. Experiments across multiple standard QA benchmarks show that the approach matches or exceeds the answer quality of baselines, including single-step and iterative RAG, while reducing average API invocation cost by 23.6% and response latency by 18.4%.
📝 Abstract
In question-answering (QA) systems, Retrieval-Augmented Generation (RAG) has become pivotal for improving response accuracy and reducing hallucination. RAG architectures vary significantly, encompassing single-round RAG, iterative RAG, and reasoning RAG, each tailored to different types of queries. Because real-world queries vary in complexity, a fixed RAG pipeline often struggles to balance performance and cost efficiency across queries. To address this challenge, we propose an adaptive RAG framework called MAO-ARAG, which leverages multi-agent orchestration and is conceived as a multi-turn framework. Specifically, we define multiple executor agents representing typical RAG modules, such as query reformulation agents, document selection agents, and generation agents. A planner agent intelligently selects and composes the appropriate executors into a workflow tailored to each query, striving for high-quality answers at reasonable cost. At each turn, the planner agent is trained using reinforcement learning, guided by an outcome-based reward (F1 score) and a cost-based penalty, continuously improving answer quality while keeping costs within a reasonable range. Experiments on multiple QA datasets demonstrate that our approach, which dynamically plans a workflow for each query, not only achieves high answer quality but also keeps both cost and latency within acceptable limits. The code of MAO-ARAG is available at https://github.com/chenyiqun/Agentic-RAG.
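The abstract's training signal, an outcome-based F1 reward with a cost-based penalty, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the penalty weights `alpha` and `beta` and the linear penalty form are assumptions, and the F1 metric shown is the standard token-overlap QA measure.

```python
# Sketch of an outcome-plus-cost reward for the planner agent.
# alpha/beta weights and the linear cost penalty are illustrative assumptions.
from collections import Counter


def f1_score(prediction: str, reference: str) -> float:
    """Standard token-level F1 between predicted and gold answers."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def planner_reward(prediction: str, reference: str,
                   api_calls: int, latency_s: float,
                   alpha: float = 0.05, beta: float = 0.02) -> float:
    """Outcome reward (F1) minus a penalty on API-call count and latency."""
    return f1_score(prediction, reference) - alpha * api_calls - beta * latency_s
```

Under this shaping, a planner that reaches the same F1 with fewer executor invocations or lower latency receives a strictly higher reward, which is how quality and cost get optimized jointly.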