AI Summary
RAG systems suffer from inefficient collaboration between retrievers and large language models (LLMs), leading to suboptimal performance and poor alignment. To address this, we propose a lightweight, plug-and-play multi-agent framework that requires no modifications to existing retrievers or LLMs. Our method centers on two key innovations: (1) an agent-centric architecture that emulates the human behavior of iteratively refining queries and reviewing results; and (2) a tree-structured rollout mechanism grounded in reinforcement learning, enabling end-to-end joint optimization of retrieval intent classification, query rewriting, and result filtering via fine-grained credit assignment. Evaluated on both in-domain and out-of-distribution RAG tasks, our approach achieves significant performance gains while preserving strong generalization and component agnosticism, i.e., compatibility with arbitrary off-the-shelf retrievers and LLMs.
Abstract
Retrieval-augmented generation (RAG) systems face a fundamental challenge in aligning independently developed retrievers and large language models (LLMs). Existing approaches typically involve modifying either component or introducing simple intermediate modules, resulting in practical limitations and sub-optimal performance. Inspired by human search behavior, which typically involves a back-and-forth process of proposing search queries and reviewing documents, we propose C-3PO, a proxy-centric framework that facilitates communication between retrievers and LLMs through a lightweight multi-agent system. Our framework implements three specialized agents that collaboratively optimize the entire RAG pipeline without altering the retriever or the LLMs. These agents work together to assess the need for retrieval, generate effective queries, and select information suitable for the LLMs. To enable effective multi-agent coordination, we develop a tree-structured rollout approach for reward credit assignment in reinforcement learning. Extensive experiments in both in-domain and out-of-distribution scenarios demonstrate that C-3PO significantly enhances RAG performance while maintaining plug-and-play flexibility and superior generalization capabilities.
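To make the proxy idea concrete, the three agent roles named above (assessing the need for retrieval, generating a query, selecting information for the LLM) can be sketched as a thin layer that sits between an unmodified retriever and an unmodified LLM. This is only an illustrative sketch, not the C-3PO implementation: the `Proxy` class, its field names, and the keyword-based toy agents, retriever, and echo LLM are all assumptions introduced here, and the reinforcement-learning training with tree-structured rollouts is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases: the retriever and LLM are treated as black boxes.
Retriever = Callable[[str], List[str]]
LLM = Callable[[str], str]


@dataclass
class Proxy:
    """Hypothetical proxy holding three agents, one per role in the abstract."""
    needs_retrieval: Callable[[str], bool]              # assess the need for retrieval
    write_query: Callable[[str], str]                   # generate a search query
    select_docs: Callable[[str, List[str]], List[str]]  # pick documents for the LLM

    def answer(self, question: str, retriever: Retriever, llm: LLM) -> str:
        # Skip retrieval entirely when the first agent judges it unnecessary.
        if not self.needs_retrieval(question):
            return llm(question)
        query = self.write_query(question)
        docs = self.select_docs(question, retriever(query))
        context = "\n".join(docs)
        # The retriever and the LLM are called as-is; only the proxy adapts.
        return llm(f"{context}\n\nQuestion: {question}")


# Toy stand-ins (assumptions for illustration only).
def toy_retriever(query: str) -> List[str]:
    return ["Paris is the capital of France.", "Bananas are yellow."]


def echo_llm(prompt: str) -> str:
    # Returns the prompt unchanged so the proxy's output is easy to inspect.
    return prompt


proxy = Proxy(
    needs_retrieval=lambda q: "capital" in q.lower(),
    write_query=lambda q: q.rstrip("?").lower(),
    select_docs=lambda q, docs: [d for d in docs if "capital" in d.lower()],
)

with_retrieval = proxy.answer("What is the capital of France?", toy_retriever, echo_llm)
without_retrieval = proxy.answer("What is 2+2?", toy_retriever, echo_llm)
```

In this sketch, swapping in a different retriever or LLM requires no change to the proxy, which is the plug-and-play property the abstract emphasizes; in the actual framework the three roles would be learned agents rather than hand-written rules.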