🤖 AI Summary
This work addresses the inefficiencies in graph-based retrieval-augmented generation caused by mixed-difficulty queries—leading to wasted resources on easy queries and failed reasoning on hard ones—as well as the loss of fine-grained information during graph abstraction. To tackle these issues, the authors propose an adaptive, agent-cooperative graph retrieval framework. The framework employs an adaptive controller that dynamically assesses the sufficiency of retrieved evidence and triggers refined retrieval only when necessary. It further integrates an agentic retriever that escalates retrieval effort as needed and traces graph signals back to their original textual sources to preserve critical details. Evaluated on HotpotQA and 2WikiMultiHopQA, the approach improves Recall@2 by 9.9 and 11.8 percentage points, respectively, while reducing token consumption and end-to-end latency by approximately 50%, thereby achieving both reliable reasoning and significant efficiency gains.
📝 Abstract
Graph Retrieval-Augmented Generation (Graph-RAG) enhances multihop question answering by organizing corpora into knowledge graphs and routing evidence through relational structure. However, practical deployments face two persistent bottlenecks: (i) mixed-difficulty workloads, where one-size-fits-all retrieval either wastes cost on easy queries or fails on hard multihop cases, and (ii) extraction loss, where graph abstraction omits fine-grained qualifiers that remain only in the source text. We present A2RAG, an adaptive-and-agentic Graph-RAG framework for cost-aware and reliable reasoning. A2RAG couples an adaptive controller that verifies evidence sufficiency and triggers targeted refinement only when necessary, with an agentic retriever that progressively escalates retrieval effort and maps graph signals back to provenance text, keeping it robust under extraction loss and incomplete graphs. Experiments on HotpotQA and 2WikiMultiHopQA demonstrate that A2RAG achieves +9.9/+11.8 percentage-point gains in Recall@2 while cutting token consumption and end-to-end latency by about 50% relative to iterative multihop baselines.
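The control flow described above (check sufficiency cheaply first, escalate only when evidence is weak, and return provenance text rather than graph abstractions) can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the retriever functions, the toy corpus, and the passage-count sufficiency check are all assumptions standing in for A2RAG's actual components.

```python
# Hypothetical sketch of an adaptive, escalating retrieval loop.
# All names (adaptive_answer, cheap, agentic, CORPUS) are illustrative,
# not from the A2RAG paper.

def adaptive_answer(query, retrievers, sufficient):
    """Escalate through increasingly expensive retrievers until the
    sufficiency check passes; return (evidence, level reached)."""
    evidence = []
    for level, retrieve in enumerate(retrievers):
        evidence = retrieve(query, evidence)  # refine using prior evidence
        if sufficient(query, evidence):
            return evidence, level
    return evidence, len(retrievers) - 1

# Toy "graph": node -> provenance sentence (assumed structure).
CORPUS = {
    "paris": "Paris is the capital of France.",
    "france": "France is a country in Europe.",
    "seine": "The Seine flows through Paris.",
}

def cheap(query, prior):
    # Level 0: direct one-hop keyword lookup into provenance text.
    return [CORPUS[w] for w in query.lower().split() if w in CORPUS]

def agentic(query, prior):
    # Level 1: expand from entities mentioned in prior evidence,
    # emulating multi-hop graph traversal mapped back to source text.
    seeds = {w.strip(".").lower() for s in prior for w in s.split()}
    extra = [s for k, s in CORPUS.items() if k in seeds and s not in prior]
    return prior + extra

# Assumed sufficiency proxy: at least two supporting passages.
sufficient = lambda q, ev: len(ev) >= 2

easy_ev, easy_level = adaptive_answer("paris france", [cheap, agentic], sufficient)
hard_ev, hard_level = adaptive_answer("seine", [cheap, agentic], sufficient)
```

Here the easy query is answered at level 0, while the single-keyword query triggers escalation: the agentic pass follows "Paris" out of the first passage and pulls in a second one, mirroring how the controller spends extra effort only on queries whose first-pass evidence is insufficient.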