🤖 AI Summary
This work addresses a limitation of conventional Retrieval-Augmented Generation (RAG) systems: they organize evidence as flat chunks and do not explicitly manage the retrieval state, such as search trajectories, entity relationships, verified evidence, and intermediate reasoning artifacts. This limits both the accuracy and the efficiency of complex retrieval. The paper formalizes RAG as structured retrieval state management and introduces three coupled mechanisms: TAM, a typed hierarchical state space over evidence; MARS, which updates and verifies the state through role-specialized agents; and SMP, a memory pool that stores reusable state under hierarchy-aware access control. With one shared configuration, the approach ranks first on answer-quality metrics averaged over three LongBench retrieval-style subsets, matches the strongest agentic baseline in EM on HotpotQA while reducing large-model token usage by 3.51×, and gives a low-token result on DocVQA. Component analysis shows that SMP enables corpus-dependent reuse, with cross-query cache hit rates ranging from 3.77% to 23.18%.
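To make the idea of an explicit retrieval state concrete, here is a minimal Python sketch of a typed, hierarchical state that records the search trajectory, verified evidence, and reusable artifacts. The abstract does not describe TAM's data model, so everything here is an assumption for illustration: the three levels (`Level.TOPIC`, `Level.ENTITY`, `Level.RELATION`), the class names `EvidenceNode` and `RetrievalState`, and the rule that only visited evidence may be marked verified are all hypothetical, not the paper's implementation.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical evidence levels, ordered from coarse to fine."""
    TOPIC = 0
    ENTITY = 1
    RELATION = 2


@dataclass
class EvidenceNode:
    """One typed node in a (hypothetical) evidence hierarchy."""
    node_id: str
    level: Level
    text: str
    children: list[str] = field(default_factory=list)  # ids of finer-grained nodes


@dataclass
class RetrievalState:
    """Explicit retrieval state: where we searched, what is verified, what is reusable."""
    nodes: dict[str, EvidenceNode]
    trajectory: list[str] = field(default_factory=list)      # visited node ids, in visit order
    verified: set[str] = field(default_factory=set)          # node ids confirmed by a checker
    artifacts: dict[str, str] = field(default_factory=dict)  # reusable intermediate results

    def visit(self, node_id: str) -> None:
        """Record a step of the search trajectory."""
        if node_id not in self.nodes:
            raise KeyError(f"unknown node {node_id!r}")
        self.trajectory.append(node_id)

    def frontier(self) -> list[str]:
        """Candidate next nodes: topics at the start, then unvisited children (coarse to fine)."""
        if not self.trajectory:
            return [n.node_id for n in self.nodes.values() if n.level == Level.TOPIC]
        current = self.nodes[self.trajectory[-1]]
        seen = set(self.trajectory)
        return [c for c in current.children if c not in seen]

    def mark_verified(self, node_id: str) -> None:
        """Assumed rule: evidence can only be verified after it has been visited."""
        if node_id not in self.trajectory:
            raise ValueError("only visited evidence can be verified")
        self.verified.add(node_id)

    def store_artifact(self, key: str, value: str) -> None:
        """Keep an intermediate result (e.g. a partial answer) for later reuse."""
        self.artifacts[key] = value
```

For example, with one topic node `t1` whose child is entity node `e1`, `frontier()` first returns `["t1"]`, and after `visit("t1")` it returns `["e1"]`. Keeping the trajectory, the verified set, and the artifacts in a single object lets later steps inspect or reuse them instead of re-deriving them from flat retrieved chunks.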
📝 Abstract
Retrieval-augmented generation (RAG) has become the standard way to ground large language models in external knowledge, but many systems still organize evidence as flat chunks and retrieve it through largely unstructured search. This weak structure becomes a bottleneck for complex retrieval: the system must decide where to search, how to move from coarse topics to entity-relation evidence, which evidence has been verified, and which intermediate artifacts can be reused. We define these intermediate variables as a retrieval state and study RAG as structured state management. EfficientGraph-RAG makes this state explicit through three coupled mechanisms: TAM defines a typed hierarchical state space over evidence, MARS updates and verifies the state through role-specialized agents, and SMP stores reusable state under hierarchy-aware access control. Using one shared framework configuration, EfficientGraph-RAG ranks first on the reported answer-quality metrics averaged over the three evaluated LongBench retrieval-style subsets, matches the strongest agentic baseline on HotpotQA EM while reducing large-model token usage by 3.51×, and achieves a low-token DocVQA result among cross-modal methods that organize retrieval. Component analysis shows that each mechanism plays a distinct role: MARS is the main driver of answer quality, TAM supplies the typed traversal state and the Adaptive Routing signal, and SMP enables corpus-dependent reuse, with cross-query cache hit rates ranging from 3.77% to 23.18%.
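The abstract says SMP stores reusable state under hierarchy-aware access control and reports cross-query cache hit rates, but it does not specify the access rule or how the hit rate is computed. The Python sketch below shows one plausible reading under stated assumptions: each cached entry carries a hierarchy path (its scope), a query may read an entry only if that scope is a prefix of the query's own path (same node or an ancestor), and hit rate is simply hits divided by lookups. The class `StateMemoryPool`, its methods, and both rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryEntry:
    scope: tuple[str, ...]  # hierarchy path under which the artifact was produced
    value: str


class StateMemoryPool:
    """Hypothetical cross-query cache with prefix-based (hierarchy-aware) access control."""

    def __init__(self) -> None:
        self._entries: dict[str, list[MemoryEntry]] = {}
        self.lookups = 0
        self.hits = 0

    def put(self, key: str, scope: tuple[str, ...], value: str) -> None:
        """Store a reusable artifact tagged with the hierarchy path that produced it."""
        self._entries.setdefault(key, []).append(MemoryEntry(scope, value))

    def get(self, key: str, reader_scope: tuple[str, ...]) -> Optional[str]:
        """Return the most specific entry whose scope is a prefix of reader_scope, else None."""
        self.lookups += 1
        best: Optional[MemoryEntry] = None
        for entry in self._entries.get(key, []):
            n = len(entry.scope)
            # Assumed rule: readable only if stored at the reader's node or one of its ancestors.
            if reader_scope[:n] == entry.scope and (best is None or n > len(best.scope)):
                best = entry
        if best is None:
            return None
        self.hits += 1
        return best.value

    @property
    def hit_rate(self) -> float:
        """Assumed definition: fraction of lookups served from the pool."""
        return self.hits / self.lookups if self.lookups else 0.0
```

Under this rule, an artifact stored under topic `("t1",)` can be reused by a later query scoped to `("t1", "e1")` but not by one scoped to `("t2",)`, so reuse depends on how often queries over a corpus share hierarchy paths. That is one way a hit rate could vary by corpus, though the paper may define both the access rule and the metric differently.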