🤖 AI Summary
This work addresses the challenge of performing efficient cross-domain retrieval with Graph-RAG in distributed, access-constrained environments—such as hospitals or multinational organizations—where centralized knowledge graphs are infeasible. To overcome this limitation, the authors propose SCOUT-RAG, a novel framework that enables effective distributed Graph-RAG without requiring a global graph view. SCOUT-RAG employs four coordinated agents that dynamically assess domain relevance, decide whether to expand into new domains, adaptively adjust graph traversal depth, and jointly optimize retrieval and answer generation. Experimental results demonstrate that SCOUT-RAG achieves performance comparable to centralized baselines like DRIFT and exhaustive search methods in multi-domain knowledge scenarios, while substantially reducing the number of cross-domain API calls, total processed tokens, and response latency.
📝 Abstract
Graph-RAG improves LLM reasoning using structured knowledge, yet conventional designs rely on a centralized knowledge graph. In distributed and access-restricted settings (e.g., hospitals or multinational organizations), retrieval must select relevant domains and appropriate traversal depth without global graph visibility or exhaustive querying. To address this challenge, we introduce \textbf{SCOUT-RAG} (\textit{\underline{S}calable and \underline{CO}st-efficient \underline{U}nifying \underline{T}raversal}), a distributed agentic Graph-RAG framework that performs progressive cross-domain retrieval guided by incremental utility goals. SCOUT-RAG employs four cooperative agents that: (i) estimate domain relevance, (ii) decide when to expand retrieval to additional domains, (iii) adapt traversal depth to avoid unnecessary graph exploration, and (iv) synthesize the high-quality answers. The framework is designed to minimize retrieval regret, defined as missing useful domain information, while controlling latency and API cost. Across multi-domain knowledge settings, SCOUT-RAG achieves performance comparable to centralized baselines, including DRIFT and exhaustive domain traversal, while substantially reducing cross-domain calls, total tokens processed, and latency.