AI Summary
To address the limitation of conventional RAG methods, which retrieve isolated text snippets while ignoring the inherent topological structure of networked documents (e.g., citation graphs, knowledge graphs), this paper proposes a graph-aware RAG framework. Methodologically: (1) we design a divide-and-conquer linear-time text subgraph retrieval algorithm enabling efficient subgraph-level retrieval; (2) we introduce a dual-path encoder jointly processing text and graph views to explicitly model structural relationships; and (3) we incorporate topology-aware prompt injection and multi-hop graph reasoning fine-tuning. Evaluated on multiple graph reasoning benchmarks, our approach significantly outperforms existing RAG methods, achieving a 21.4% absolute accuracy gain on complex 3+-hop reasoning tasks. To our knowledge, this is the first work to synergistically enhance generative outputs through joint optimization of textual semantics and graph topology.
Abstract
Naive Retrieval-Augmented Generation (RAG) focuses on individual documents during retrieval and, as a result, falls short in handling networked documents, which are common in applications such as citation graphs, social media, and knowledge graphs. To overcome this limitation, we introduce Graph Retrieval-Augmented Generation (GRAG), which tackles the fundamental challenges of retrieving textual subgraphs and integrating joint textual and topological information into Large Language Models (LLMs) to enhance their generation. To enable efficient textual subgraph retrieval, we propose a novel divide-and-conquer strategy that retrieves the optimal subgraph structure in linear time. To achieve graph context-aware generation, we incorporate textual graphs into LLMs through two complementary views, the text view and the graph view, enabling LLMs to comprehend and utilize the graph context more effectively. Extensive experiments on graph reasoning benchmarks demonstrate that in scenarios requiring multi-hop reasoning on textual graphs, our GRAG approach significantly outperforms current state-of-the-art RAG methods.
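The abstract does not spell out the divide-and-conquer retrieval procedure, so the following is only a minimal sketch of one plausible reading: split the graph into k-hop ego-graphs (one per node), score each independently against the query, and merge the top-ranked ones into a single subgraph. The function names (`k_hop_ego_graph`, `retrieve_subgraph`), the bag-of-words Jaccard scorer, and the toy graph are all assumptions made for illustration and are not taken from the paper; a real system would presumably use learned text embeddings instead.

```python
from collections import deque


def k_hop_ego_graph(adj, center, k):
    """Return the set of nodes within k hops of `center` (breadth-first search)."""
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def bag_of_words(text):
    """Hypothetical stand-in for a text encoder: a set of lowercase tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_subgraph(adj, node_text, query, k=1, top_n=2):
    """Divide: one k-hop ego-graph per node. Conquer: score each, merge the top_n."""
    q = bag_of_words(query)
    scored = []
    for center in adj:
        nodes = k_hop_ego_graph(adj, center, k)
        words = set().union(*(bag_of_words(node_text[n]) for n in nodes))
        scored.append((jaccard(q, words), center, nodes))
    # Highest score first; ties broken by node id for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    merged = set()
    for _, _, nodes in scored[:top_n]:
        merged |= nodes
    return merged


# Toy textual graph (illustrative data only): a path A - B - C - D.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
node_text = {
    "A": "graph neural networks",
    "B": "retrieval augmented generation",
    "C": "language model prompting",
    "D": "protein folding",
}
result = retrieve_subgraph(adj, node_text, "retrieval augmented generation with graph")
print(sorted(result))
```

In this sketch each ego-graph is scored exactly once, so the number of scoring calls grows linearly with the node count. That only gestures at the linear-time claim in the abstract and does not reproduce the paper's algorithm or its optimality guarantee.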