🤖 AI Summary
Existing graph-based RAG approaches suffer from three key limitations: retrieval is decoupled from reasoning, multi-hop expansion scales poorly on large graphs, and training relies heavily on annotated ground-truth entities. This paper proposes a graph retriever trained end-to-end with a large language model (LLM), so that multi-hop retrieval over the knowledge graph (KG) and LLM reasoning are optimized jointly. The core contributions are: (1) an attention-based growing-and-pruning mechanism, optimized with LLM logits as implicit feedback rather than ground-truth entity annotations, which builds subgraphs adaptively in open-domain settings; and (2) an encoding scheme that passes the extracted subgraph to the LLM as soft tokens (structural knowledge) together with a verbalized graph (semantic features), supporting joint training of the retriever and the reasoner. Evaluated on three QA benchmarks, the method achieves state-of-the-art performance, supporting the effectiveness of joint graph-LLM optimization for complex multi-hop reasoning.
📝 Abstract
Retrieval-Augmented Generation (RAG) has significantly mitigated the hallucinations of Large Language Models (LLMs) by grounding generation in external knowledge. Recent extensions of RAG to graph-based retrieval offer a promising direction, leveraging structural knowledge for multi-hop reasoning. However, existing graph RAG methods typically decouple the retrieval and reasoning processes, which prevents the retriever from adapting to the reasoning needs of the LLM. They also scale poorly when performing multi-hop expansion over large graphs, or depend heavily on annotated ground-truth entities, which are often unavailable in open-domain settings. To address these challenges, we propose a novel graph retriever trained end-to-end with the LLM. It features an attention-based growing and pruning mechanism that adaptively navigates to multi-hop relevant entities while filtering out noise. Within the extracted subgraph, structural knowledge is encoded via soft tokens and semantic features via the verbalized graph; both are fed into the LLM together, enhancing its reasoning capability and enabling interactive joint training of the graph retriever and the LLM reasoner. Experimental results across three QA benchmarks show that our approach consistently achieves state-of-the-art performance, validating the strength of joint graph-LLM optimization for complex reasoning tasks. Notably, our framework eliminates the need for predefined ground-truth entities by directly optimizing the retriever using LLM logits as implicit feedback, making it especially effective in open-domain settings.
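To make the growing and pruning idea concrete, the following is a minimal sketch of hop-by-hop subgraph expansion with top-k pruning on a toy graph. It is an illustration only, not the paper's implementation: the function name `grow_and_prune`, the `keep` parameter, the toy graph, and the fixed relevance scores are all assumptions. In the proposed framework the scores would come from a learned attention mechanism trained with LLM logits as feedback; here they are hard-coded so that the example stays self-contained.

```python
from typing import Dict, List, Set

# Hypothetical toy graph: node -> list of neighbor nodes.
graph: Dict[str, List[str]] = {
    "q_entity": ["a", "b", "c"],
    "a": ["d", "e", "f"],
    "b": ["g"],
    "c": [],
}

# Hypothetical fixed relevance scores. In the paper these would be
# attention-based and learned; hard-coding them is an assumption.
scores: Dict[str, float] = {
    "a": 0.9, "b": 0.2, "c": 0.7,
    "d": 0.8, "e": 0.1, "f": 0.6, "g": 0.95,
}


def grow_and_prune(
    graph: Dict[str, List[str]],
    seeds: List[str],
    scores: Dict[str, float],
    hops: int = 2,
    keep: int = 2,
) -> Set[str]:
    """Expand from the seeds hop by hop, keeping the top-`keep` new nodes per hop."""
    selected: Set[str] = set(seeds)
    frontier: Set[str] = set(seeds)
    for _ in range(hops):
        # Grow: collect neighbors of the frontier that are not yet selected.
        candidates = {
            neighbor
            for node in frontier
            for neighbor in graph.get(node, [])
            if neighbor not in selected
        }
        if not candidates:
            break
        # Prune: rank by score (ties broken by name) and keep the best few.
        ranked = sorted(candidates, key=lambda n: (-scores.get(n, 0.0), n))
        frontier = set(ranked[:keep])
        selected |= frontier
    return selected


subgraph_nodes = grow_and_prune(graph, ["q_entity"], scores, hops=2, keep=2)
print(sorted(subgraph_nodes))
```

Note that node `g` has the highest score in this toy setup but is never reached, because its only parent `b` is pruned at the first hop. This is the trade-off any per-hop pruning scheme makes: the candidate set stays small on large graphs, at the cost of depending on the quality of the scores at each hop.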