🤖 AI Summary
Retrieval-Augmented Generation (RAG) systems that retrieve from unstructured text are hard to interpret, and the factual grounding of their answers is hard to verify. This paper proposes KGRAG-Ex, a knowledge graph–enhanced RAG framework. It first constructs a domain-specific knowledge graph (KG) via prompt-based information extraction, then converts semantic paths in the KG into pseudo-paragraphs that guide corpus retrieval. It further applies perturbation-based attribution to KG-derived components, quantifying how node types and topological features such as centrality and path length relate to each component's influence on the generated answer, which makes reasoning paths transparent. Experiments demonstrate significant improvements in factual accuracy and explanation consistency of generated answers. Notably, this work is the first to empirically reveal statistically significant correlations between KG structural metrics and explanation importance, establishing a structured, quantifiable methodology for interpretable RAG.
📝 Abstract
Retrieval-Augmented Generation (RAG) enhances language models by grounding responses in external information, yet explainability remains a critical challenge, particularly when retrieval relies on unstructured text. Knowledge graphs (KGs) offer a solution by introducing structured, semantically rich representations of entities and their relationships, enabling transparent retrieval paths and interpretable reasoning. In this work, we present KGRAG-Ex, a RAG system that improves both factual grounding and explainability by leveraging a domain-specific KG constructed via prompt-based information extraction. Given a user query, KGRAG-Ex identifies relevant entities and semantic paths in the graph, which are then transformed into pseudo-paragraphs: natural language representations of graph substructures that guide corpus retrieval. To improve interpretability and support reasoning transparency, we incorporate perturbation-based explanation methods that assess the influence of specific KG-derived components on the generated answers. We conduct a series of experiments to analyze the sensitivity of the system to different perturbation methods, the relationship between the importance of graph components and their structural positions, the influence of semantic node types, and how graph metrics correspond to the influence of components within the explanation process.
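The two mechanisms described above, verbalizing a KG path as a pseudo-paragraph and scoring components by perturbation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the triple format, the one-sentence-per-triple template, the choice of triple removal as the perturbation, the Jaccard token overlap used to compare answers, and `toy_answer_fn`, a hypothetical stand-in for the retrieval and generation steps (the corpus retrieval stage is skipped entirely).

```python
from typing import Callable, List, Tuple

# Assumed representation: a KG path is a list of (head, relation, tail) triples.
Triple = Tuple[str, str, str]


def path_to_pseudo_paragraph(path: List[Triple]) -> str:
    """Verbalize a KG path with one short sentence per triple (assumed template)."""
    return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in path)


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens (assumed answer metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def perturbation_importance(
    path: List[Triple],
    answer_fn: Callable[[str], str],
    similarity: Callable[[str, str], float] = token_overlap,
) -> List[float]:
    """Score each triple by how much the answer changes when that triple is removed."""
    baseline = answer_fn(path_to_pseudo_paragraph(path))
    scores = []
    for i in range(len(path)):
        perturbed = path[:i] + path[i + 1:]  # drop the i-th triple
        answer = answer_fn(path_to_pseudo_paragraph(perturbed))
        scores.append(1.0 - similarity(baseline, answer))
    return scores


def toy_answer_fn(context: str) -> str:
    """Hypothetical stand-in for the retriever and language model."""
    if "aspirin treats headache." in context.lower():
        return "headache relief"
    return "unknown"


path = [
    ("Aspirin", "treats", "headache"),
    ("headache", "symptom_of", "migraine"),
]
pseudo = path_to_pseudo_paragraph(path)
scores = perturbation_importance(path, toy_answer_fn)
print(pseudo)
print(scores)
```

In this toy setup, removing the first triple changes the answer completely while removing the second leaves it unchanged, so the first triple receives the higher importance score. A real pipeline would replace `toy_answer_fn` with the actual retrieval and generation call and could then compare the resulting scores against graph metrics for each component.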