What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing KG-RAG methods exhibit weak reasoning under knowledge graph (KG) incompleteness, often falling back on parametric memory or direct retrieval rather than performing genuine multi-hop inference. Meanwhile, current evaluation paradigms fail to rigorously assess reasoning, suffering from inconsistent benchmarks, lenient answer matching, and no explicit modeling of knowledge gaps. Method: the first KG-RAG reasoning-evaluation framework tailored to KG incompleteness, featuring a controllable knowledge-missing simulation mechanism, strict semantic alignment for answer matching, and a reproducible benchmark-construction protocol. Contribution/Results: experiments reveal substantial performance degradation of mainstream KG-RAG methods on multi-hop reasoning tasks, exposing architectural limitations and poor generalization. The framework uncovers fundamental bottlenecks in KG-RAG reasoning and establishes a standardized, trustworthy evaluation foundation for future research on reliable KG-augmented reasoning.
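The summary's "controllable knowledge-missing simulation mechanism" can be illustrated with a minimal sketch: remove a controlled fraction of the triples on a question's gold reasoning path, so the answer is no longer directly retrievable and the system must bridge the gap by reasoning. The function name, the triple representation, and the `drop_ratio` parameter are illustrative assumptions, not the paper's actual implementation.

```python
import random

def simulate_incompleteness(triples, gold_path, drop_ratio=0.5, seed=0):
    """Drop a controlled fraction of the triples on a question's gold
    reasoning path (hypothetical sketch of a knowledge-missing
    simulation; not the paper's exact protocol).

    triples:   set of (head, relation, tail) tuples forming the KG
    gold_path: list of path triples the gold answer depends on
    """
    rng = random.Random(seed)  # seeded for reproducible benchmark builds
    n_drop = max(1, int(len(gold_path) * drop_ratio))
    dropped = set(rng.sample(gold_path, n_drop))
    return triples - dropped, dropped

# Toy 2-hop question: "In which city did Marie Curie work?"
# After perturbation, one hop on the path is missing, so the answer
# can no longer be read off the KG by direct retrieval.
kg = {
    ("Marie Curie", "worked_in", "Sorbonne"),
    ("Sorbonne", "located_in", "Paris"),
    ("Paris", "capital_of", "France"),
}
path = [("Marie Curie", "worked_in", "Sorbonne"),
        ("Sorbonne", "located_in", "Paris")]
perturbed, dropped = simulate_incompleteness(kg, path, drop_ratio=0.5)
```

Seeding the removal makes each benchmark instance reproducible, and varying `drop_ratio` gives the "controllable" axis: from a single missing hop up to a fully disconnected path.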

📝 Abstract
Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) is an increasingly explored approach for combining the reasoning capabilities of large language models with the structured evidence of knowledge graphs. However, current evaluation practices fall short: existing benchmarks often include questions that can be answered directly from existing triples in the KG, making it unclear whether models perform reasoning or simply retrieve answers. Moreover, inconsistent evaluation metrics and lenient answer-matching criteria further obscure meaningful comparisons. In this work, we introduce a general method for constructing benchmarks, together with an evaluation protocol, to systematically assess KG-RAG methods under knowledge incompleteness. Our empirical results show that current KG-RAG methods have limited reasoning ability under missing knowledge, often rely on internal memorization, and exhibit varying degrees of generalization depending on their design.
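The abstract's complaint about lenient answer matching is easy to make concrete. A common lenient criterion counts a prediction as correct if a gold answer appears anywhere in it, which rewards verbose hedged outputs; a stricter criterion requires exact match after normalization. The sketch below uses SQuAD-style normalization as an assumed stand-in for the paper's "strict semantic alignment", which is likely more sophisticated.

```python
import re
import string

def normalize(ans: str) -> str:
    """Lowercase, drop English articles and punctuation, collapse
    whitespace (SQuAD-style normalization)."""
    ans = ans.lower()
    ans = re.sub(r"\b(a|an|the)\b", " ", ans)
    ans = ans.translate(str.maketrans("", "", string.punctuation))
    return " ".join(ans.split())

def strict_match(prediction: str, gold_answers: list) -> bool:
    """Exact match after normalization: a long hedged response that
    merely mentions the answer does not count as correct."""
    return normalize(prediction) in {normalize(g) for g in gold_answers}

def lenient_match(prediction: str, gold_answers: list) -> bool:
    """Typical lenient criterion: a gold answer appearing anywhere in
    the prediction counts, inflating scores for verbose outputs."""
    p = normalize(prediction)
    return any(normalize(g) in p for g in gold_answers)
```

For example, the prediction "It might be Paris or London" passes `lenient_match` against gold answer "Paris" but fails `strict_match`, which is exactly the score inflation the evaluation protocol is designed to rule out.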
Problem

Research questions and friction points this paper is trying to address.

Evaluating KG-RAG reasoning under incomplete knowledge
Assessing model reliance on retrieval vs true reasoning
Standardizing benchmarks for meaningful KG-RAG comparisons
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs benchmarks for KG-RAG evaluation
Assesses reasoning under incomplete knowledge
Reveals reliance on internal memorization