🤖 AI Summary
This work addresses the limited reasoning capability of Graph-RAG in multi-hop question answering, which often prevents it from generating correct answers even when the relevant context has been retrieved. To overcome this, the authors propose a structured reasoning augmentation framework that combines SPARQL-based chain-of-thought prompting, knowledge-graph traversal for context compression, and question-type routing. The approach substantially improves the reasoning performance of small language models, yielding accuracy gains of 2–14 percentage points across three benchmarks, including HotpotQA. Notably, the method transfers across Graph-RAG systems, and a fully augmented Llama-8B matches or exceeds the unaugmented Llama-70B while using roughly one-twelfth the computational cost.
📝 Abstract
Graph-RAG systems achieve strong multi-hop question answering by indexing documents into knowledge graphs, but strong retrieval does not guarantee strong answers. Evaluating KET-RAG, a leading Graph-RAG system, on three multi-hop QA benchmarks (HotpotQA, MuSiQue, 2WikiMultiHopQA), we find that 77% to 91% of questions have the gold answer in the retrieved context, yet accuracy is only 35% to 78%, and 73% to 84% of errors are reasoning failures. We propose two augmentations: (i) SPARQL chain-of-thought prompting, which decomposes questions into triple-pattern queries aligned with the entity-relationship context, and (ii) graph-walk compression, which compresses the context by ~60% via knowledge-graph traversal with no LLM calls. SPARQL CoT improves accuracy by +2 to +14 pp; graph-walk compression adds +6 pp on average when paired with structured prompting on smaller models. Surprisingly, with question-type routing, a fully augmented open-weight Llama-8B model matches or exceeds the unaugmented Llama-70B baseline on all three benchmarks at ~12x lower cost. A replication on LightRAG confirms that our augmentations transfer across Graph-RAG systems.
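To make the graph-walk compression idea concrete, here is a minimal sketch of context compression via knowledge-graph traversal with no LLM calls. The function name `graph_walk_compress`, the dictionary graph representation, and the `max_hops` parameter are illustrative assumptions, not the paper's actual implementation: the sketch simply keeps only the triples reachable within a few hops of the question's seed entities and drops everything else from the retrieved context.

```python
from collections import deque

def graph_walk_compress(graph, seed_entities, max_hops=2):
    """Keep only triples reachable within `max_hops` of the seed entities.

    `graph` maps an entity to a list of (relation, neighbor) edges; the
    returned list of (head, relation, tail) triples is the compressed
    context. Plain BFS, so no LLM calls are needed.
    """
    visited = set(seed_entities)
    kept = []
    frontier = deque((entity, 0) for entity in seed_entities)
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, neighbor in graph.get(entity, []):
            kept.append((entity, relation, neighbor))
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return kept

# Hypothetical toy knowledge graph for a 2-hop question such as
# "Where was the director of Inception born?"
kg = {
    "Inception": [("directed_by", "Christopher Nolan")],
    "Christopher Nolan": [("born_in", "London")],
    "London": [("capital_of", "United Kingdom")],
}
triples = graph_walk_compress(kg, ["Inception"], max_hops=2)
# Keeps the 2-hop chain Inception -> Nolan -> London; the off-path
# ("London", "capital_of", ...) triple is pruned from the context.
```

Serializing the surviving triples back into text yields the compressed context; in the paper this traversal is what shrinks the input by ~60% before answering.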