RAG vs. GraphRAG: A Systematic Evaluation and Key Insights

📅 2025-02-17

📈 Citations: 0

✨ Influential: 0

career value

147K/year

🤖 AI Summary

This work addresses the lack of systematic evaluation and comparative analysis between RAG and GraphRAG on general-purpose text benchmarks. For the first time, it conducts a multi-dimensional performance comparison across a unified text benchmark encompassing question answering and query-focused summarization. Methodologically, it establishes a standardized evaluation framework to analyze differences in retrieval quality, generation consistency, and structural utilization efficiency, and proposes a complementary RAG–GraphRAG fusion strategy. Key contributions include: (1) clarifying the applicability boundaries—RAG excels at local semantic matching, whereas GraphRAG shows promise for long-range reasoning but is hampered by deficiencies in textual graph structure modeling; (2) achieving average accuracy improvements of 10–18% across multiple tasks with the fused model; and (3) providing empirical evidence and methodological guidance for optimizing GraphRAG’s structural design in purely textual settings.

Technology Category

Application Category

📝 Abstract

Retrieval-Augmented Generation (RAG) enhances the performance of LLMs across various tasks by retrieving relevant information from external sources, particularly on text-based data. For structured data, such as knowledge graphs, GraphRAG has been widely used to retrieve relevant information. However, recent studies have revealed that structuring implicit knowledge from text into graphs can benefit certain tasks, extending the application of GraphRAG from graph data to general text-based data. Despite their successful extensions, most applications of GraphRAG for text data have been designed for specific tasks and datasets, lacking a systematic evaluation and comparison between RAG and GraphRAG on widely used text-based benchmarks. In this paper, we systematically evaluate RAG and GraphRAG on well-established benchmark tasks, such as Question Answering and Query-based Summarization. Our results highlight the distinct strengths of RAG and GraphRAG across different tasks and evaluation perspectives. Inspired by these observations, we investigate strategies to integrate their strengths to improve downstream tasks. Additionally, we provide an in-depth discussion of the shortcomings of current GraphRAG approaches and outline directions for future research.

Problem

Research questions and friction points this paper is trying to address.

Comparison of RAG and GraphRAG

Systematic evaluation on benchmarks

Integration strategies for enhanced performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG enhances LLMs with external data

GraphRAG extends to text-based tasks

Systematic evaluation improves downstream tasks

🔎 Similar Papers

No similar papers found.