🤖 AI Summary
Existing RAG systems suffer from answer fragmentation and poor modeling of complex inter-entity dependencies because they rely on flat data representations and lack sufficient contextual awareness. To address these limitations, we propose LightRAG, a RAG framework that incorporates graph structures into text indexing and retrieval. Its core contributions are: (1) a dual-level retrieval mechanism that supports both low-level and high-level knowledge discovery; (2) integration of graph structures with vector representations for efficient retrieval of related entities and their relationships; and (3) an incremental update algorithm that integrates new data in a timely manner, keeping the system effective as data changes. Extensive experiments show considerable improvements in retrieval accuracy and efficiency over existing approaches. The source code is publicly available.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs. However, existing RAG systems have significant limitations, including reliance on flat data representations and inadequate contextual awareness, which can lead to fragmented answers that fail to capture complex inter-dependencies. To address these challenges, we propose LightRAG, which incorporates graph structures into text indexing and retrieval processes. This innovative framework employs a dual-level retrieval system that enhances comprehensive information retrieval from both low-level and high-level knowledge discovery. Additionally, the integration of graph structures with vector representations facilitates efficient retrieval of related entities and their relationships, significantly improving response times while maintaining contextual relevance. This capability is further enhanced by an incremental update algorithm that ensures the timely integration of new data, allowing the system to remain effective and responsive in rapidly changing data environments. Extensive experimental validation demonstrates considerable improvements in retrieval accuracy and efficiency compared to existing approaches. We have made our LightRAG open-source and available at the link: https://github.com/HKUDS/LightRAG
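The dual-level retrieval and incremental update described in the abstract can be illustrated with a toy sketch. This is not the LightRAG API: the class `ToyGraphIndex`, its method names, and the sample data are all hypothetical. It also assumes that "low-level" means lookups around specific entities and "high-level" means matching broader themes attached to relations, and it substitutes plain keyword matching for the vector representations the abstract mentions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraphIndex:
    """Hypothetical toy index for illustration; not the LightRAG API."""
    entities: dict = field(default_factory=dict)   # entity name -> description
    relations: dict = field(default_factory=dict)  # (src, dst) -> set of theme keywords

    def insert(self, entities, relations):
        """Incremental update: merge new nodes and edges without rebuilding."""
        for name, desc in entities.items():
            # Keep the first description seen for an entity (an arbitrary choice).
            self.entities.setdefault(name, desc)
        for edge, themes in relations.items():
            self.relations.setdefault(edge, set()).update(t.lower() for t in themes)

    def low_level(self, keywords):
        """Specific lookup: entities named in the query plus their 1-hop neighbors."""
        wanted = {k.lower() for k in keywords}
        hits = {e for e in self.entities if e.lower() in wanted}
        neighbors = set()
        for src, dst in self.relations:
            if src in hits:
                neighbors.add(dst)
            if dst in hits:
                neighbors.add(src)
        return sorted(hits | neighbors)

    def high_level(self, keywords):
        """Thematic lookup: edges whose theme keywords overlap the query."""
        wanted = {k.lower() for k in keywords}
        return sorted(e for e, themes in self.relations.items() if themes & wanted)


# Made-up sample data, inserted in two batches to show the incremental merge.
index = ToyGraphIndex()
index.insert(
    {"Beekeeper": "keeps hives", "Bees": "pollinators"},
    {("Beekeeper", "Bees"): {"agriculture"}},
)
index.insert({"Hive": "home of bees"}, {("Bees", "Hive"): {"habitat", "agriculture"}})

print(index.low_level(["bees"]))          # entity-centered result
print(index.high_level(["agriculture"]))  # theme-centered result
```

The second `insert` call only touches the new node and edge, which is the property an incremental update is meant to provide. A real system would extract the entities, relations, and keywords with an LLM and rank candidates by embedding similarity rather than exact keyword overlap.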