LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval

📅 2025-08-14
🤖 AI Summary
Existing knowledge graph–based RAG methods suffer from three key limitations: (1) incomplete or erroneous retrieved contexts; (2) isolated high-level semantic abstractions lacking explicit inter-concept relationships—forming “semantic islands”; and (3) underutilization of graph topology, resulting in inefficient retrieval. To address these, we propose a Semantic Aggregation Network coupled with a Hierarchical Structure-Aware Retrieval framework. First, we design a semantic aggregation algorithm that explicitly models relational dependencies among high-level concepts, constructing a navigable semantic network. Second, we introduce a bottom-up hierarchical retrieval strategy integrating entity clustering with path-guided fine-grained anchoring, enabling coherent cross-community reasoning and precise context acquisition. Evaluated on four cross-domain QA benchmarks, our method significantly outperforms state-of-the-art approaches: it improves response quality and reduces retrieval redundancy by 46%.

📝 Abstract
Retrieval-Augmented Generation (RAG) plays a crucial role in grounding Large Language Models in external knowledge, but its effectiveness is often compromised by the retrieval of contextually flawed or incomplete information. To address this, knowledge graph-based RAG methods have evolved towards hierarchical structures that organize knowledge into multi-level summaries. However, these approaches still suffer from two critical, unaddressed challenges: high-level conceptual summaries exist as disconnected "semantic islands", lacking the explicit relations needed for cross-community reasoning; and the retrieval process itself remains structurally unaware, often degenerating into an inefficient flat search that fails to exploit the graph's rich topology. To overcome these limitations, we introduce LeanRAG, a framework whose knowledge aggregation and retrieval strategies are designed to work together. LeanRAG first employs a novel semantic aggregation algorithm that forms entity clusters and constructs new explicit relations among aggregation-level summaries, creating a fully navigable semantic network. Then, a bottom-up, structure-guided retrieval strategy anchors queries to the most relevant fine-grained entities and systematically traverses the graph's semantic pathways to gather concise yet contextually comprehensive evidence sets. LeanRAG mitigates the substantial overhead associated with path retrieval on graphs and minimizes redundant information retrieval. Extensive experiments on four challenging QA benchmarks across different domains demonstrate that LeanRAG significantly outperforms existing methods in response quality while reducing retrieval redundancy by 46%. Code is available at: https://github.com/RaZzzyz/LeanRAG
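
The aggregation step described in the abstract (cluster entities, then build explicit relations between the cluster-level summaries) can be illustrated with a minimal sketch. This is not the LeanRAG implementation: the greedy cosine-threshold clustering, the edge-count relation weights, the function names, and the toy embeddings are all assumptions chosen to keep the example self-contained.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_entities(embeddings, threshold=0.9):
    """Greedy clustering (an assumed stand-in for the paper's method):
    an entity joins the first cluster whose seed vector is similar enough,
    otherwise it starts a new cluster."""
    seeds = []        # list of (cluster_id, seed_vector)
    assignment = {}   # entity name -> cluster_id
    for name, vec in embeddings.items():
        for cid, seed in seeds:
            if cosine(vec, seed) >= threshold:
                assignment[name] = cid
                break
        else:
            cid = len(seeds)
            seeds.append((cid, vec))
            assignment[name] = cid
    return assignment


def aggregate_relations(assignment, entity_edges):
    """Lift entity-level edges to explicit cluster-level relations.
    The weight is the number of entity edges crossing the two clusters."""
    weights = defaultdict(int)
    for src, dst in entity_edges:
        a, b = assignment[src], assignment[dst]
        if a != b:  # edges inside one cluster add no inter-cluster relation
            weights[tuple(sorted((a, b)))] += 1
    return dict(weights)


# Toy data (hypothetical): two drug entities and two symptom entities.
embeddings = {
    "aspirin": (1.0, 0.1),
    "ibuprofen": (0.9, 0.2),
    "headache": (0.1, 1.0),
    "migraine": (0.2, 0.9),
}
entity_edges = [
    ("aspirin", "headache"),
    ("ibuprofen", "migraine"),
    ("aspirin", "ibuprofen"),
]

assignment = cluster_entities(embeddings)
cluster_relations = aggregate_relations(assignment, entity_edges)
print(assignment)
print(cluster_relations)
```

In this toy setting the two clusters stop being "semantic islands" once the cross-cluster edges are lifted into a single weighted relation between them, which a retriever can then follow.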

Problem

Research questions and friction points this paper is trying to address.

Addresses disconnected semantic islands in knowledge graphs
Improves structurally unaware retrieval in RAG systems
Reduces retrieval redundancy and enhances response quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic aggregation algorithm for entity clusters
Bottom-up structure-guided retrieval strategy
Navigable semantic network with explicit relations
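
One plausible reading of the bottom-up, structure-guided retrieval listed above is sketched here: anchor the query to a few fine-grained entities, then walk upward through the hierarchy and stop where the paths meet. The word-overlap scoring, the stop-at-first-shared-ancestor rule, the function names, and the toy hierarchy are assumptions for illustration, not details confirmed by the source.

```python
def path_to_root(node, parent):
    """Return the chain node -> parent -> ... -> root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def anchor_entities(query, descriptions, k=2):
    """Rank entities by word overlap with the query.
    Word overlap is an assumed stand-in for dense similarity search."""
    q = set(query.lower().split())
    ranked = sorted(
        descriptions,
        key=lambda e: (-len(q & set(descriptions[e].lower().split())), e),
    )
    return ranked[:k]


def bottom_up_retrieve(query, descriptions, parent, k=2):
    """Collect the seeds plus the ancestors on their upward paths,
    stopping each path at the first ancestor shared by all seeds."""
    seeds = anchor_entities(query, descriptions, k)
    paths = [path_to_root(s, parent) for s in seeds]
    shared = set(paths[0]).intersection(*paths[1:])
    evidence = []
    for path in paths:
        for node in path:
            if node not in evidence:
                evidence.append(node)
            if node in shared:
                break  # reached the lowest shared ancestor on this path
    return evidence


# Toy hierarchy (hypothetical): entities -> clusters -> one top-level summary.
descriptions = {
    "aspirin": "aspirin is a drug that relieves pain",
    "ibuprofen": "ibuprofen is an anti-inflammatory drug",
    "migraine": "migraine is a severe recurring headache",
    "insulin": "insulin regulates blood sugar",
}
parent = {
    "aspirin": "painkillers",
    "ibuprofen": "painkillers",
    "migraine": "neurological symptoms",
    "insulin": "hormones",
    "painkillers": "medicine",
    "neurological symptoms": "medicine",
    "hormones": "medicine",
}

evidence = bottom_up_retrieve(
    "which drug relieves a migraine headache", descriptions, parent
)
print(evidence)
```

Because traversal is restricted to the paths between the anchored entities and their shared ancestor, unrelated branches such as the "hormones" cluster never enter the evidence set, which is the intuition behind reduced retrieval redundancy.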
Yaoze Zhang
Shanghai Artificial Intelligence Laboratory, University of Shanghai for Science and Technology
Rong Wu
Zhejiang University
Pinlong Cai
Shanghai Artificial Intelligence Laboratory
Artificial Intelligence, Decision Intelligence, Knowledge Systems
Xiaoman Wang
Shanghai Artificial Intelligence Laboratory, East China Normal University
Guohang Yan
Shanghai Artificial Intelligence Laboratory
Song Mao
Shanghai Artificial Intelligence Laboratory
Ding Wang
Shanghai Artificial Intelligence Laboratory
Botian Shi
Shanghai Artificial Intelligence Laboratory
VLMs, Document Understanding, Autonomous Driving