LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval

📅 2025-08-14

🤖 AI Summary

Existing knowledge graph–based RAG methods suffer from three key limitations: (1) retrieved contexts that are incomplete or erroneous; (2) high-level semantic abstractions that remain isolated, lacking explicit inter-concept relationships and thus forming "semantic islands"; and (3) underuse of the graph topology, which makes retrieval inefficient. To address these, we propose LeanRAG, which couples semantic aggregation with hierarchical, structure-aware retrieval. First, a semantic aggregation algorithm clusters entities and explicitly models the relations among the resulting high-level concepts, constructing a navigable semantic network. Second, a bottom-up retrieval strategy anchors each query to the most relevant fine-grained entities and then follows the graph's semantic pathways, enabling coherent cross-community reasoning and precise context acquisition. On four QA benchmarks spanning different domains, the method significantly outperforms state-of-the-art approaches in response quality while reducing retrieval redundancy by 46%.
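The aggregation step can be pictured with a minimal sketch. It assumes entity clusters have already been produced by some upstream clustering of entity embeddings (not shown) and assumes a simple rule: two clusters receive an explicit relation when at least `tau` entity-level edges cross between them. The function name `aggregate_layer`, the threshold `tau`, and this counting rule are hypothetical illustrations, not code taken from LeanRAG; in the full method an LLM would also write a summary for each aggregated node and relation.

```python
from collections import Counter, defaultdict


def aggregate_layer(cluster_of, edges, tau=2):
    """One hypothetical aggregation step over an entity graph.

    cluster_of: entity -> cluster id (assumed output of an upstream clustering).
    edges: entity-level relations as (head, tail) pairs, treated as undirected.
    tau: assumed minimum number of cross-cluster edges before an explicit
         cluster-level relation is created.
    Returns (members, relations): cluster id -> sorted member entities, and
    (cluster_a, cluster_b) -> number of supporting entity-level edges.
    """
    # Group entities under their aggregated (cluster-level) node.
    members = defaultdict(list)
    for entity, cid in cluster_of.items():
        members[cid].append(entity)

    # Count entity-level edges that cross cluster boundaries.
    cross = Counter()
    for head, tail in edges:
        a, b = cluster_of[head], cluster_of[tail]
        if a != b:
            cross[tuple(sorted((a, b)))] += 1

    # Keep only well-supported pairs as explicit inter-cluster relations,
    # so aggregated summaries are linked instead of forming isolated islands.
    relations = {pair: n for pair, n in cross.items() if n >= tau}
    return {cid: sorted(ents) for cid, ents in members.items()}, relations


# Toy example (illustrative data only).
cluster_of = {"Paris": "C1", "France": "C1", "Berlin": "C2",
              "Germany": "C2", "Euro": "C3"}
edges = [("Paris", "France"), ("Paris", "Berlin"),
         ("France", "Germany"), ("Germany", "Euro")]
members, relations = aggregate_layer(cluster_of, edges, tau=2)
print(relations)  # {('C1', 'C2'): 2}
```

Here `C1` and `C2` are linked because two entity-level edges connect them, while the single `Germany`–`Euro` edge falls below the assumed threshold. Repeating such a step on the cluster-level graph would yield the next, coarser layer of the hierarchy.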

📝 Abstract

Retrieval-Augmented Generation (RAG) plays a crucial role in grounding Large Language Models by leveraging external knowledge, but its effectiveness is often compromised by the retrieval of contextually flawed or incomplete information. To address this, knowledge graph-based RAG methods have evolved towards hierarchical structures, organizing knowledge into multi-level summaries. However, these approaches still suffer from two critical, unaddressed challenges: high-level conceptual summaries exist as disconnected "semantic islands", lacking the explicit relations needed for cross-community reasoning; and the retrieval process itself remains structurally unaware, often degenerating into an inefficient flat search that fails to exploit the graph's rich topology. To overcome these limitations, we introduce LeanRAG, a framework with a deeply collaborative design that combines knowledge aggregation and retrieval strategies. LeanRAG first employs a novel semantic aggregation algorithm that forms entity clusters and constructs new explicit relations among aggregation-level summaries, creating a fully navigable semantic network. Then, a bottom-up, structure-guided retrieval strategy anchors queries to the most relevant fine-grained entities and systematically traverses the graph's semantic pathways to gather concise yet contextually comprehensive evidence sets. LeanRAG thereby mitigates the substantial overhead associated with path retrieval on graphs and minimizes redundant information retrieval. Extensive experiments on four challenging QA benchmarks from different domains demonstrate that LeanRAG significantly outperforms existing methods in response quality while reducing retrieval redundancy by 46%. Code is available at: https://github.com/RaZzzyz/LeanRAG
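The bottom-up retrieval idea can be illustrated with a small sketch. It assumes a precomputed embedding for each fine-grained entity and a `parent` map linking every node to its aggregated summary one layer up. The helper names, cosine-similarity anchoring, the value of `k`, and the choice of the anchors' lowest common ancestor as the stopping point are illustrative assumptions, not the repository's actual code. The evidence set is only the nodes on the paths from each anchor up to that shared ancestor.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def path_to_root(node, parent):
    """Return [node, parent, grandparent, ...] up to the top layer."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def retrieve(query_vec, entity_vecs, parent, k=2):
    """Hypothetical bottom-up retrieval: anchor, then climb to a shared ancestor.

    entity_vecs: fine-grained entity -> embedding (assumed precomputed).
    parent: child node -> aggregated parent node in the hierarchy.
    k: assumed number of anchor entities.
    Returns (anchors, lowest common ancestor or None, evidence nodes).
    """
    # 1. Anchor the query to the k most similar fine-grained entities.
    anchors = sorted(entity_vecs,
                     key=lambda e: cosine(query_vec, entity_vecs[e]),
                     reverse=True)[:k]
    if not anchors:
        return [], None, []

    # 2. Find the lowest node shared by every anchor's path to the top.
    paths = [path_to_root(a, parent) for a in anchors]
    shared = set(paths[0]).intersection(*paths[1:])
    # Paths are ordered bottom-up, so the first shared node is the LCA.
    lca = next((n for n in paths[0] if n in shared), None)

    # 3. Collect only the nodes between each anchor and that ancestor
    #    (whole paths if there is none), skipping duplicates.
    evidence = []
    for path in paths:
        for node in path:
            if node not in evidence:
                evidence.append(node)
            if node == lca:
                break
    return anchors, lca, evidence


# Toy two-layer hierarchy (illustrative data only).
parent = {"Paris": "C1", "France": "C1", "Berlin": "C2", "Germany": "C2",
          "C1": "Europe", "C2": "Europe"}
entity_vecs = {"Paris": [1.0, 0.0], "France": [0.9, 0.3],
               "Berlin": [0.0, 1.0], "Germany": [0.2, 0.9]}

print(retrieve([0.5, 0.5], entity_vecs, parent))
# (['France', 'Germany'], 'Europe', ['France', 'C1', 'Europe', 'Germany', 'C2'])
print(retrieve([0.2, 1.0], entity_vecs, parent))
# (['Germany', 'Berlin'], 'C2', ['Germany', 'C2', 'Berlin'])
```

With the first query the anchors fall in different clusters, so the walk climbs to `Europe` to connect them; with the second, both anchors share `C2` and the higher layer is never visited. Stopping at the shared ancestor rather than expanding whole neighbourhoods is one plausible way such a strategy keeps the evidence set small.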

Problem

Research questions and friction points this paper is trying to address.

Addresses disconnected semantic islands in knowledge graphs

Replaces structurally unaware, flat retrieval in RAG systems with structure-guided search

Reduces retrieval redundancy and enhances response quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic aggregation algorithm for entity clusters

Bottom-up structure-guided retrieval strategy

Navigable semantic network with explicit relations
