NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes

📅 2025-04-15
🤖 AI Summary
Existing graph-augmented RAG methods rely on coarse-grained, homogeneous graph designs and integrate graph algorithms in a fragmented way, which limits expressivity and scalability and creates performance bottlenecks. This paper proposes a graph modeling framework centered on heterogeneous nodes, introducing the first LLM-friendly graph indexing and reasoning paradigm that natively supports multi-granularity semantic modeling and lets diverse graph algorithms, including knowledge graph construction, heterogeneous GNNs, and lightweight subgraph retrieval, be embedded seamlessly. By combining heterogeneous graph neural networks with collaborative prompt engineering, the approach enables efficient multi-hop reasoning while keeping the number of retrieved tokens small. Compared with GraphRAG and LightRAG on multi-hop question-answering benchmarks and head-to-head evaluations, it reduces query and indexing latency by 37%, decreases storage overhead by 42%, and improves accuracy by 11.6%.
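The summary describes indexing a corpus as a graph whose nodes have different types, rather than as a single homogeneous entity graph. A minimal sketch of such a heterogeneous index follows. It assumes three illustrative node types (entity, relationship, text chunk) and assumes an upstream LLM step has already extracted (subject, predicate, object) triples from each chunk; the names `HeteroGraph` and `index_chunk` are hypothetical and do not come from the paper or its repository.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative node types (an assumption, not the paper's exact schema).
NODE_TYPES = {"entity", "relationship", "text"}


@dataclass
class HeteroGraph:
    node_type: dict = field(default_factory=dict)  # node id -> node type
    content: dict = field(default_factory=dict)    # node id -> text payload
    adj: defaultdict = field(default_factory=lambda: defaultdict(set))  # undirected edges

    def add_node(self, node_id, ntype, text=""):
        if ntype not in NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[node_id] = ntype
        self.content[node_id] = text

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def nodes_of_type(self, ntype):
        return sorted(n for n, t in self.node_type.items() if t == ntype)


def index_chunk(graph, chunk_id, text, triples):
    """Add one text chunk plus its (assumed LLM-extracted) triples to the graph."""
    graph.add_node(chunk_id, "text", text)
    for i, (subj, pred, obj) in enumerate(triples):
        rel_id = f"{chunk_id}:rel{i}"
        graph.add_node(rel_id, "relationship", f"{subj} {pred} {obj}")
        graph.add_edge(rel_id, chunk_id)  # link the relationship to its source chunk
        for ent in (subj, obj):
            if ent not in graph.node_type:  # entity nodes are shared across chunks
                graph.add_node(ent, "entity", ent)
            graph.add_edge(ent, rel_id)


g = HeteroGraph()
index_chunk(g, "chunk1", "Marie Curie was born in Warsaw.",
            [("Marie Curie", "born in", "Warsaw")])
index_chunk(g, "chunk2", "Warsaw is the capital of Poland.",
            [("Warsaw", "capital of", "Poland")])
print(g.nodes_of_type("entity"))  # ['Marie Curie', 'Poland', 'Warsaw']
print(sorted(g.adj["Warsaw"]))    # ['chunk1:rel0', 'chunk2:rel0']
```

Because both chunks connect to the shared `Warsaw` entity through their relationship nodes, a traversal can move from `chunk1` to `chunk2` in a few hops, which is the kind of connectivity multi-hop retrieval depends on.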

📝 Abstract
Retrieval-augmented generation (RAG) empowers large language models to access external and private corpora, enabling factually consistent responses in specific domains. By exploiting the inherent structure of the corpus, graph-based RAG methods further enrich this process by building a knowledge graph index and leveraging the structural nature of graphs. However, current graph-based RAG approaches seldom prioritize the design of graph structures. Inadequately designed graphs not only impede the seamless integration of diverse graph algorithms but also cause workflow inconsistencies and degraded performance. To further unleash the potential of graphs for RAG, we propose NodeRAG, a graph-centric framework that introduces heterogeneous graph structures, enabling the seamless and holistic integration of graph-based methodologies into the RAG workflow. By aligning closely with the capabilities of LLMs, this framework ensures a fully cohesive and efficient end-to-end process. Through extensive experiments, we demonstrate that NodeRAG outperforms previous methods, including GraphRAG and LightRAG, not only in indexing time, query time, and storage efficiency but also in question-answering performance on multi-hop benchmarks and open-ended head-to-head evaluations, while using minimal retrieval tokens. Our GitHub repository is available at https://github.com/Terry-Xu-666/NodeRAG.
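The abstract says the heterogeneous graph lets existing graph algorithms plug into retrieval while keeping retrieval tokens low, without spelling out how. One such algorithm is personalized PageRank (PPR): seed a random walk at nodes matched to the query, score every node by its visit probability, and return only the top-k text nodes. The sketch below assumes a toy adjacency dict, assumes the query has already been matched to the entry node `Marie Curie`, and uses hypothetical helper names (`personalized_pagerank`, `retrieve`); it illustrates the general technique and is not a description of NodeRAG's actual retrieval pipeline.

```python
def personalized_pagerank(adj, seeds, damping=0.85, iters=50):
    """Power-iteration PPR on an undirected graph {node: set(neighbors)}.

    With probability `damping` the walk follows a random edge; otherwise it
    restarts at a seed node chosen uniformly.
    """
    seeds = [s for s in dict.fromkeys(seeds) if s in adj]  # dedupe, drop unknown
    if not seeds:
        return {n: 0.0 for n in adj}
    restart = {n: 0.0 for n in adj}
    for s in seeds:
        restart[s] = 1.0 / len(seeds)
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - damping) * restart[n] for n in adj}
        for n, nbrs in adj.items():
            if nbrs:
                share = damping * rank[n] / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:  # dangling node: send its mass back to the seeds
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
        rank = nxt
    return rank


def retrieve(adj, node_type, seeds, k=2, keep_types=("text",)):
    """Return the k highest-scoring nodes whose type is in keep_types."""
    scores = personalized_pagerank(adj, seeds)
    candidates = [n for n in adj if node_type[n] in keep_types]
    return sorted(candidates, key=scores.get, reverse=True)[:k]


# Toy heterogeneous graph: entities, relationship nodes r1..r3, text chunks.
adj = {
    "Marie Curie": {"r1"}, "Warsaw": {"r1", "r2"}, "Poland": {"r2"},
    "Paris": {"r3"}, "France": {"r3"},
    "r1": {"Marie Curie", "Warsaw", "chunk1"},
    "r2": {"Warsaw", "Poland", "chunk2"},
    "r3": {"Paris", "France", "chunk3"},
    "chunk1": {"r1"}, "chunk2": {"r2"}, "chunk3": {"r3"},
}
node_type = {
    **{n: "entity" for n in ("Marie Curie", "Warsaw", "Poland", "Paris", "France")},
    **{n: "relationship" for n in ("r1", "r2", "r3")},
    **{n: "text" for n in ("chunk1", "chunk2", "chunk3")},
}
print(retrieve(adj, node_type, seeds=["Marie Curie"]))  # ['chunk1', 'chunk2']
```

Nodes in the unrelated Paris/France component never receive probability mass, so `chunk3` scores zero; capping the output at `k` text nodes is what bounds the amount of retrieved context passed to the LLM.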
Problem

Research questions and friction points this paper is trying to address.

Enhancing graph-based RAG with heterogeneous node design
Improving integration of graph algorithms in RAG workflows
Optimizing performance in indexing, querying, and QA tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces heterogeneous graph structures for RAG
Ensures seamless integration of graph-based methodologies
Improves performance in indexing and querying
Tianyang Xu
Columbia University
Haojie Zheng
University of Pennsylvania
Chengze Li
Columbia University
Haoxiang Chen
Columbia University
Yixin Liu
Lehigh University
Ruoxi Chen
Zhejiang University of Technology
Trustworthy AI, Multimodal Models
Lichao Sun
Lehigh University