LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional GraphRAG relies on unstable and costly relation extraction for knowledge graph construction, which introduces significant noise and degrades retrieval quality. To address this, we propose LinearRAG—a novel framework that eliminates explicit relation modeling. It introduces a hierarchical Tri-Graph index structure that is inherently relation-free, built solely via lightweight entity recognition and semantic linking, ensuring linear scalability, low noise, and minimal computational overhead. During retrieval, LinearRAG employs a two-stage mechanism: (i) local semantic bridging activation to refine candidate neighborhoods, followed by (ii) global importance aggregation to enhance paragraph localization accuracy. Evaluated on four benchmark datasets, LinearRAG consistently outperforms state-of-the-art baselines in retrieval accuracy while accelerating graph construction by an order of magnitude—demonstrating strong suitability for ultra-large-scale unstructured text applications.
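The relation-free construction described above can be illustrated with a minimal sketch: passages are linked to the entities they mention, and entities are linked to one another without any relation extraction. Both the entity recognizer (a capitalized-token regex) and the linking rule (co-mention within a passage) are illustrative assumptions standing in for the paper's lightweight entity recognition and semantic linking; note the build is a single linear pass over the corpus with no LLM calls.

```python
# Hedged sketch of a relation-free index in the spirit of the Tri-Graph:
# no relation extraction, just entity mentions and entity-entity links.
import re
from collections import defaultdict

def extract_entities(passage):
    # Hypothetical lightweight recognizer: capitalized (possibly
    # hyphenated or multi-word) token runs stand in for entity mentions.
    return set(re.findall(r"[A-Z][a-zA-Z-]+(?: [A-Z][a-zA-Z-]+)*", passage))

def build_index(passages):
    entity_to_passages = defaultdict(set)   # entity -> passage ids
    entity_links = defaultdict(set)         # entity -> linked entities
    for pid, text in enumerate(passages):   # one linear pass, no LLM calls
        ents = extract_entities(text)
        for e in ents:
            entity_to_passages[e].add(pid)
            # Stand-in for semantic linking: connect entities co-mentioned
            # in a passage (the paper presumably uses embeddings instead).
            entity_links[e] |= ents - {e}
    return dict(entity_to_passages), dict(entity_links)

passages = [
    "LinearRAG builds a Tri-Graph index over large corpora.",
    "Tri-Graph indexing avoids relation extraction entirely.",
]
e2p, links = build_index(passages)
```

Because each passage is processed once and independently, the index cost grows linearly with corpus size, which is the scalability property the summary highlights.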

📝 Abstract
Retrieval-Augmented Generation (RAG) is widely used to mitigate hallucinations of Large Language Models (LLMs) by leveraging external knowledge. While effective for simple queries, traditional RAG systems struggle with large-scale, unstructured corpora where information is fragmented. Recent advances incorporate knowledge graphs to capture relational structures, enabling more comprehensive retrieval for complex, multi-hop reasoning tasks. However, existing graph-based RAG (GraphRAG) methods rely on unstable and costly relation extraction for graph construction, often producing noisy graphs with incorrect or inconsistent relations that degrade retrieval quality. In this paper, we revisit the pipeline of existing GraphRAG systems and propose LinearRAG (Linear Graph-based Retrieval-Augmented Generation), an efficient framework that enables reliable graph construction and precise passage retrieval. Specifically, LinearRAG constructs a relation-free hierarchical graph, termed Tri-Graph, using only lightweight entity extraction and semantic linking, avoiding unstable relation modeling. This new paradigm of graph construction scales linearly with corpus size and incurs no extra token consumption, providing an economical and reliable indexing of the original passages. For retrieval, LinearRAG adopts a two-stage strategy: (i) relevant entity activation via local semantic bridging, followed by (ii) passage retrieval through global importance aggregation. Extensive experiments on four datasets demonstrate that LinearRAG significantly outperforms baseline models.
Problem

Research questions and friction points this paper is trying to address.

Addresses unreliable relation extraction in graph-based RAG systems
Enables scalable graph construction for large unstructured corpora
Improves retrieval accuracy for complex multi-hop reasoning tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

LinearRAG constructs a relation-free hierarchical graph using lightweight entity extraction
Framework enables linear scaling with corpus size without extra tokens
Two-stage retrieval combines entity activation with importance aggregation
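The two-stage retrieval above can be sketched as follows. Stage (i) activates entities that bridge to the query, and stage (ii) aggregates activated-entity importance onto passages. The activation rule (string match plus one-hop expansion over entity links) and the inverse-frequency vote weighting are illustrative assumptions, not the paper's actual scoring functions.

```python
# Hedged sketch of two-stage retrieval over a relation-free entity index.
from collections import defaultdict

def retrieve(query, entity_to_passages, entity_links, top_k=2):
    # Stage 1: seed with entities mentioned in the query, then expand one
    # hop along entity links (a stand-in for "local semantic bridging").
    seeds = {e for e in entity_to_passages if e.lower() in query.lower()}
    activated = set(seeds)
    for e in seeds:
        activated |= entity_links.get(e, set())
    # Stage 2: each activated entity votes for its passages; rarer
    # entities cast larger votes (a stand-in for "global importance
    # aggregation").
    scores = defaultdict(float)
    for e in activated & entity_to_passages.keys():
        for pid in entity_to_passages[e]:
            scores[pid] += 1.0 / len(entity_to_passages[e])
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy index of the kind a relation-free builder might produce.
e2p = {"LinearRAG": {0}, "Tri-Graph": {0, 1}, "GraphRAG": {2}}
links = {"LinearRAG": {"Tri-Graph"}, "Tri-Graph": {"LinearRAG"}}
ranked = retrieve("How does LinearRAG index passages?", e2p, links)
```

Here the query mentions only "LinearRAG", but the one-hop expansion also activates "Tri-Graph", so passage 1 is surfaced even though it shares no surface term with the query.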
Authors
Luyao Zhuang (The Department of Computing, Hong Kong Polytechnic University, Hong Kong SAR)
Shengyuan Chen (The Hong Kong Polytechnic University)
Yilin Xiao (The Department of Computing, Hong Kong Polytechnic University, Hong Kong SAR)
Huachi Zhou (The Hong Kong Polytechnic University)
Yujing Zhang (The Department of Computing, Hong Kong Polytechnic University, Hong Kong SAR)
Hao Chen (The Department of Computing, Hong Kong Polytechnic University, Hong Kong SAR)
Qinggang Zhang (The Hong Kong Polytechnic University)
Xiao Huang (The Department of Computing, Hong Kong Polytechnic University, Hong Kong SAR)