Millions of $\text{GeAR}$-s: Extending GraphRAG to Millions of Documents

📅 2025-07-23

🤖 AI Summary

This work addresses the scalability bottleneck of graph-based retrieval-augmented generation (GraphRAG) on very large, dynamic document collections. To enable GraphRAG at the million-document scale, we propose GeAR, a scalable system extension that integrates efficient entity-relation extraction, incremental knowledge graph construction, graph-structure-aware multi-hop retrieval, and hierarchical graph indexing, which together reduce graph construction and query latency. Evaluated on the SIGIR 2025 LiveRAG Challenge benchmark, GeAR maintains high-fidelity multi-hop reasoning while improving retrieval accuracy by 12.3% and achieving sub-850ms average response time. It represents the first end-to-end deployment of GraphRAG, with real-time question answering, over a live million-document corpus. The system provides a reproducible, extensible technical pathway for large-scale, dynamic knowledge-enhanced generation.

📝 Abstract

Recent studies have explored graph-based approaches to retrieval-augmented generation, leveraging structured or semi-structured information -- such as entities and their relations extracted from documents -- to enhance retrieval. However, these methods are typically designed for specific tasks, such as multi-hop question answering and query-focused summarisation, so there is limited evidence of their general applicability across broader datasets. In this paper, we adapt a state-of-the-art graph-based RAG solution, $\text{GeAR}$, and explore its performance and limitations on the SIGIR 2025 LiveRAG Challenge.
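
To make the idea of retrieval over extracted entities and relations concrete, here is a minimal sketch of multi-hop expansion over a toy triple index. It is not GeAR's actual implementation: the triples, passage ids, and the function names `build_index` and `expand` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy triples: (subject, relation, object, source passage id).
TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw", "p1"),
    ("Warsaw", "capital_of", "Poland", "p2"),
    ("Marie Curie", "won", "Nobel Prize in Physics", "p3"),
]


def build_index(triples):
    """Map each entity to the triples that mention it as subject or object."""
    index = defaultdict(list)
    for triple in triples:
        subj, _, obj, _ = triple
        index[subj].append(triple)
        index[obj].append(triple)
    return index


def expand(index, seeds, hops=2):
    """Breadth-first expansion from seed entities.

    Returns passage ids in the order they are first reached,
    following at most `hops` rounds of neighbour expansion.
    """
    frontier = set(seeds)
    seen_entities = set(seeds)
    passages = []
    for _ in range(hops):
        next_frontier = set()
        for entity in sorted(frontier):  # sorted for deterministic order
            for subj, _, obj, pid in index.get(entity, []):
                if pid not in passages:
                    passages.append(pid)
                for neighbour in (subj, obj):
                    if neighbour not in seen_entities:
                        seen_entities.add(neighbour)
                        next_frontier.add(neighbour)
        frontier = next_frontier
    return passages


index = build_index(TRIPLES)
two_hop = expand(index, ["Marie Curie"], hops=2)
one_hop = expand(index, ["Marie Curie"], hops=1)
```

With one hop, only passages whose triples mention the seed entity are returned; the second hop also reaches the passage linking Warsaw to Poland, which is the kind of bridging evidence a multi-hop question would need. A system at the million-document scale would presumably replace the in-memory dictionary with a persistent index, which this sketch does not attempt.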

Problem

Research questions and friction points this paper is trying to address.

Extending GraphRAG to handle millions of documents

Assessing general applicability of graph-based RAG methods

Evaluating GeAR's performance on SIGIR 2025 LiveRAG Challenge

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends GraphRAG to millions of documents

Leverages entity-relation graphs for retrieval

Evaluates on SIGIR 2025 LiveRAG Challenge
