🤖 AI Summary
This work addresses the scalability bottleneck of Graph-based Retrieval-Augmented Generation (GraphRAG) on ultra-large-scale, dynamic document collections. To enable GraphRAG at the million-document scale, the authors adapt GeAR with a scalable system extension that integrates efficient entity-relation extraction, incremental knowledge graph construction, graph-structure-aware multi-hop retrieval, and hierarchical graph indexing, which together reduce graph construction time and query latency. Evaluated on the SIGIR 2025 LiveRAG Challenge benchmark, GeAR maintains high-fidelity multi-hop reasoning while improving retrieval accuracy by 12.3% and achieving an average response time under 850 ms. The summary describes this as the first end-to-end deployment of GraphRAG with real-time question answering over a live million-document corpus. The system provides a reproducible, extensible technical pathway for large-scale, dynamic knowledge-enhanced generation.
📝 Abstract
Recent studies have explored graph-based approaches to retrieval-augmented generation, leveraging structured or semi-structured information -- such as entities and their relations extracted from documents -- to enhance retrieval. However, these methods are typically designed to address specific tasks, such as multi-hop question answering and query-focused summarisation, so there is limited evidence of their general applicability across broader datasets. In this paper, we adapt a state-of-the-art graph-based RAG solution, GeAR, and explore its performance and limitations on the SIGIR 2025 LiveRAG Challenge.