🤖 AI Summary
Existing GraphRAG research predominantly focuses on RDF knowledge graphs and SPARQL queries, overlooking the potential of Cypher and labeled property graph (LPG) databases for scalable reasoning. This paper introduces the first multi-agent framework for text-to-Cypher query generation tailored to LPGs, integrating a large language model–driven workflow, iterative query refinement, and content-aware feedback-based error correction. Evaluated on Memgraph using CypherBench and an industrial building dataset in the Industry Foundation Classes (IFC) format, the framework achieves high accuracy in Cypher generation across both general-domain and digital twin scenarios. It significantly improves semantic consistency and syntactic correctness over baselines. Experimental results demonstrate the framework’s effectiveness, robustness, and scalability for industrial-grade structured knowledge retrieval from LPG databases.
📝 Abstract
While Retrieval-Augmented Generation (RAG) methods commonly draw information from unstructured documents, the emerging paradigm of GraphRAG aims to leverage structured data such as knowledge graphs. Most existing GraphRAG efforts focus on Resource Description Framework (RDF) knowledge graphs, relying on triple representations and SPARQL queries. However, the potential of Cypher and Labeled Property Graph (LPG) databases to serve as scalable and effective reasoning engines within GraphRAG pipelines remains underexplored. To fill this gap, we propose Multi-Agent GraphRAG, a modular LLM agentic system for text-to-Cypher query generation that serves as a natural language interface to LPG-based graph data. Our proof-of-concept system features an LLM-based workflow for automated Cypher query generation and execution, using Memgraph as the graph database backend. Iterative content-aware correction and normalization, reinforced by an aggregated feedback loop, ensure both semantic and syntactic refinement of the generated queries. We evaluate our system on the CypherBench graph dataset, which covers several general domains with diverse types of queries. In addition, we demonstrate the performance of the proposed workflow on a property graph derived from IFC (Industry Foundation Classes) data, representing a digital twin of a building. This highlights how such an approach can bridge AI with real-world applications at scale, enabling industrial digital automation use cases.
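
The generate, execute, and refine cycle described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `refine_cypher`, `fake_generate`, `fake_execute`, the node labels, and the sample row are all hypothetical stand-ins, and a real system would replace the two stubs with an LLM call and a Memgraph client. The sketch only shows the general idea of collecting feedback from failed or empty attempts and passing it back to the generator.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    """One round of the loop: the query tried and what came back."""
    query: str
    error: Optional[str] = None
    rows: Optional[list] = None


def refine_cypher(
    question: str,
    generate: Callable[[str, List[str]], str],
    execute: Callable[[str], list],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Generate a query, run it, and feed accumulated feedback back on failure."""
    feedback: List[str] = []  # aggregated feedback from all earlier rounds
    history: List[Attempt] = []
    for _ in range(max_rounds):
        query = generate(question, feedback)
        try:
            rows = execute(query)
        except Exception as exc:  # e.g. a syntax error reported by the database
            feedback.append(f"Query failed: {query!r} -> {exc}")
            history.append(Attempt(query, error=str(exc)))
            continue
        if not rows:
            # An empty result is treated here as a hint that the query
            # does not match the graph content (assumption of this sketch).
            feedback.append(f"Query returned no rows: {query!r}")
            history.append(Attempt(query, error="empty result", rows=rows))
            continue
        history.append(Attempt(query, rows=rows))
        break
    return history


def fake_generate(question: str, feedback: List[str]) -> str:
    # Hypothetical stand-in for an LLM: the first attempt guesses a wrong
    # label, the retry (with feedback present) uses the label that exists.
    if not feedback:
        return "MATCH (w:Walls) RETURN w.Name AS name"
    return "MATCH (w:IfcWall) RETURN w.Name AS name"


def fake_execute(query: str) -> list:
    # Hypothetical stand-in for a graph database: only :IfcWall nodes exist.
    if ":IfcWall" not in query:
        return []
    return [{"name": "Wall-01"}]


history = refine_cypher(
    "Which walls are in the building?", fake_generate, fake_execute
)
for attempt in history:
    print(attempt.query, "|", attempt.error or attempt.rows)
```

Passing the generator and executor in as callables keeps the loop independent of any particular LLM provider or database driver, which is one simple way to mirror the modularity the abstract emphasizes.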