🤖 AI Summary
This work addresses the inefficiency and poor robustness of current large language models (LLMs) when they directly generate graph queries. This limitation prevents non-experts from performing effective natural language analysis on large-scale, heterogeneous, and dynamically evolving property graphs. To overcome it, the authors propose GraphSeek, a framework that decouples semantic reasoning from query execution. GraphSeek uses a Semantic Catalog, which describes both the graph schema and the available graph operations, to guide the LLM in planning and reasoning. Actual query execution is delegated to a deterministic, database-grade query engine, so the LLM never has to generate fragile or syntactically invalid queries. The approach substantially improves both the task effectiveness and the token efficiency of LLMs in complex graph analytics, even with small-context models. Experiments show that GraphSeek achieves substantially higher success rates than an enhanced LangChain baseline (e.g., 86% over enhanced LangChain), pointing toward a cost-effective, end-to-end solution for large-scale graph analysis.
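The idea of planning over a catalog instead of emitting raw query text can be illustrated with a minimal sketch. It assumes a plan is a list of `{"op": ..., "args": ...}` steps that an LLM would return; the classes `Operation` and `SemanticCatalog` and the function `validate_plan` are hypothetical names for illustration, not GraphSeek's actual API. Because every step must name a catalog operation and reference known schema elements, malformed plans are rejected before anything reaches the database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    """A graph operation advertised in the catalog (hypothetical structure)."""
    name: str
    params: tuple
    description: str


@dataclass
class SemanticCatalog:
    """Describes the graph schema and available operations to the LLM (hypothetical)."""
    node_labels: set
    edge_types: set
    operations: dict = field(default_factory=dict)

    def register(self, op: Operation) -> None:
        self.operations[op.name] = op

    def describe(self) -> str:
        # Compact text that would go into the LLM prompt in place of raw graph data.
        lines = [f"node labels: {sorted(self.node_labels)}",
                 f"edge types: {sorted(self.edge_types)}"]
        for op in self.operations.values():
            lines.append(f"{op.name}({', '.join(op.params)}): {op.description}")
        return "\n".join(lines)


def validate_plan(catalog: SemanticCatalog, plan: list) -> list:
    """Return a list of problems; an empty list means every step is executable."""
    errors = []
    for i, step in enumerate(plan):
        op = catalog.operations.get(step.get("op"))
        if op is None:
            errors.append(f"step {i}: unknown operation {step.get('op')!r}")
            continue
        args = step.get("args", {})
        missing = [p for p in op.params if p not in args]
        if missing:
            errors.append(f"step {i}: missing args {missing}")
        label = args.get("label")
        if label is not None and label not in catalog.node_labels:
            errors.append(f"step {i}: unknown node label {label!r}")
        edge = args.get("edge_type")
        if edge is not None and edge not in catalog.edge_types:
            errors.append(f"step {i}: unknown edge type {edge!r}")
    return errors


catalog = SemanticCatalog({"Person", "Company"}, {"WORKS_AT"})
catalog.register(Operation("count_nodes", ("label",), "Count nodes with a label"))
catalog.register(Operation("degree_top_k", ("label", "edge_type", "k"),
                           "Top-k nodes of a label by degree over an edge type"))

# A plan as an LLM might return it (hard-coded here in place of a model call).
plan = [
    {"op": "count_nodes", "args": {"label": "Person"}},
    {"op": "degree_top_k", "args": {"label": "Employee", "edge_type": "WORKS_AT", "k": 5}},
]
print(validate_plan(catalog, plan))  # ["step 1: unknown node label 'Employee'"]
```

In a full system, the validation errors could be fed back to the LLM for a repair round; that loop is an assumption here, not something the summary describes.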
📝 Abstract
Graphs are foundational across domains but remain hard to use without deep expertise. LLMs promise accessible natural language (NL) graph analytics, yet they fail to process industry-scale property graphs effectively and efficiently: such datasets are large, highly heterogeneous, structurally complex, and evolve dynamically. To address this, we devise a novel abstraction for complex multi-query analytics over such graphs. Its key idea is to replace brittle generation of graph queries directly from NL with planning over a Semantic Catalog that describes both the graph schema and the graph operations. Concretely, this induces a clean separation between a Semantic Plane for LLM planning and broader reasoning, and an Execution Plane for deterministic, database-grade execution of queries and tool implementations over the full dataset. This design yields substantial gains in both token efficiency and task effectiveness, even with small-context LLMs. We use this abstraction as the basis of GraphSeek, the first LLM-enhanced graph analytics framework. GraphSeek achieves substantially higher success rates (e.g., 86% over enhanced LangChain) and points toward the next generation of affordable and accessible graph analytics that unify LLM reasoning with database-grade execution over large and complex property graphs.
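The Execution Plane side of this separation can be sketched as follows, under stated assumptions: a tiny in-memory property graph stands in for a real graph database, plans are hypothetical `{"op": ..., "args": ...}` steps, and `PropertyGraph`, `TOOLS`, and `execute_plan` are illustrative names rather than GraphSeek's implementation. The point it shows is that the deterministic executor touches the full dataset while only compact per-step results would be returned to the LLM, which is one plausible source of the token savings the abstract describes.

```python
from collections import Counter


class PropertyGraph:
    """Tiny in-memory property graph (stand-in for a graph database)."""

    def __init__(self):
        self.nodes = {}  # node id -> (label, properties)
        self.edges = []  # (source id, edge type, target id)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, src, edge_type, dst):
        self.edges.append((src, edge_type, dst))


def count_nodes(g, label):
    """Number of nodes carrying the given label."""
    return sum(1 for lbl, _ in g.nodes.values() if lbl == label)


def degree_top_k(g, label, edge_type, k):
    """Top-k nodes of a label ranked by incident edges of one type."""
    deg = Counter()
    for src, et, dst in g.edges:
        if et != edge_type:
            continue
        for n in (src, dst):
            if g.nodes[n][0] == label:
                deg[n] += 1
    return deg.most_common(k)


# Hypothetical registry: operation names map to deterministic implementations.
TOOLS = {"count_nodes": count_nodes, "degree_top_k": degree_top_k}


def execute_plan(g, plan):
    """Run each step deterministically; only these small results go back to the LLM."""
    return [TOOLS[step["op"]](g, **step["args"]) for step in plan]


g = PropertyGraph()
g.add_node("alice", "Person")
g.add_node("bob", "Person")
g.add_node("acme", "Company")
g.add_edge("alice", "WORKS_AT", "acme")
g.add_edge("bob", "WORKS_AT", "acme")

plan = [
    {"op": "count_nodes", "args": {"label": "Person"}},
    {"op": "degree_top_k", "args": {"label": "Company", "edge_type": "WORKS_AT", "k": 1}},
]
print(execute_plan(g, plan))  # [2, [('acme', 2)]]
```

Keeping the tool implementations outside the prompt means the LLM's context grows with the size of the plan and its summaries, not with the size of the graph.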