🤖 AI Summary
Existing NL2GQL research focuses primarily on single-turn translation, failing to address the multi-turn, context-dependent interactions that are typical between users and graph databases, and it suffers from a lack of high-quality, multi-turn annotated datasets. To bridge this gap, we propose an LLM-based automated framework for constructing multi-turn NL2GQL data that integrates dialogue context modeling with graph query syntax constraints. Using this method, we introduce MTGQL, the first domain-specific multi-turn graph query dataset in finance, comprising over 10,000 dialogue turns. On MTGQL, we design and evaluate three baseline models across multiple dimensions. Experimental results demonstrate the feasibility of multi-turn semantic understanding and GQL generation, filling critical gaps in both data resources and benchmarking infrastructure. Our work establishes a reproducible foundation and methodological paradigm for dynamic graph query understanding.
📝 Abstract
In recent years, research on translating natural language into graph query language (NL2GQL) has grown steadily. Most existing methods focus on single-turn translation from NL to GQL. In practical applications, however, user interactions with graph databases are typically multi-turn, dynamic, and context-dependent. While single-turn methods can handle straightforward queries, more complex scenarios often require users to iteratively refine their queries, explore the connections between entities, or request additional details across multiple dialogue turns. Methods designed for single-turn conversion therefore cannot effectively handle multi-turn dialogues and their complex context dependencies. Additionally, the scarcity of high-quality multi-turn NL2GQL datasets further hinders the progress of this field. To address these challenges, we propose an automated method for constructing multi-turn NL2GQL datasets based on Large Language Models (LLMs), and we apply this method to build the MTGQL dataset, which is constructed from a financial market graph database and will be publicly released for future research. Moreover, we propose three types of baseline methods to assess the effectiveness of multi-turn NL2GQL translation, thereby laying a solid foundation for future research.
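To make the context-dependence problem concrete, the toy sketch below shows why a follow-up turn cannot be translated in isolation: the second question omits the person mentioned in the first, so the translator must carry constraints across turns. The dialogue, the graph schema (`Person`, `Company`, `HOLDS`), and the template-based translator are all hypothetical illustrations, not the actual MTGQL construction pipeline or baselines.

```python
def translate_turn(context, utterance):
    """Toy context-dependent NL->GQL translation: each turn may add a
    constraint, which is merged with constraints carried over from
    earlier turns before a Cypher-style query is emitted."""
    if "hold" in utterance:
        # Turn introduces a person entity, e.g. "... does 'Alice' hold ..."
        context["person"] = utterance.split("'")[1]
    if "technology" in utterance:
        # Follow-up narrows the result set without restating the person.
        context["sector"] = "Technology"

    company = "(c:Company)"
    if "sector" in context:
        company = "(c:Company {sector: 'Technology'})"
    gql = (f"MATCH (p:Person {{name: '{context['person']}'}})"
           f"-[:HOLDS]->{company} RETURN c.name")
    return context, gql


# Two-turn dialogue: the second utterance only makes sense given the first.
ctx = {}
ctx, q1 = translate_turn(ctx, "Which companies does 'Alice' hold shares in?")
ctx, q2 = translate_turn(ctx, "Which of them are in the technology sector?")
```

A single-turn system given only the second utterance would have no way to recover the `Person` constraint, which is exactly the gap a multi-turn dataset such as MTGQL is meant to expose.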