🤖 AI Summary
To address the difficulty non-expert users face when querying knowledge graphs directly, this paper proposes a natural language-driven, interactive method for constructing queries. The approach uses a two-stage constrained language model that applies ontology-based semantic constraints to generate syntactically and semantically valid query prototypes, avoiding invalid classes, invalid relations, and syntax errors. A visual editor lets users refine these prototypes iteratively through natural language descriptions and graphical adjustments. An interpretable translation pipeline then converts the refined prototype into standard SPARQL. Evaluated across multiple ontologies and language models, the system consistently produces valid SPARQL queries without manual correction, and achieves efficient, accurate retrieval with more efficient models than other approaches. Experiments on synthetic queries and a preliminary user study support the method’s effectiveness and usability.
📝 Abstract
Querying knowledge bases through ontologies is usually done with dedicated query languages, question-answering systems, or visual query editors for knowledge graphs. We propose a novel approach that lets users query a knowledge graph by specifying prototype graphs in natural language and editing them visually. This enables non-experts to formulate queries without prior knowledge of the ontology or of a specific query language. Our approach converts natural language into prototype graphs using a two-step constrained language model generation based on semantically similar features within the ontology. The resulting prototype graph serves as the basis for further user refinement in a dedicated visual query builder. Our approach consistently generates a valid SPARQL query within the constraints imposed by the ontology, without requiring additional corrections to the syntax or to the classes and links used. Related language-model approaches, by contrast, often require multiple iterations to fix invalid syntax, non-existent classes, and non-existent links. We evaluate our system using graph retrieval on synthetic queries, comparing multiple metrics, models, and ontologies, and further validate it through a preliminary user study. We show that, with our constrained pipeline, the system achieves efficient and accurate retrieval while using more efficient models than other approaches.
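
The abstract does not give implementation details, so the following is only a minimal sketch of why a two-step constrained selection cannot produce unknown classes or links. It is not the authors' pipeline. The toy ontology, the function names (`pick_class`, `pick_link`, `to_sparql`), and the `http://example.org/onto#` prefix are all hypothetical, and plain string similarity from Python's `difflib` stands in for both the language model and the semantic similarity the paper describes.

```python
from difflib import SequenceMatcher

# Hypothetical toy ontology (not from the paper): classes and typed links.
CLASSES = ["Person", "City"]
LINKS = [("Person", "livesIn", "City"), ("Person", "knows", "Person")]


def similarity(a, b):
    """String similarity, a crude stand-in for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_class(mention):
    """Step 1: choose a class, restricted to classes the ontology defines."""
    return max(CLASSES, key=lambda cls: similarity(mention, cls))


def pick_link(src_cls, mention, dst_cls):
    """Step 2: choose a link, restricted to links valid between the two classes."""
    candidates = [name for (dom, name, rng) in LINKS
                  if dom == src_cls and rng == dst_cls]
    if not candidates:
        raise ValueError(f"ontology has no link from {src_cls} to {dst_cls}")
    return max(candidates, key=lambda name: similarity(mention, name))


def to_sparql(nodes, edges, prefix="http://example.org/onto#"):
    """Translate a prototype graph into a SPARQL SELECT query.

    nodes maps variable names to classes; edges is a list of
    (source variable, link, target variable) tuples.
    """
    lines = [f"PREFIX : <{prefix}>",
             "SELECT " + " ".join(f"?{var}" for var in nodes),
             "WHERE {"]
    lines += [f"  ?{var} a :{cls} ." for var, cls in nodes.items()]
    lines += [f"  ?{src} :{link} ?{dst} ." for src, link, dst in edges]
    lines.append("}")
    return "\n".join(lines)


# Prototype graph for the request "people who live in a city".
nodes = {"p": pick_class("people"), "c": pick_class("city")}
edges = [("p", pick_link(nodes["p"], "live in", nodes["c"]), "c")]
query = to_sparql(nodes, edges)
print(query)
```

Because each step can only return a name drawn from the ontology, and the second step only considers links whose domain and range match the classes chosen in the first, the emitted query never needs a repair pass for unknown vocabulary. A request with no valid link between the chosen classes fails early with a `ValueError` instead of yielding an invalid query.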