🤖 AI Summary
Existing semantic parsing approaches are typically designed for a single query language and generalize poorly to other query languages. To address this, we formally define a unified Text-to-Query paradigm that covers mapping natural language to diverse formal query languages (e.g., SQL, SPARQL), and we identify query skeletons, the structural templates of queries, as a shared optimization target across these tasks. Methodologically, we introduce a large language model-based dynamic data augmentation framework that diagnoses a model's specific weaknesses in handling query skeletons and then synthesizes targeted training examples to address them. Using only a small amount of synthesized data, the framework achieves state-of-the-art performance on four established Text-to-Query benchmarks, demonstrating both its training efficiency and its generality across query languages. The implementation is publicly available.
📝 Abstract
The task of translating natural language questions into formal query languages has long been a central focus in semantic parsing. Recent advances in Large Language Models (LLMs) have significantly accelerated progress in this field. However, existing studies typically focus on a single query language, resulting in methods with limited generalizability across different query languages. In this paper, we formally define the Text-to-Query task paradigm, unifying semantic parsing tasks across various query languages. We identify query skeletons as a shared optimization target of Text-to-Query tasks, and propose a general dynamic data augmentation framework that explicitly diagnoses model-specific weaknesses in handling these skeletons to synthesize targeted training data. Experiments on four Text-to-Query benchmarks demonstrate that our method achieves state-of-the-art performance using only a small amount of synthesized data, highlighting the efficiency and generality of our approach and laying a solid foundation for unified research on Text-to-Query tasks. We release our code at https://github.com/jjjycaptain/Skeletron.
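The abstract describes the framework only at a high level. As a rough illustration of how a query skeleton could serve as a shared target across query languages, the Python sketch below masks schema items and values in SQL and SPARQL queries, groups a model's errors by gold skeleton to find its weakest skeletons, and turns a weak skeleton into a data-synthesis prompt. All of it is an assumption for illustration: the keyword sets, the regex tokenizer, the functions `skeleton`, `weakest_skeletons`, and `synthesis_prompt`, and the prompt wording are hypothetical and are not taken from the Skeletron code release.

```python
import re
from collections import Counter

# Hypothetical keyword sets; a real system would use a proper parser per language.
KEYWORDS = {
    "sql": {"select", "from", "where", "group", "by", "order", "having", "limit",
            "join", "on", "and", "or", "not", "count", "max", "min", "avg", "sum",
            "distinct", "as", "desc", "asc", "in", "like"},
    "sparql": {"select", "where", "filter", "optional", "distinct", "prefix",
               "limit", "order", "by", "count", "ask", "union", "group"},
}

# Tokens: quoted strings, SPARQL variables, prefixed names, words, punctuation.
TOKEN_RE = re.compile(r"'[^']*'|\"[^\"]*\"|\?\w+|\w+:\w+|\w+|[^\s\w]")


def skeleton(query: str, lang: str) -> str:
    """Hypothetical skeleton: keep keywords and symbols, mask names and values."""
    kws = KEYWORDS[lang]
    out = []
    for tok in TOKEN_RE.findall(query):
        if tok.lower() in kws:
            out.append(tok.lower())          # structural keyword
        elif tok[0] in "'\"" or tok.isdigit():
            out.append("<val>")              # literal value
        elif re.fullmatch(r"\?\w+|\w+:\w+|\w+", tok):
            out.append("_")                  # schema item, variable, or entity
        else:
            out.append(tok)                  # operator or punctuation
    return " ".join(out)


def weakest_skeletons(examples, predict, lang, top_k=2):
    """Rank gold skeletons by how often the predicted skeleton differs from them.

    examples: list of (question, gold_query); predict: question -> query string.
    """
    totals, errors = Counter(), Counter()
    for question, gold in examples:
        sk = skeleton(gold, lang)
        totals[sk] += 1
        if skeleton(predict(question), lang) != sk:
            errors[sk] += 1
    rates = {sk: errors[sk] / totals[sk] for sk in totals}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def synthesis_prompt(skel: str, lang: str, n: int = 5) -> str:
    """Hypothetical prompt asking an LLM for new examples that follow a weak skeleton."""
    return (f"Write {n} new natural-language questions, each paired with a "
            f"{lang.upper()} query whose skeleton is: {skel}")
```

For example, `skeleton("SELECT name FROM singer WHERE age > 30", "sql")` yields `select _ from _ where _ > <val>`, and `skeleton("SELECT ?x WHERE { ?x dbo:birthPlace dbr:Berlin . }", "sparql")` yields `select _ where { _ _ _ . }`. Comparing skeletons rather than full queries separates structural errors from schema-linking or value errors, which is one plausible way to make the diagnosis language-agnostic.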