Skeletons Matter: Dynamic Data Augmentation for Text-to-Query

📅 2025-11-24
🤖 AI Summary
Existing semantic parsing approaches are typically designed for a single query language and generalize poorly to other query languages. To address this, we define a unified Text-to-Query paradigm and identify "query skeletons" (structural templates shared across formal query languages such as SQL and SPARQL) as a common optimization target. Methodologically, we propose a general dynamic data augmentation framework that explicitly diagnoses a model's weaknesses in handling these skeletons and then synthesizes targeted training examples. The framework achieves state-of-the-art performance on four Text-to-Query benchmarks using only a small amount of synthesized data, which indicates both efficiency and generality across query languages. The implementation is publicly available.

📝 Abstract
The task of translating natural language questions into query languages has long been a central focus in semantic parsing. Recent advancements in Large Language Models (LLMs) have significantly accelerated progress in this field. However, existing studies typically focus on a single query language, resulting in methods with limited generalizability across different languages. In this paper, we formally define the Text-to-Query task paradigm, unifying semantic parsing tasks across various query languages. We identify query skeletons as a shared optimization target of Text-to-Query tasks, and propose a general dynamic data augmentation framework that explicitly diagnoses model-specific weaknesses in handling these skeletons to synthesize targeted training data. Experiments on four Text-to-Query benchmarks demonstrate that our method achieves state-of-the-art performance using only a small amount of synthesized data, highlighting the efficiency and generality of our approach and laying a solid foundation for unified research on Text-to-Query tasks. We release our code at https://github.com/jjjycaptain/Skeletron.
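
To make the notion of a query skeleton concrete, the sketch below reduces a SQL string to its structure by masking identifiers and literal values. This is an illustration only: the abstract does not specify how skeletons are extracted, so the function `sql_skeleton`, the keyword list, and the `_` placeholder are all assumptions, and the released code may define skeletons differently.

```python
import re

# Hypothetical keyword list; the paper's actual skeleton definition may differ.
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "limit",
    "join", "on", "and", "or", "not", "in", "as", "distinct", "count",
    "sum", "avg", "min", "max", "desc", "asc", "like", "between",
}

# Tokens: quoted strings, numbers, identifiers, two-character operators,
# then any other single punctuation character.
TOKEN_RE = re.compile(
    r"'[^']*'|\"[^\"]*\"|\d+(?:\.\d+)?|[A-Za-z_][\w.]*|<=|>=|<>|!=|[^\s\w]"
)


def sql_skeleton(query: str) -> str:
    """Keep keywords and operators; mask identifiers and literals with '_'."""
    out = []
    for tok in TOKEN_RE.findall(query):
        if tok.lower() in SQL_KEYWORDS:
            out.append(tok.upper())          # structural keyword
        elif tok[0].isalnum() or tok[0] in "_'\"":
            out.append("_")                  # table, column, or literal value
        else:
            out.append(tok)                  # operator or punctuation
    return " ".join(out)


print(sql_skeleton("SELECT name FROM singer WHERE age > 30"))
# SELECT _ FROM _ WHERE _ > _
```

Under this assumed definition, two queries over different schemas, such as the one above and `select title from album where year > 1999`, map to the same skeleton, which is what lets skeletons act as a target shared across datasets.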
Problem

Research questions and friction points this paper is trying to address.

Unifying semantic parsing tasks across multiple query languages
Addressing limited generalizability of single-language text-to-query methods
Improving model performance on query skeletons through dynamic data augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic data augmentation framework diagnoses model weaknesses
Query skeletons serve as shared optimization target
Synthesizes targeted training data for multiple query languages
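
The diagnose-then-synthesize idea in the points above can be sketched roughly as follows. This is not the authors' implementation: the function names, the per-skeleton error-rate criterion, and the proportional budget split are assumptions chosen for illustration, and the synthesis step itself (which would call an LLM) is left out.

```python
from collections import Counter


def diagnose_weak_skeletons(records, min_support=1):
    """Rank skeletons by error rate.

    records: iterable of (skeleton, is_correct) pairs from a validation run.
    Returns a list of (skeleton, error_rate), worst first.
    """
    total, wrong = Counter(), Counter()
    for skeleton, is_correct in records:
        total[skeleton] += 1
        if not is_correct:
            wrong[skeleton] += 1
    rates = {s: wrong[s] / total[s] for s in total if total[s] >= min_support}
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))


def allocate_budget(error_rates, budget):
    """Split a synthetic-example budget across skeletons by error rate."""
    mass = sum(rate for _, rate in error_rates)
    if mass == 0:
        return {}
    return {s: round(budget * rate / mass) for s, rate in error_rates if rate > 0}


# Toy evaluation log with hypothetical skeleton labels.
log = [("A", True), ("A", False), ("B", False), ("B", False), ("C", True)]
ranked = diagnose_weak_skeletons(log)
print(ranked)                       # [('B', 1.0), ('A', 0.5), ('C', 0.0)]
print(allocate_budget(ranked, 30))  # {'B': 20, 'A': 10}
```

In a dynamic setting, one would presumably repeat this loop: fine-tune on the synthesized examples, re-evaluate, and re-diagnose, so that the augmentation tracks the weaknesses of the current model rather than a fixed distribution.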
🔎 Similar Papers
Yuchen Ji
School of Data Science, Fudan University
Bo Xu
School of Computer Science and Technology, Donghua University
Jie Shi
College of Computer Science and Artificial Intelligence, Fudan University
Jiaqing Liang
Fudan University
knowledge graph, deep learning
Deqing Yang
School of Data Science, Fudan University
Yu Mao
City University of Hong Kong
Data Compression, Embedded System, Efficient Neural Network Design
Hai Chen
Tsinghua University
robust 3D vision, recommendation systems
Yanghua Xiao
College of Computer Science and Artificial Intelligence, Fudan University