Skeletons Matter: Dynamic Data Augmentation for Text-to-Query

📅 2025-11-24

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing semantic parsing approaches are typically designed for a single query language and generalize poorly to other query languages. To address this, we propose a unified Text-to-Query paradigm that uses "query skeletons" (language-agnostic structural templates) as the common intermediate representation for mapping natural language to diverse formal query languages (e.g., SQL, SPARQL). Methodologically, we introduce a dynamic data augmentation framework built on large language models: it extracts query skeletons, diagnoses the model's weaknesses on those skeletons through error analysis, and then synthesizes training examples that target those weaknesses. The framework achieves state-of-the-art performance on four established Text-to-Query benchmarks using only a small amount of synthesized data, demonstrating both generality across query languages and data efficiency. The implementation is publicly available.
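The following minimal Python sketch illustrates what a query skeleton might look like. It is based on an assumed definition: structural keywords and operators are kept, while values, variables, and schema or entity names are masked. As a result, two questions over different schemas can share one skeleton. The keyword set, the placeholder tokens (`_VAL_`, `_VAR_`, `_NAME_`), and the function `extract_skeleton` are hypothetical choices made for illustration. They are not the skeleton definition used in the paper or its released code.

```python
import re

# Illustrative keyword set (an assumption): a small union of SQL and SPARQL
# keywords that are kept as structure rather than masked.
KEYWORDS = {
    "select", "distinct", "from", "where", "group", "by", "order", "having",
    "limit", "join", "on", "and", "or", "not", "in", "as", "asc", "desc",
    "count", "max", "min", "avg", "sum", "filter", "optional", "ask",
}

# Token alternatives, tried left to right: quoted strings, numbers, SPARQL
# variables, identifiers (including dotted names like T1.name and prefixed
# names like dbo:birthPlace), two-character operators, any other character.
TOKEN_RE = re.compile(
    r"""'[^']*'|"[^"]*"|\d+(?:\.\d+)?|\?\w+|[A-Za-z_][\w.:]*|<=|>=|!=|<>|\S"""
)


def extract_skeleton(query: str) -> str:
    """Hypothetical skeleton extractor: keep keywords and operators, mask the rest."""
    parts = []
    for tok in TOKEN_RE.findall(query):
        if tok[0] in "'\"" or tok[0].isdigit():
            parts.append("_VAL_")        # string and numeric literals
        elif tok.startswith("?"):
            parts.append("_VAR_")        # SPARQL variables
        elif tok.lower() in KEYWORDS:
            parts.append(tok.upper())    # structural keywords, case-normalized
        elif tok[0].isalpha() or tok[0] == "_":
            parts.append("_NAME_")       # tables, columns, predicates, entities
        else:
            parts.append(tok)            # punctuation and operators
    return " ".join(parts)


if __name__ == "__main__":
    sql_a = "SELECT name FROM singer WHERE age > 30"
    sql_b = "SELECT title FROM film WHERE year > 2000"
    sparql = "SELECT ?x WHERE { ?x dbo:birthPlace dbr:Berlin }"
    print(extract_skeleton(sql_a))   # SELECT _NAME_ FROM _NAME_ WHERE _NAME_ > _VAL_
    print(extract_skeleton(sql_a) == extract_skeleton(sql_b))  # True
    print(extract_skeleton(sparql))  # SELECT _VAR_ WHERE { _VAR_ _NAME_ _NAME_ }
```

A regex tokenizer keeps the sketch dependency-free. However, it does not handle SPARQL IRIs in angle brackets, comments, or quoted identifiers, which a real implementation would need to parse properly.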


📝 Abstract

The task of translating natural language questions into query languages has long been a central focus in semantic parsing. Recent advancements in Large Language Models (LLMs) have significantly accelerated progress in this field. However, existing studies typically focus on a single query language, resulting in methods with limited generalizability across different query languages. In this paper, we formally define the Text-to-Query task paradigm, unifying semantic parsing tasks across various query languages. We identify query skeletons as a shared optimization target of Text-to-Query tasks, and propose a general dynamic data augmentation framework that explicitly diagnoses model-specific weaknesses in handling these skeletons to synthesize targeted training data. Experiments on four Text-to-Query benchmarks demonstrate that our method achieves state-of-the-art performance using only a small amount of synthesized data, highlighting the efficiency and generality of our approach and laying a solid foundation for unified research on Text-to-Query tasks. We release our code at https://github.com/jjjycaptain/Skeletron.
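The abstract describes diagnosing model-specific weaknesses on query skeletons and then synthesizing targeted data. One plausible reading of that diagnosis step is sketched below under stated assumptions. Dev-set results are grouped by gold skeleton, and an error rate is computed per skeleton. A fixed synthesis budget is then split in proportion to those error rates. The functions `skeleton_error_rates` and `allocate_budget`, the proportional allocation rule, and the toy records are all hypothetical. They are not taken from the Skeletron repository, and the LLM step that would turn each allocation into new question and query pairs is omitted.

```python
from collections import Counter
from collections.abc import Iterable


def skeleton_error_rates(records: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Map each gold skeleton to the fraction of its examples the model got wrong.

    `records` holds (gold_skeleton, prediction_was_correct) pairs, e.g. from a
    dev-set evaluation; this pairing format is an assumption for illustration.
    """
    totals: Counter[str] = Counter()
    errors: Counter[str] = Counter()
    for skeleton, correct in records:
        totals[skeleton] += 1
        if not correct:
            errors[skeleton] += 1
    return {s: errors[s] / totals[s] for s in totals}


def allocate_budget(error_rates: dict[str, float], budget: int) -> dict[str, int]:
    """Split `budget` synthetic examples across skeletons in proportion to error rate.

    Skeletons with a zero error rate get nothing. Integer counts use
    largest-remainder rounding so the allocations sum to `budget`.
    """
    weak = {s: r for s, r in error_rates.items() if r > 0}
    total = sum(weak.values())
    if total == 0:
        return {}
    raw = {s: budget * r / total for s, r in weak.items()}
    alloc = {s: int(v) for s, v in raw.items()}
    leftover = budget - sum(alloc.values())
    # Give the remaining units to the skeletons with the largest fractional parts.
    for s in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


if __name__ == "__main__":
    NESTED = "SELECT _NAME_ FROM _NAME_ WHERE _NAME_ IN ( SELECT _NAME_ FROM _NAME_ )"
    GROUPED = "SELECT _NAME_ , COUNT ( * ) FROM _NAME_ GROUP BY _NAME_"
    SIMPLE = "SELECT _NAME_ FROM _NAME_"
    # Toy evaluation records, made up for illustration.
    records = (
        [(NESTED, False)] * 2 + [(NESTED, True)] * 2
        + [(GROUPED, False)] + [(GROUPED, True)] * 3
        + [(SIMPLE, True)] * 2
    )
    rates = skeleton_error_rates(records)
    print(rates[NESTED], rates[GROUPED], rates[SIMPLE])  # 0.5 0.25 0.0
    print(allocate_budget(rates, 10) == {NESTED: 7, GROUPED: 3})  # True
```

Proportional allocation is only one reasonable policy. A thresholded top-k selection of the weakest skeletons would fit the same description, and the paper's actual criterion may differ.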

Problem

Research questions and friction points this paper is trying to address.

Unifying semantic parsing tasks across multiple query languages

Addressing the limited generalizability of Text-to-Query methods designed for a single query language

Improving model performance on query skeletons through dynamic data augmentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic data augmentation framework that diagnoses model-specific weaknesses on query skeletons

Query skeletons serve as shared optimization target

Synthesizes targeted training data for multiple query languages
