AGRO-SQL: Agentic Group-Relative Optimization with High-Fidelity Data Synthesis

📅 2025-12-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-SQL systems are held back by scarce high-quality annotated data and by limited reasoning ability on complex queries. To address both, the paper proposes a framework that improves data and model together. Methodologically, the authors (1) apply Group Relative Policy Optimization (GRPO), an existing reinforcement learning algorithm, to refine the agent's reasoning from environmental feedback; (2) design a Diversity-Aware Cold Start stage that initializes a robust policy before RL; and (3) build an iterative, RL-ready data factory that combines high-fidelity synthetic data generation, semantic-logic alignment verification, and diversity-guided sampling. On the BIRD and Spider benchmarks, the approach achieves state-of-the-art performance among single-model methods.

📝 Abstract
The advancement of Text-to-SQL systems is currently hindered by the scarcity of high-quality training data and the limited reasoning capabilities of models in complex scenarios. In this paper, we propose a holistic framework that addresses these issues through a dual-centric approach. From a Data-Centric perspective, we construct an iterative data factory that synthesizes RL-ready data characterized by high correctness and precise semantic-logic alignment, ensured by strict verification. From a Model-Centric perspective, we introduce a novel Agentic Reinforcement Learning framework. This framework employs a Diversity-Aware Cold Start stage to initialize a robust policy, followed by Group Relative Policy Optimization (GRPO) to refine the agent's reasoning via environmental feedback. Extensive experiments on BIRD and Spider benchmarks demonstrate that our synergistic approach achieves state-of-the-art performance among single-model methods.
Problem

Research questions and friction points this paper is trying to address.

Addresses scarcity of high-quality Text-to-SQL training data
Enhances model reasoning in complex SQL generation scenarios
Improves data synthesis and policy optimization for Text-to-SQL systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative data factory synthesizes high-fidelity training data.
Agentic reinforcement learning framework with diversity-aware cold start.
Group Relative Policy Optimization (GRPO) refines reasoning via environmental feedback.
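
The group-relative step in GRPO can be illustrated with a minimal sketch. It assumes, for illustration only, that the environmental feedback is a binary execution-match reward on a SQLite database; the paper's actual reward design is not described on this page. The table `emp`, the candidate queries, and the helper names `execution_reward`, `group_relative_advantages`, and `demo` are all hypothetical. Only the normalization of each reward against the mean and standard deviation of its own sampled group reflects the standard GRPO formulation.

```python
import sqlite3
from statistics import mean, pstdev


def execution_reward(conn, predicted_sql, gold_sql):
    """Assumed reward: 1.0 if both queries return the same rows, else 0.0."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A query that fails to execute earns no reward.
        return 0.0
    gold = conn.execute(gold_sql).fetchall()
    # Compare as order-insensitive multisets of rows.
    same = sorted(predicted, key=repr) == sorted(gold, key=repr)
    return 1.0 if same else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def demo():
    # Hypothetical toy database standing in for a benchmark schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?, ?)",
        [("Ann", "eng", 120), ("Bob", "eng", 100), ("Cy", "ops", 90)],
    )
    gold = "SELECT name FROM emp WHERE dept = 'eng'"
    # One group of sampled candidates for the same question.
    candidates = [
        "SELECT name FROM emp WHERE dept = 'eng'",   # matches gold
        "SELECT name FROM emp WHERE salary >= 100",  # different SQL, same rows
        "SELECT name FROM emp",                      # wrong rows
        "SELECT nme FROM emp",                       # execution error
    ]
    rewards = [execution_reward(conn, sql, gold) for sql in candidates]
    conn.close()
    return rewards, group_relative_advantages(rewards)


if __name__ == "__main__":
    rewards, advantages = demo()
    for r, a in zip(rewards, advantages):
        print(f"reward={r:.1f} advantage={a:+.3f}")
```

Because advantages are computed relative to the group, no separate value network is needed; a group in which every candidate earns the same reward yields zero advantage for all of them and therefore contributes no learning signal.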
Cehua Yang
Sichuan University
Dongyu Xiao
Sichuan University
Junming Lin
Sichuan University
Yuyang Song
Toyota Research Institute of North America
Composite materials, Smart material, 4D Printing
Hanxu Yan
Sichuan University
Shawn Guo
IQuest Research
Wei Zhang
Beihang University
Jian Yang
Beihang University
Mingjie Tang
Purdue University
database, data mining, machine learning, spatial data processing
Bryan Dai
IQuest Research