AGRO-SQL: Agentic Group-Relative Optimization with High-Fidelity Data Synthesis

📅 2025-12-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Text-to-SQL systems are held back by scarce high-quality training data and by limited model reasoning on complex queries. The paper proposes a framework that improves data and model together. On the data side, an iterative data factory synthesizes RL-ready data and applies strict verification to ensure correctness and semantic-logic alignment. On the model side, an agentic reinforcement learning framework (1) uses a Diversity-Aware Cold Start stage to initialize a robust policy and (2) applies Group Relative Policy Optimization (GRPO) to refine the agent's reasoning via environmental feedback. On the BIRD and Spider benchmarks, the approach reports state-of-the-art performance among single-model methods.

📝 Abstract

The advancement of Text-to-SQL systems is currently hindered by the scarcity of high-quality training data and the limited reasoning capabilities of models in complex scenarios. In this paper, we propose a holistic framework that addresses these issues through a dual-centric approach. From a Data-Centric perspective, we construct an iterative data factory that synthesizes RL-ready data characterized by high correctness and precise semantic-logic alignment, ensured by strict verification. From a Model-Centric perspective, we introduce a novel Agentic Reinforcement Learning framework. This framework employs a Diversity-Aware Cold Start stage to initialize a robust policy, followed by Group Relative Policy Optimization (GRPO) to refine the agent's reasoning via environmental feedback. Extensive experiments on BIRD and Spider benchmarks demonstrate that our synergistic approach achieves state-of-the-art performance among single-model methods.
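The abstract does not say what form the environmental feedback takes. A common choice for Text-to-SQL is an execution-based signal: run the predicted query and a reference query against the same database and compare the results. The sketch below illustrates that idea only; the function names, the binary reward, and the toy `singer` table are assumptions for illustration, not details taken from the paper.

```python
import sqlite3
from collections import Counter


def execution_reward(db: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> float:
    """Return 1.0 if both queries yield the same multiset of rows, else 0.0.

    Hypothetical reward for illustration; the paper's actual feedback
    signal is not specified in the abstract.
    """
    try:
        predicted_rows = db.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # Invalid SQL (syntax error, unknown table or column) earns no reward.
        return 0.0
    gold_rows = db.execute(gold_sql).fetchall()
    # Compare as multisets so that row order does not affect the outcome.
    return 1.0 if Counter(predicted_rows) == Counter(gold_rows) else 0.0


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database used only for this example."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    db.executemany(
        "INSERT INTO singer VALUES (?, ?, ?)",
        [(1, "Ana", 31), (2, "Bo", 45), (3, "Cy", 27)],
    )
    return db


if __name__ == "__main__":
    demo = build_demo_db()
    gold = "SELECT name FROM singer WHERE age > 30"
    print(execution_reward(demo, "SELECT name FROM singer WHERE age >= 31", gold))
    print(execution_reward(demo, "SELECT name FROM singer", gold))
```

The same comparison could also serve as one verification step when filtering synthesized question-SQL pairs, although a result match on a single database does not by itself guarantee semantic-logic alignment.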

Problem

Research questions and friction points this paper is trying to address.

Scarcity of high-quality training data for Text-to-SQL.

Limited model reasoning in complex SQL generation scenarios.

Need to improve data synthesis and policy optimization together rather than in isolation.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative data factory synthesizes high-fidelity training data.

Agentic reinforcement learning framework with diversity-aware cold start.

Group Relative Policy Optimization (GRPO) refines the agent's reasoning via environmental feedback.
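The core of GRPO is that each sampled output is scored relative to the other outputs sampled for the same prompt, which removes the need for a learned value function. A minimal sketch of that group-relative advantage follows. The function name and `eps` parameter are hypothetical, and the population standard deviation is an assumption (implementations differ on this detail); the paper's exact reward design and training loop are not described in this summary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps avoids division by zero when every sample gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four SQL candidates sampled for one question: two correct, two incorrect.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # If all candidates receive the same reward, no candidate is preferred.
    print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))
```

When every candidate in a group gets the same reward, all advantages are zero and the group contributes no learning signal, which is one reason diverse samples within a group matter for this kind of training.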
