🤖 AI Summary
Text-to-SQL systems are hindered by the scarcity of high-quality annotated data and by limited capability for complex reasoning. To address these challenges, we propose a dual-driven co-optimization framework that integrates data and model improvements. Methodologically, we (1) design a diversity-aware cold-start stage to mitigate initial policy bias; (2) apply Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm, to refine the agent's reasoning from environmental feedback; and (3) build an RL-ready data factory that combines high-fidelity synthetic data generation, semantic-logical alignment verification, and diversity-guided sampling. Evaluated on the BIRD and Spider benchmarks, our approach achieves state-of-the-art performance among single-model methods on complex SQL generation. The framework targets low-resource, high-complexity Text-to-SQL tasks.
📝 Abstract
The advancement of Text-to-SQL systems is currently hindered by the scarcity of high-quality training data and by the limited reasoning capabilities of models in complex scenarios. In this paper, we propose a holistic framework that addresses both issues through a dual-centric approach. From a Data-Centric perspective, we construct an iterative data factory that synthesizes RL-ready data with high correctness and precise semantic-logic alignment, both enforced by strict verification. From a Model-Centric perspective, we introduce a novel Agentic Reinforcement Learning framework. This framework uses a Diversity-Aware Cold Start stage to initialize a robust policy, followed by Group Relative Policy Optimization (GRPO) to refine the agent's reasoning via environmental feedback. Extensive experiments on the BIRD and Spider benchmarks demonstrate that the combined approach achieves state-of-the-art performance among single-model methods.
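As a rough illustration of the GRPO step named in the abstract, the sketch below scores a group of candidate SQL queries sampled for one question and converts the rewards into group-relative advantages, where each reward is normalized by the group's mean and standard deviation. The execution-match reward, the toy `singer` table, and the function names are assumptions made for this example. The abstract does not specify the paper's actual reward design or training loop.

```python
import sqlite3
from statistics import mean, pstdev


def execution_reward(conn, candidate_sql, gold_sql):
    """Hypothetical reward: 1.0 if the candidate returns the gold rows, else 0.0."""
    try:
        predicted = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        # Queries that fail to execute receive no reward.
        return 0.0
    expected = conn.execute(gold_sql).fetchall()
    # Compare as sorted row lists so row order does not matter.
    return 1.0 if sorted(predicted) == sorted(expected) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def demo():
    # Toy in-memory database standing in for the environment (an assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?)",
        [("Ann", 30), ("Bob", 45), ("Cy", 52)],
    )
    gold = "SELECT name FROM singer WHERE age > 40"
    # One group of sampled candidates for the same question.
    group = [
        "SELECT name FROM singer WHERE age > 40",   # matches gold
        "SELECT name FROM singer WHERE age >= 45",  # same rows, different form
        "SELECT name FROM singer",                  # wrong rows
        "SELECT nam FROM singer",                   # execution error
    ]
    rewards = [execution_reward(conn, sql, gold) for sql in group]
    advantages = group_relative_advantages(rewards)
    conn.close()
    return rewards, advantages


if __name__ == "__main__":
    rewards, advantages = demo()
    print(rewards)
    print(advantages)
```

Because advantages are computed relative to the other samples in the same group, no separate value network is needed to provide a baseline. Candidates that beat the group average get positive advantages and the rest get negative ones.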