🤖 AI Summary
To address the poor generalization and limited interpretability of NL2SQL models in complex scenarios such as multi-table joins, nested queries, and cross-domain applications (e.g., finance, healthcare), this paper proposes a reinforcement learning (RL)-based training paradigm designed specifically for NL2SQL. Building on the Proximal Policy Optimization (PPO) algorithm, the authors introduce an execution-feedback-driven reward function tailored to SQL semantics, combined with a 7B-scale language model, lightweight synthetic data augmentation, and RL-aware data engineering strategies that mitigate the cold-start problem. The approach is data-efficient, attaining 88.6% and 66.6% execution accuracy on the Spider and BIRD benchmarks, respectively, and substantially outperforming same-scale supervised fine-tuning baselines. The core contributions are: (1) a dedicated RL training framework for NL2SQL; and (2) end-to-end SQL generation that combines strong cross-domain generalization, interpretability via execution feedback, and low data dependency.
📝 Abstract
Natural Language to SQL (NL2SQL) enables intuitive interaction with databases by transforming natural language queries into structured SQL statements. Despite recent advances in enhancing human-computer interaction within database applications, significant challenges persist, particularly in inference performance for complex scenarios involving multi-table joins and nested queries. Current methodologies primarily rely on supervised fine-tuning (SFT) to train NL2SQL models, which can limit adaptability and interpretability in new environments (e.g., finance and healthcare). To enhance the reasoning performance of NL2SQL models in these complex situations, we introduce SQL-R1, a novel NL2SQL reasoning model trained with reinforcement learning (RL) algorithms. We design a specialized RL-based reward function tailored for NL2SQL tasks and discuss the impact of cold start on the effectiveness of RL training. In addition, we achieve competitive accuracy using only a small amount of synthetic NL2SQL data for augmented training, and we further explore data engineering for RL. In our experiments, SQL-R1 achieves execution accuracy of 88.6% and 66.6% on the Spider and BIRD benchmarks, respectively, using only a 7B base model.
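The abstract does not spell out the reward function, but the general idea of an execution-feedback reward can be sketched as follows. This is a minimal illustration, not the paper's actual design: it assumes a SQLite backend, and the function name `execution_reward` and the reward values (+1.0 match, -0.5 mismatch, -1.0 execution error) are hypothetical. Real implementations (as in Spider/BIRD evaluation) compare execution results of the generated query against a gold query, which is what this sketch does in an order-insensitive way.

```python
import sqlite3


def execution_reward(predicted_sql: str, gold_sql: str, db_path: str) -> float:
    """Score a generated SQL query by executing it against a database
    and comparing its result set to that of the gold query.

    Reward scheme (illustrative, not from the paper):
      +1.0  predicted query executes and matches the gold result set
      -0.5  predicted query executes but results differ
      -1.0  predicted query fails to execute (syntax/semantic error)
    """
    conn = sqlite3.connect(db_path)
    try:
        try:
            pred_rows = conn.execute(predicted_sql).fetchall()
        except sqlite3.Error:
            # Unexecutable SQL gets the strongest negative signal.
            return -1.0
        gold_rows = conn.execute(gold_sql).fetchall()
        # Order-insensitive comparison of result sets.
        if sorted(map(tuple, pred_rows)) == sorted(map(tuple, gold_rows)):
            return 1.0
        return -0.5
    finally:
        conn.close()
```

In an RL loop such as PPO, this scalar would be computed for each sampled query and fed back as the (sparse) episode reward, which is what makes the training signal interpretable: the model is rewarded exactly when its SQL produces the right answer.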