Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-SQL systems still struggle to generate correct, executable SQL for complex natural language questions. To address this, we propose a lightweight reinforcement learning framework that eliminates fragile intermediate supervision and intricate reward engineering, introducing instead a minimalist binary reward signal based solely on SQL execution outcomes. Our method integrates Group Relative Policy Optimization (GRPO), execution-feedback-driven supervised fine-tuning, curriculum learning, and rigorous data cleaning. Additionally, we incorporate value-retrieval augmentation and majority-voting-based inference ensembling. Evaluated on six mainstream Text2SQL benchmarks, our approach achieves state-of-the-art execution accuracy, ranking first on the BIRD leaderboard. Notably, our 7B-parameter model significantly outperforms prior 70B-parameter systems while offering superior training stability and deployment efficiency.
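The binary execution reward described above can be sketched in a few lines. This is an illustrative assumption of how such a reward might be computed against a SQLite database; the function name `execution_reward` and the order-insensitive result comparison are our own choices, not the paper's actual implementation.

```python
import sqlite3

def execution_reward(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> float:
    """Binary reward: 1.0 iff the predicted SQL executes and returns the
    same result set as the reference query, else 0.0 (hypothetical sketch)."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # non-executable SQL earns zero reward
    gold_rows = conn.execute(gold_sql).fetchall()
    # order-insensitive comparison of result rows
    return 1.0 if sorted(pred_rows) == sorted(gold_rows) else 0.0
```

Because the signal depends only on execution outcomes, no intermediate annotations (query sketches, clause-level labels) are needed during RL training.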

📝 Abstract
Translating natural language into SQL (Text2SQL) is a longstanding challenge at the intersection of natural language understanding and structured data access. While large language models (LLMs) have significantly improved fluency in SQL generation, producing correct and executable SQL--particularly for complex queries--remains a bottleneck. We present Arctic-Text2SQL-R1, a reinforcement learning (RL) framework and model family designed to generate accurate, executable SQL using a lightweight reward signal based solely on execution correctness. Our approach avoids brittle intermediate supervision and complex reward shaping, promoting stable training and alignment with the end task. Combined with carefully curated data, strong supervised initialization, and effective training practices, Arctic-Text2SQL-R1 achieves state-of-the-art execution accuracy across six diverse Text2SQL benchmarks, including the top position on the BIRD leaderboard. Notably, our 7B model outperforms prior 70B-class systems, highlighting the framework's scalability and efficiency. We further demonstrate inference-time robustness through simple extensions like value retrieval and majority voting. Extensive experiments and ablation studies offer both positive and negative insights, providing practical guidance for future Text2SQL research.
Problem

Research questions and friction points this paper is trying to address.

Improving accuracy of natural language to SQL translation
Generating executable SQL for complex queries efficiently
Avoiding brittle supervision with lightweight reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses reinforcement learning for SQL generation
Lightweight reward based on execution correctness
Strong supervised initialization and training practices
👥 Authors
Zhewei Yao
Snowflake
LLM, Efficient AI, MLSys
Guoheng Sun
University of Maryland, College Park
Deep Learning, Natural Language Processing, Mobile Computing
Lukasz Borchmann
Snowflake AI Research
Zheyu Shen
Graduate Student, Electrical and Computer Engineering, University of Maryland
Machine Learning Systems, Large Language Models
Minghang Deng
University of California, San Diego
Bohan Zhai
Research Scientist at Apple, UC Berkeley
multimodal, vision-language, NLP
Hao Zhang
Snowflake AI Research, University of California, San Diego
Ang Li
University of Maryland, College Park
Yuxiong He
Snowflake AI Research