Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Neural text-to-SQL generation relies heavily on high-quality, manually annotated SQL queries, a major bottleneck in low-resource settings. Method: This paper proposes an execution-guided reinforcement learning (RL) framework that uses question-answer pairs as weak supervision, removing the dependence on gold-standard SQL annotations. It reformulates SQL generation as an execution-driven RL task and applies Group Relative Policy Optimization (GRPO), a policy-optimization method well suited to stable training under sparse, database-execution-based rewards. Contribution/Results: On a tabular reasoning benchmark, the approach improves SQL execution accuracy from 31.49% to 49.83% and reduces the error rate from 25.43% to 14.71%. Notably, its performance approaches that of SQLCoder-70B, a model with an order of magnitude more parameters, pointing toward resource-efficient, execution-aware text-to-SQL generation.

📝 Abstract
In this work, we study the problem of code generation with a large language model (LLM), focusing on generating SQL queries from natural language questions. We ask: instead of supervised fine-tuning on text-code pairs, can we tune a model by having it interact with a database engine? We frame this as a reinforcement learning problem in which the model receives execution-based feedback from the environment in the form of scalar rewards. These rewards penalize execution failures and assign positive values when a query returns a correct answer. We use the rewards within the Group Relative Policy Optimization (GRPO) framework and evaluate our approach on a tabular reasoning benchmark. We find that with only weak supervision in the form of question-answer pairs, RL-tuning improves the accuracy of model-generated SQL code from 31.49% to 49.83% while reducing the error rate from 25.43% to 14.71%. This improvement allows the model to nearly match the performance of the larger SQLCoder-70B model. Our work demonstrates the potential of execution-based feedback for improving the symbolic reasoning capabilities of LLMs.
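The reward scheme described in the abstract (penalize execution failures, reward correct answers) can be sketched as a small function against a SQLite database. The specific reward values (-1.0 / 0.0 / 1.0) and the `sqlite3` backend are illustrative assumptions, not details taken from the paper.

```python
import sqlite3


def execution_reward(sql: str, db_path: str, gold_answer) -> float:
    """Score a generated SQL query by executing it against the database.

    Illustrative reward shape: execution failures get a negative reward,
    a result matching the gold answer gets a positive reward, and a query
    that runs but returns the wrong result gets zero.
    """
    try:
        conn = sqlite3.connect(db_path)
        rows = conn.execute(sql).fetchall()
        conn.close()
    except sqlite3.Error:
        return -1.0  # query failed to execute
    if rows == gold_answer:
        return 1.0   # executed and returned the correct answer
    return 0.0       # executed but returned a wrong answer
```

Note that only question-answer pairs are needed as supervision: the gold SQL query itself never appears in the reward computation, which is what makes this a weak-supervision setup.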
Problem

Research questions and friction points this paper is trying to address.

Improving SQL generation from natural language using execution feedback
Replacing supervised fine-tuning with reinforcement learning for code generation
Enhancing LLM symbolic reasoning via database interaction rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning with execution feedback
GRPO framework for SQL generation
Weak supervision via question-answer pairs
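The GRPO idea referenced above can be sketched in a few lines: for each question, a group of candidate SQL queries is sampled, and each sample's advantage is its reward normalized against the rest of the group, so no learned value function (critic) is required. This is a minimal sketch of the group-relative normalization only, not the paper's full training loop.

```python
import statistics


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: normalize each sampled query's reward
    by the mean and standard deviation of its group, so samples that beat
    their siblings get positive advantages and worse ones get negative.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]
```

With execution-based rewards from a group of sampled queries, e.g. `[1.0, -1.0, 0.0, 0.0]`, the correct query gets a positive advantage, the failing one a symmetric negative advantage, and the neutral ones sit at zero, which is what keeps training stable when rewards are sparse.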