PaVeRL-SQL: Text-to-SQL via Partial-Match Rewards and Verbal Reinforcement Learning

📅 2025-09-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low execution accuracy of Text-to-SQL systems on industrial-scale databases with complex schemas and domain-specific business logic, this paper proposes PaVeRL-SQL, a framework combining Partial-Match Rewards with Verbal Reinforcement Learning to enable multi-LLM self-assessment and self-improvement. Methodologically, it pairs an in-context learning pipeline with grouped self-evaluation (verbal RL) against a chain-of-thought RL pipeline built on the lightweight OmniSQL-7B model, trained with a specially designed reward function, two-stage RL, and mixed SQL-dialect data, demonstrating end-to-end self-improvement under realistic industrial constraints. On the Spider, Spider 2.0, and BIRD benchmarks it achieves state-of-the-art results: on Spider 2.0-SQLite, the verbal-RL pipeline improves execution accuracy by 7.4% over the prior state of the art and the CoT pipeline by 1.4%, while mixed-dialect RL training yields roughly threefold gains for dialects with limited training data.

📝 Abstract
Text-to-SQL models allow users to interact with a database more easily by generating executable SQL statements from natural-language questions. Despite recent successes on simpler databases and questions, current Text-to-SQL methods still suffer from low execution accuracy on industry-scale databases and complex questions involving domain-specific business logic. We present PaVeRL-SQL, a framework that combines Partial-Match Rewards and Verbal Reinforcement Learning to drive self-improvement in reasoning language models (RLMs) for Text-to-SQL. To handle practical use cases, we adopt two pipelines: (1) a newly designed in-context learning framework with group self-evaluation (verbal-RL), using capable open- and closed-source large language models (LLMs) as backbones; and (2) a chain-of-thought (CoT) RL pipeline with a small backbone model (OmniSQL-7B) trained with a specially designed reward function and two-stage RL. These pipelines achieve state-of-the-art (SOTA) results on popular Text-to-SQL benchmarks -- Spider, Spider 2.0, and BIRD. For the industrial-level Spider 2.0-SQLite benchmark, the verbal-RL pipeline achieves an execution accuracy 7.4% higher than SOTA, and the CoT pipeline is 1.4% higher. RL training with mixed SQL dialects yields strong, threefold gains, particularly for dialects with limited training data. Overall, PaVeRL-SQL delivers reliable, SOTA Text-to-SQL under realistic industrial constraints. The code is available at https://github.com/PaVeRL-SQL/PaVeRL-SQL.
Problem

Research questions and friction points this paper is trying to address.

Improving execution accuracy on industry-scale databases and complex questions
Handling domain-specific business logic in natural language to SQL conversion
Low execution accuracy of current Text-to-SQL methods despite recent successes on simpler databases and questions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Partial-Match Rewards for SQL generation
Verbal Reinforcement Learning framework
In-context learning with group evaluation
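The listing does not detail the exact reward design, but the core idea of a partial-match reward can be sketched as follows: instead of a sparse 0/1 exact- or execution-match signal, the model receives graded credit for SQL queries that partially overlap with a reference. The clause-tokenization and F1 scoring below are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch of a partial-match reward for SQL generation.
# Assumption: token-set F1 over the predicted and reference SQL strings;
# the paper's actual reward function may differ.
import re


def _tokens(sql: str) -> set:
    """Lowercase a SQL string and split it into a set of word-like tokens."""
    return set(re.findall(r"[a-z0-9_\.]+", sql.lower()))


def partial_match_reward(predicted_sql: str, gold_sql: str) -> float:
    """Return an F1-style score in [0, 1] that rewards partial overlap,
    giving the RL trainer a denser signal than binary execution match."""
    pred, gold = _tokens(predicted_sql), _tokens(gold_sql)
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction that selects the wrong column but the right table earns a fractional reward rather than zero, which is the kind of gradient that helps a small backbone like OmniSQL-7B improve during RL training.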
Authors
Heng Hao (Samsung SDSA, Mountain View, CA, USA)
Wenjun Hu (Assistant Professor of Electrical Engineering and Computer Science, Yale University; Wireless, Mobile, Networking, Systems)
Oxana Verkholyak (Samsung SDSA, Mountain View, CA, USA)
Davoud Ataee Tarzanagh (Samsung, University of Pennsylvania; Mathematical Optimization, Machine Learning, Foundation Models, Trustworthy AI)
Baruch Gutow (Samsung SDSA, Mountain View, CA, USA)
Sima Didari (Samsung SDSA, Mountain View, CA, USA)
Masoud Faraki (Samsung SDSA, Mountain View, CA, USA)
Hankyu Moon (Samsung SDSA, Mountain View, CA, USA)
Seungjai Min (Samsung SDSA, Mountain View, CA, USA)