Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-to-SQL reinforcement learning (RL) approaches rely either on costly SQL execution against databases or on large language model (LLM)-based scoring, incurring high latency and substantial GPU memory overhead. This work proposes an execution-free RL fine-tuning framework for text-to-SQL. Its core contributions are: (1) GMNScore, the first reward model to leverage graph-structured matching of SQL queries, capturing semantic and syntactic fidelity without execution; and (2) StepRTM, a stepwise reward mechanism that integrates intermediate supervision from Common Table Expression (CTE) subqueries with lightweight graph representation learning. Crucially, the framework eliminates dependence on database execution and LLM scoring during training. Evaluated on the Spider and BIRD benchmarks, it achieves 4.2–6.8% absolute gains in SQL execution accuracy, reduces inference latency by 72%, and cuts GPU memory consumption by 65%, significantly outperforming both execution-based and LLM-based reward methods.
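To make the "compare graphs, not execution results" idea concrete, here is a minimal, hedged sketch. The paper's GMNScore is a learned Graph Matching Network over full SQL graphs; the toy `sql_to_graph` and `graph_reward` helpers below are assumptions of this sketch, approximating the idea with a crude clause-level graph and a Jaccard overlap, not the authors' actual reward model.

```python
# Toy execution-free structural reward for SQL (illustrative only; GMNScore
# itself is a learned Graph Matching Network, not this hand-rolled overlap).
import re

def sql_to_graph(sql: str):
    """Extract a crude graph: clause keywords as nodes, their canonical
    sequence as edges. A real system would parse the full SQL AST."""
    keywords = ["SELECT", "FROM", "JOIN", "WHERE", "GROUP BY",
                "HAVING", "ORDER BY", "LIMIT"]
    found = [kw for kw in keywords if re.search(r"\b" + kw + r"\b", sql.upper())]
    nodes = set(found)
    edges = set(zip(found, found[1:]))  # consecutive clauses as edges
    return nodes, edges

def graph_reward(pred_sql: str, gold_sql: str) -> float:
    """Average Jaccard similarity of node and edge sets; value in [0, 1]."""
    pn, pe = sql_to_graph(pred_sql)
    gn, ge = sql_to_graph(gold_sql)
    def jaccard(a, b):
        return len(a & b) / len(a | b) if (a | b) else 1.0
    return 0.5 * (jaccard(pn, gn) + jaccard(pe, ge))

pred = "SELECT name FROM users WHERE age > 21 ORDER BY name"
gold = "SELECT name FROM users WHERE age > 21"
print(round(graph_reward(pred, gold), 3))  # → 0.708
```

The key property this preserves from the paper is that the reward is computed purely from query structure, so no database call is made during RL training.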

📝 Abstract
Reinforcement learning (RL) has been widely adopted to enhance the performance of large language models (LLMs) on Text-to-SQL tasks. However, existing methods often rely on execution-based or LLM-based Bradley-Terry reward models. The former suffers from high execution latency caused by repeated database calls, whereas the latter imposes substantial GPU memory overhead, both of which significantly hinder the efficiency and scalability of RL pipelines. To this end, we propose a novel Text-to-SQL RL fine-tuning framework named Graph-Reward-SQL, which employs the GMNScore outcome reward model. We leverage SQL graph representations to provide accurate reward signals while significantly reducing inference time and GPU memory usage. Building on this foundation, we further introduce StepRTM, a stepwise reward model that provides intermediate supervision over Common Table Expression (CTE) subqueries. This encourages both functional correctness and structural clarity of SQL. Extensive comparative and ablation experiments on standard benchmarks, including Spider and BIRD, demonstrate that our method consistently outperforms existing reward models.
Problem

Research questions and friction points this paper is trying to address.

Reduces execution latency in Text-to-SQL RL by avoiding repeated database calls
Minimizes GPU memory overhead in LLM-based reward models for Text-to-SQL
Improves SQL functional correctness and structural clarity via stepwise rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses GMNScore for execution-free reward modeling
Employs SQL graph representations for efficiency
Introduces StepRTM for stepwise CTE supervision
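The stepwise-supervision idea can be sketched as follows. StepRTM's actual reward model is learned; the `split_ctes` and `stepwise_reward` helpers below are hypothetical stand-ins that award partial credit per Common Table Expression via whitespace-normalized exact match, purely to illustrate intermediate rewards over CTE subqueries.

```python
# Illustrative stepwise CTE reward (not the paper's learned StepRTM model).
import re

def split_ctes(sql: str) -> dict:
    """Map CTE name -> body text. Crude regex: assumes no nested parentheses
    inside CTE bodies; a real system would use a proper SQL parser."""
    return {name.lower(): body.strip()
            for name, body in re.findall(r"(\w+)\s+AS\s*\((.*?)\)", sql,
                                         flags=re.IGNORECASE | re.DOTALL)}

def stepwise_reward(pred_sql: str, gold_sql: str) -> float:
    """Fraction of gold CTE steps the prediction reproduces."""
    pred, gold = split_ctes(pred_sql), split_ctes(gold_sql)
    if not gold:
        return 0.0
    hits = sum(1 for name, body in gold.items()
               if pred.get(name, "").split() == body.split())
    return hits / len(gold)

gold = ("WITH adults AS (SELECT id FROM users WHERE age > 21), "
        "names AS (SELECT name FROM adults) "
        "SELECT name FROM names")
pred = ("WITH adults AS (SELECT id FROM users WHERE age > 21), "
        "names AS (SELECT id FROM adults) "
        "SELECT id FROM names")
print(stepwise_reward(pred, gold))  # → 0.5 (one of two CTE steps matches)
```

Giving credit per intermediate CTE step, rather than only for the final query, is what lets the reward signal encourage both functional correctness and structural clarity, as the abstract describes.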