🤖 AI Summary
Existing text-to-SQL reinforcement learning (RL) approaches rely either on costly SQL execution against databases or on large language model (LLM)-based scoring, incurring high latency and substantial GPU memory overhead. This work proposes Graph-Reward-SQL, an execution-free RL fine-tuning framework for text-to-SQL. Its core contributions are: (1) GMNScore, an outcome reward model that leverages graph-structured matching of SQL queries to capture semantic and syntactic fidelity without execution; and (2) StepRTM, a stepwise reward model that supplies intermediate supervision over Common Table Expression (CTE) subqueries. Crucially, the framework eliminates dependence on both database execution and LLM scoring during training. Evaluated on the Spider and BIRD benchmarks, it achieves 4.2–6.8% absolute gains in SQL execution accuracy, reduces inference latency by 72%, and cuts GPU memory consumption by 65%, outperforming both execution-based and LLM-based reward methods.
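The graph-matching idea behind GMNScore can be illustrated with a deliberately simplified sketch. The paper uses a learned Graph Matching Network; below, a hand-built clause/identifier edge set and Jaccard overlap stand in for the learned embedding similarity. The parser and scoring function are illustrative assumptions, not the authors' implementation:

```python
import re

def sql_to_graph(sql: str) -> set:
    """Crude stand-in for a SQL graph: edges from each clause keyword
    to the identifiers appearing before the next keyword.
    (Illustrative only -- a real system would parse a proper AST.)"""
    keywords = ("SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY", "JOIN")
    pattern = "|".join(k.replace(" ", r"\s+") for k in keywords)
    edges = set()
    current = None
    for part in re.split(f"({pattern})", sql.upper()):
        norm = re.sub(r"\s+", " ", part.strip())
        if norm in keywords:
            current = norm
        elif current:
            for ident in re.findall(r"[A-Z_][A-Z0-9_.]*", part):
                edges.add((current, ident))
    return edges

def gmn_score_proxy(pred_sql: str, gold_sql: str) -> float:
    """Execution-free reward: edge-set overlap between predicted and
    reference SQL graphs. A learned GMN embedding similarity would
    replace this Jaccard proxy."""
    g1, g2 = sql_to_graph(pred_sql), sql_to_graph(gold_sql)
    if not g1 and not g2:
        return 1.0
    return len(g1 & g2) / len(g1 | g2)
```

Because the reward is computed from query structure alone, no database connection or reward-LLM forward pass is needed, which is the source of the latency and memory savings the summary reports.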
📝 Abstract
Reinforcement learning (RL) has been widely adopted to enhance the performance of large language models (LLMs) on Text-to-SQL tasks. However, existing methods often rely on execution-based or LLM-based Bradley-Terry reward models. The former suffers from high execution latency caused by repeated database calls, whereas the latter imposes substantial GPU memory overhead, both of which significantly hinder the efficiency and scalability of RL pipelines. To this end, we propose a novel Text-to-SQL RL fine-tuning framework named Graph-Reward-SQL, which employs the GMNScore outcome reward model. We leverage SQL graph representations to provide accurate reward signals while significantly reducing inference time and GPU memory usage. Building on this foundation, we further introduce StepRTM, a stepwise reward model that provides intermediate supervision over Common Table Expression (CTE) subqueries. This encourages both functional correctness and structural clarity of SQL. Extensive comparative and ablation experiments on standard benchmarks, including Spider and BIRD, demonstrate that our method consistently outperforms existing reward models.
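As a hedged illustration of the stepwise supervision idea, the sketch below splits a CTE-style query into its named subqueries and scores the fraction of reference steps the prediction reproduces. The regex-based CTE extraction and exact name/body matching are simplifying assumptions for illustration; StepRTM itself is a learned reward model:

```python
import re

def extract_ctes(sql: str) -> dict:
    """Pull `name AS (body)` pairs out of a WITH clause using
    balanced-parenthesis scanning (illustrative, not a full parser)."""
    ctes = {}
    for m in re.finditer(r"(\w+)\s+AS\s*\(", sql, re.IGNORECASE):
        depth, i = 1, m.end()
        while i < len(sql) and depth:
            depth += {"(": 1, ")": -1}.get(sql[i], 0)
            i += 1
        # Normalize whitespace in the subquery body.
        ctes[m.group(1).lower()] = " ".join(sql[m.end():i - 1].split())
    return ctes

def stepwise_reward(pred_sql: str, gold_sql: str) -> float:
    """Intermediate supervision: fraction of reference CTE steps that
    the prediction reproduces by name and normalized body."""
    pred, gold = extract_ctes(pred_sql), extract_ctes(gold_sql)
    if not gold:
        return 0.0
    hits = sum(1 for name, body in gold.items()
               if pred.get(name, "").lower() == body.lower())
    return hits / len(gold)
```

Rewarding each CTE step separately, rather than only the final query, is what lets the model receive credit for partially correct decompositions and nudges it toward the structurally clear, CTE-based style the abstract describes.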