Beyond Query-Level Comparison: Fine-Grained Reinforcement Learning for Text-to-SQL with Automated Interpretable Critiques

📅 2025-11-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing Text-to-SQL evaluation relies on costly human-annotated gold SQL queries, and mainstream reinforcement learning approaches employ only binary execution accuracy as reward, ignoring fine-grained semantic and structural errors. To address these limitations, we propose RuCo-C: the first generative, interpretable, fine-grained RL framework for Text-to-SQL that automatically generates query-level review criteria and structure-aware scoring without human annotations. Its core innovations are: (1) a generative evaluator model that performs automated, multi-dimensional SQL quality assessment via natural-language feedback; and (2) a progressive exploration strategy delivering dense, syntax- and semantics-aligned reward signals. RuCo-C achieves significant improvements over state-of-the-art methods across multiple benchmarks, demonstrating the critical role of interpretable, fine-grained feedback in optimizing Text-to-SQL models.
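The contrast between a binary execution reward and rubric-based dense feedback can be sketched as follows. This is a minimal illustration, not RuCo-C's actual implementation: the rubric dimensions, weights, and the blending coefficient `alpha` are all hypothetical assumptions.

```python
# Hypothetical sketch: blending a coarse binary execution reward with a dense,
# rubric-based reward, in the spirit of the fine-grained feedback the summary
# describes. All names and weights below are illustrative assumptions.

def execution_reward(pred_ok: bool) -> float:
    """Coarse binary reward used by mainstream RL approaches."""
    return 1.0 if pred_ok else 0.0

def rubric_reward(critique_scores: dict, weights: dict) -> float:
    """Dense reward: weighted average of per-rubric critique scores in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[k] * critique_scores.get(k, 0.0) for k in weights) / total

def combined_reward(pred_ok: bool, critique_scores: dict, alpha: float = 0.5) -> float:
    """Blend execution accuracy with fine-grained rubric feedback."""
    # Hypothetical rubric dimensions for a SQL query.
    weights = {"table_selection": 1.0, "join_logic": 1.0,
               "filter_conditions": 1.0, "aggregation": 1.0}
    return (alpha * execution_reward(pred_ok)
            + (1 - alpha) * rubric_reward(critique_scores, weights))

# A query that fails execution but gets most of the structure right still
# receives a non-zero learning signal, unlike a purely binary reward.
r = combined_reward(False, {"table_selection": 1.0, "join_logic": 1.0,
                            "filter_conditions": 0.5, "aggregation": 0.0})
# r == 0.3125, whereas the binary execution reward alone would be 0.0
```

The point of the blend is that partially correct queries are distinguishable from completely wrong ones, which is exactly the gradient a binary reward erases.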

πŸ“ Abstract
Text-to-SQL, a pivotal natural language processing (NLP) task that converts textual queries into executable SQL, has seen substantial progress in recent years. However, the evaluation and reward mechanisms used to train and assess text-to-SQL models remain a critical bottleneck. Current approaches rely heavily on manually annotated gold SQL queries, which are costly to produce and impractical for large-scale evaluation. More importantly, most reinforcement learning (RL) methods in text-to-SQL use only the final binary execution outcome as the reward signal, a coarse-grained form of supervision that overlooks the detailed structural and semantic errors a rubric-based assessment would capture. To address these challenges, we propose RuCo-C, a novel generative judge model for fine-grained, query-specific automatic evaluation using interpretable critiques without human intervention. Our framework first automatically generates query-specific evaluation rubrics without human annotation and links them to interpretable critiques. It then densifies reward feedback through a "progressive exploration" strategy during RL training, dynamically adjusting rewards to improve the model's performance. Comprehensive experiments demonstrate that RuCo-C outperforms existing methods in text-to-SQL evaluation, yielding significant performance gains.
Problem

Research questions and friction points this paper is trying to address.

Addresses coarse-grained rewards in text-to-SQL reinforcement learning
Proposes automated interpretable critiques for fine-grained evaluation
Reduces reliance on costly manual SQL annotations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated interpretable critiques for fine-grained evaluation
Progressive exploration strategy for densified reward feedback
Query-specific rubrics generation without human annotation
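The "progressive exploration" idea above can be sketched as a reward schedule that starts out dominated by dense rubric feedback and gradually shifts weight toward strict execution accuracy. The linear annealing below is an illustrative assumption; the paper's actual strategy may differ.

```python
# Illustrative sketch of a progressive reward schedule: early in training the
# dense rubric reward guides exploration, and the weight on the binary
# execution reward is annealed upward. The linear schedule and the start/end
# values are assumptions for illustration, not RuCo-C's implementation.

def progressive_alpha(step: int, total_steps: int,
                      start: float = 0.2, end: float = 0.9) -> float:
    """Linearly anneal the weight on the binary execution reward."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

def shaped_reward(step: int, total_steps: int,
                  exec_ok: bool, rubric_score: float) -> float:
    """Blend execution accuracy and rubric feedback with a step-dependent weight."""
    a = progressive_alpha(step, total_steps)
    return a * (1.0 if exec_ok else 0.0) + (1 - a) * rubric_score

# Early training: rubric feedback carries most of the signal for a failing
# but structurally plausible query.
early = shaped_reward(0, 1000, exec_ok=False, rubric_score=0.8)     # 0.64
# Late training: execution accuracy dominates, so the same query earns little.
late = shaped_reward(1000, 1000, exec_ok=False, rubric_score=0.8)   # 0.08
```

Annealing toward the execution reward keeps the final policy anchored to the metric that ultimately matters (correct execution), while the dense early signal prevents the near-zero-gradient start that a purely binary reward causes.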
👥 Authors
Guifeng Wang (ByteDance, Beijing, China)
Yuanfeng Song
Meng Yang (ByteDance, Beijing, China)
Tao Zhu (ByteDance, Beijing, China)
Xiaoming Yin (ByteDance, Beijing, China)
Xing Chen (ByteDance, Beijing, China)