FINER-SQL: Boosting Small Language Models for Text-to-SQL

📅 2026-05-05
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limitations of small language models in Text-to-SQL tasks: weak reasoning and poor instruction following, compounded by conventional reinforcement learning's reliance on sparse binary rewards, which provide little supervisory signal when the generated SQL is incorrect. The authors propose FINER-SQL, a framework that uses fine-grained execution feedback and two interpretable dense rewards: a memory reward, which aligns reasoning with verified traces for semantic stability, and an atomic reward, which grants structure-level partial credit based on operation-level overlap. Built on group relative policy optimization, the approach turns discrete correctness into continuous learning signals and trains stably without an external critic. On the BIRD and Spider benchmarks, a 3B-parameter model reaches execution accuracies of up to 67.73% and 85%, respectively, at an inference latency of 5.57 seconds per sample, matching the performance of much larger models.
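To illustrate why sparse rewards stall critic-free training, the sketch below computes group-relative advantages the way group relative policy optimization is commonly described: each sampled output's reward is normalized by the mean and standard deviation of its own group. The function name, the `eps` constant, and the reward values are illustrative assumptions, not taken from the FINER-SQL code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group (no learned critic).

    Assumption: population standard deviation with a small eps for
    numerical safety; real implementations may differ in these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled SQL queries, all incorrect under a binary 0/1 reward:
# every advantage is zero, so the policy gradient carries no signal.
sparse = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# The same four queries scored with hypothetical dense partial credit:
# the group now ranks the attempts, so better ones are reinforced.
dense = group_relative_advantages([0.1, 0.4, 0.0, 0.7])

print(sparse)
print(dense)
```

When every query in a group fails, the binary reward yields identical scores and the advantages vanish, whereas a dense reward still separates closer attempts from worse ones.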
📝 Abstract
Large language models have driven major advances in Text-to-SQL generation. However, they suffer from high computational cost, long latency, and data privacy concerns, which make them impractical for many real-world applications. A natural alternative is to use small language models (SLMs), which enable efficient and private on-premise deployment. Yet, SLMs often struggle with weak reasoning and poor instruction following. Conventional reinforcement learning methods based on sparse binary rewards (0/1) provide little learning signal when the generated SQLs are incorrect, leading to unstable or collapsed training. To overcome these issues, we propose FINER-SQL, a scalable and reusable reinforcement learning framework that enhances SLMs through fine-grained execution feedback. Built on group relative policy optimization, FINER-SQL replaces sparse supervision with dense and interpretable rewards that offer continuous feedback even for incorrect SQLs. It introduces two key reward functions: a memory reward, which aligns reasoning with verified traces for semantic stability, and an atomic reward, which measures operation-level overlap to grant partial credit for structurally correct but incomplete SQLs. This approach transforms discrete correctness into continuous learning, enabling stable, critic-free optimization. Experiments on the BIRD and Spider benchmarks show that FINER-SQL achieves up to 67.73% and 85% execution accuracy with a 3B model, matching much larger LLMs while reducing inference latency to 5.57 s/sample. These results highlight a cost-efficient and privacy-preserving path toward high-performance Text-to-SQL generation. Our code is available at https://github.com/thanhdath/finer-sql.
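The idea of operation-level overlap can be sketched as follows. The abstract does not define the atomic reward precisely, so this is only one plausible reading: each query is split into hypothetical (clause, item) "atoms" with a naive regex, and the score is the Jaccard overlap between the predicted and reference atom sets. The names `atoms` and `atomic_overlap`, the clause list, and the example queries are all assumptions for illustration, not the authors' implementation.

```python
import re

# Assumption: a small fixed set of clause keywords is enough for the demo.
CLAUSES = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]
_CLAUSE_RE = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in CLAUSES) + r")\b",
    re.IGNORECASE,
)


def atoms(sql):
    """Split a query into (clause, item) pairs.

    Naive on purpose: no real SQL parsing, keywords inside string
    literals are not handled, and items are lowercased for comparison.
    """
    parts = _CLAUSE_RE.split(sql.strip().rstrip(";"))
    result = set()
    # After the split, parts is [prefix, keyword, body, keyword, body, ...].
    for keyword, body in zip(parts[1::2], parts[2::2]):
        clause = " ".join(keyword.upper().split())
        for item in re.split(r",|\bAND\b", body, flags=re.IGNORECASE):
            item = " ".join(item.lower().split())
            if item:
                result.add((clause, item))
    return result


def atomic_overlap(pred_sql, gold_sql):
    """Jaccard overlap of atom sets, in [0, 1]."""
    pred, gold = atoms(pred_sql), atoms(gold_sql)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)


gold = "SELECT name FROM users WHERE age > 30 AND city = 'Hanoi'"
pred = "SELECT name FROM users WHERE age > 30"  # misses one filter
print(atomic_overlap(pred, gold))  # 0.75
```

A binary execution reward would typically score the incomplete prediction as 0, while the overlap score of 0.75 reflects that three of the four operations are already correct.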
Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL
small language models
reinforcement learning
sparse rewards
execution accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

fine-grained reward
small language models
Text-to-SQL
reinforcement learning
execution feedback