R$^3$-SQL: Ranking Reward and Resampling for Text-to-SQL

📅 2026-04-28


🤖 AI Summary

This work addresses two key limitations of existing Text-to-SQL approaches: logically equivalent SQL queries are often ranked inconsistently, and ranking cannot recover when the correct SQL is absent from the candidate set. To resolve the first, the authors propose a unified reward that clusters candidate queries by execution result and scores each cluster by combining pairwise preferences across clusters with a pointwise utility computed from the cluster's best rank and size, so that equivalent queries receive a consistent ranking. To resolve the second, they introduce an agentic resampling strategy that judges the candidate pool and expands it when the correct SQL is likely missing. The method achieves 75.03% execution accuracy on BIRD-dev, a new state of the art among methods using models with disclosed sizes, and shows consistent gains across five benchmarks.
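The first step, grouping candidates by execution result so that functionally equivalent queries share one ranking slot, can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the authors' implementation: the `singer` table, its data, and the helper names `execution_signature` and `group_by_execution` are hypothetical, and treating result rows as an unordered multiset and dropping queries that fail to execute are assumptions rather than details stated in the summary.

```python
import sqlite3
from collections import defaultdict


def execution_signature(conn, sql):
    """Run `sql` and return an order-insensitive, hashable form of its result rows.

    Returns None when the query fails to execute (assumption: such candidates
    are dropped rather than grouped).
    """
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sorting the rows treats the result as a multiset, so row order is ignored.
    return tuple(sorted(rows, key=repr))


def group_by_execution(conn, candidates):
    """Group candidate indices whose queries produce identical results."""
    groups = defaultdict(list)
    for idx, sql in enumerate(candidates):
        sig = execution_signature(conn, sql)
        if sig is not None:
            groups[sig].append(idx)
    return list(groups.values())  # groups appear in order of their first member


# Hypothetical toy database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 31), (2, 'Bo', 25), (3, 'Cy', 40);
    """
)
candidates = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer WHERE NOT age <= 30",
    "SELECT name FROM singer WHERE age >= 30 ORDER BY name",
    "SELECT name FROM singer WHERE age > 35",
    "SELECT nme FROM singer",  # misspelled column: fails to execute
]
groups = group_by_execution(conn, candidates)
print(groups)  # [[0, 1, 2], [3]]
```

Candidates 0, 1, and 2 differ textually but return the same rows, so they form one group and any group-level score applies to all three equally; the misspelled column in candidate 4 raises `sqlite3.OperationalError` and the candidate is dropped.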

📝 Abstract

Modern Text-to-SQL systems generate multiple candidate SQL queries and rank them to select a final prediction. However, existing methods face two limitations. First, they often score functionally equivalent SQL queries inconsistently despite identical execution results. Second, ranking cannot recover when the correct SQL is absent from the candidate pool. We propose R$^3$-SQL, a Text-to-SQL framework that addresses both issues through a unified reward for ranking and resampling. R$^3$-SQL first groups candidates by execution result and ranks groups for consistency. To score each group, it combines a pairwise preference across groups with a pointwise utility computed from the group's best rank and size, capturing relative preference, consistency, and candidate quality. To improve candidate recall, R$^3$-SQL introduces agentic resampling, which judges the generated candidate pool and selectively resamples when the correct SQL is likely absent. R$^3$-SQL achieves 75.03% execution accuracy on BIRD-dev, a new state of the art among methods using models with disclosed sizes, with consistent gains across five benchmarks.
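The abstract does not give the exact reward formula, so the following is only a minimal sketch of how group scoring and selective resampling could fit together under stated assumptions. The preference model `pref`, the judge `judge_absent`, and the resampler `resample` are hypothetical stand-ins for the components the abstract describes; the utility weights, `alpha`, and the 0.6 threshold are arbitrary; a candidate's rank is taken to be its position in the generated list; and `q1` to `q6` are placeholder candidates with made-up execution results.

```python
from typing import Callable, Hashable, List, Sequence

Group = List[int]  # candidate indices sharing one execution result; lower index = better rank


def group_by_result(results: Sequence[Hashable]) -> List[Group]:
    groups = {}
    for idx, res in enumerate(results):
        groups.setdefault(res, []).append(idx)
    return list(groups.values())


def pointwise_utility(group: Group, pool_size: int) -> float:
    # Hypothetical utility from the group's best rank (smallest index) and its size.
    return 0.5 * len(group) / pool_size + 0.5 / (1 + min(group))


def score_groups(groups: List[Group], pool_size: int, pref: Callable, alpha: float = 0.5):
    scores = []
    for i, g in enumerate(groups):
        others = [h for j, h in enumerate(groups) if j != i]
        # Average pairwise preference of g over every other group.
        pairwise = sum(pref(g, h) for h in others) / len(others) if others else 1.0
        scores.append(alpha * pairwise + (1 - alpha) * pointwise_utility(g, pool_size))
    return scores


def rank_and_resample(execute, candidates, pref, judge_absent, resample, max_rounds=1):
    candidates = list(candidates)
    for round_idx in range(max_rounds + 1):
        groups = group_by_result([execute(c) for c in candidates])
        scores = score_groups(groups, len(candidates), pref)
        best = max(range(len(groups)), key=scores.__getitem__)
        if round_idx < max_rounds and judge_absent(candidates, groups, scores):
            candidates += resample(candidates)  # widen the pool, then re-rank everything
            continue
        # All members of the winning group share its score; return its best-ranked member.
        return candidates[min(groups[best])]


# Toy run with stub components (all hypothetical).
results = {"q1": "A", "q2": "B", "q3": "C", "q4": "D", "q5": "D", "q6": "D"}
prefer_larger = lambda g, h: len(g) / (len(g) + len(h))          # stub preference model
low_confidence = lambda cands, groups, scores: max(scores) < 0.6  # stub judge
extra = lambda cands: ["q4", "q5", "q6"]                          # stub resampler
chosen = rank_and_resample(results.get, ["q1", "q2", "q3"], prefer_larger, low_confidence, extra)
print(chosen)  # q4
```

In the first round the three singleton groups tie on pairwise preference, so the rank term favors `q1`, but its score (about 0.58) falls below the stub threshold and the judge triggers resampling. After resampling, the three agreeing candidates form a larger group that wins on pairwise preference, and its best-ranked member `q4` is returned.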

Problem

Text-to-SQL

ranking

candidate recall

execution consistency

SQL equivalence

Innovation

Text-to-SQL

ranking consistency

agentic resampling