AI Summary
This work addresses persistent challenges in text-to-SQL generation over complex databases: syntactic and semantic drift during iterative refinement, corrections that do not transfer across queries, and inefficient use of context. The authors propose a reflection-based, controllable generation framework that decomposes SQL synthesis into typed stages. By combining interpreter-driven syntactic and execution validation with large language model-based judgment of semantic coverage, the framework localizes errors to individual stages and iteratively refines them without requiring gold-standard SQL supervision. This enables localized, constraint-aware repair while supporting monotonic improvement in overall performance. Evaluated on the Spider and BIRD benchmarks, the method consistently outperforms strong prompting baselines, converges stably within only a few refinement steps, and improves execution accuracy across both open-source and proprietary large language models.
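The summary attributes part of the framework's gold-free validation to interpreter-driven syntactic and execution checks. As an illustration only, here is one way such a check could be written against a SQLite database (Spider and BIRD both distribute their databases as SQLite files). The `interpreter_check` function, the `CheckResult` type, and the empty-result heuristic are hypothetical and not taken from the paper.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CheckResult:
    ok: bool
    kind: str  # "syntax", "execution", or "passed"
    message: str = ""
    rows: List[Tuple] = field(default_factory=list)


def interpreter_check(conn: sqlite3.Connection, sql: str,
                      require_rows: bool = True, row_limit: int = 100) -> CheckResult:
    """Hypothetical gold-free syntactic and execution check of one candidate query."""
    try:
        # EXPLAIN makes SQLite compile the statement without running it, so
        # syntax errors and unknown tables/columns surface at this step.
        conn.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        return CheckResult(False, "syntax", str(exc))
    try:
        # Actually run the query and keep a bounded sample of the result.
        rows = conn.execute(sql).fetchmany(row_limit)
    except sqlite3.Error as exc:
        return CheckResult(False, "execution", str(exc))
    if require_rows and not rows:
        # Heuristic (assumption): treat an empty result as a soft execution failure.
        return CheckResult(False, "execution", "query returned no rows")
    return CheckResult(True, "passed", rows=rows)
```

In practice the benchmark database would be opened read-only, for example with `sqlite3.connect("file:path/to/db.sqlite?mode=ro", uri=True)` (path hypothetical), so that a malformed candidate cannot modify it. Note that an interpreter check like this only catches errors the engine can see; whether the query answers the question still needs a separate semantic judge.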
Abstract
Text-to-SQL over complex, real-world databases remains brittle even with modern LLMs: iterative refinement often introduces syntactic and semantic drift, corrections tend not to transfer across queries, and naive use of large context windows scales poorly. We propose a controlled text-to-SQL framework built around reflective refinement. Instead of repeatedly rewriting the current SQL instance, the system decomposes generation into typed stages and applies feedback as persistent updates to the stage-level generation mechanism. A Reflection-Refinement Loop localizes each violation to the responsible stage, maximizing preservation of previously validated constraints and supporting monotonic improvement over a query set. The method operates without gold SQL by using interpreter-based checks and LLM-based semantic coverage verification as epistemic judges. Experiments on Spider and BIRD demonstrate consistent gains over strong prompting baselines, robust convergence within a small refinement budget, and improved execution accuracy across both frontier and open-weight model families.
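The Reflection-Refinement Loop can be pictured as a search over stage-level generation rules rather than over SQL strings. The following is a minimal sketch under stated assumptions: the stage names (`skeleton`, `filters`), the rule strings, `toy_generate`, `toy_coverage_judge`, and the acceptance rule (keep every previously passing query and strictly increase the pass count) are all hypothetical stand-ins, not the authors' implementation. In a real system `toy_generate` would be an LLM prompt assembled from each stage's rules, and the judge list would hold an interpreter-based check alongside an LLM semantic-coverage judge.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical types: each typed stage owns a list of persistent guidance rules.
Mechanism = Dict[str, List[str]]
# A judge returns None if the SQL is acceptable, else (responsible_stage, feedback).
Judge = Callable[[str, str], Optional[Tuple[str, str]]]
Generator = Callable[[str, Mechanism], str]


def evaluate(questions: List[str], mech: Mechanism, generate: Generator,
             judges: List[Judge]) -> Tuple[Set[str], Dict[str, Tuple[str, str]]]:
    """Generate SQL for every question; record passes and the first localized violation."""
    passed: Set[str] = set()
    violations: Dict[str, Tuple[str, str]] = {}
    for q in questions:
        sql = generate(q, mech)
        for judge in judges:
            verdict = judge(q, sql)
            if verdict is not None:
                violations[q] = verdict
                break
        else:  # no judge objected
            passed.add(q)
    return passed, violations


def reflect_refine(questions: List[str], mech: Mechanism, generate: Generator,
                   judges: List[Judge], max_rounds: int = 3) -> Tuple[Mechanism, Set[str]]:
    """Apply feedback as stage-local rule updates; accept only non-regressing updates."""
    passed, violations = evaluate(questions, mech, generate, judges)
    for _ in range(max_rounds):
        accepted = False
        for stage, feedback in violations.values():
            if feedback in mech[stage]:
                continue  # this rule is already part of the mechanism
            candidate = {s: rules[:] for s, rules in mech.items()}
            candidate[stage].append(feedback)  # persistent update to one stage only
            new_passed, new_violations = evaluate(questions, candidate, generate, judges)
            # Acceptance rule (assumption): keep every previously passing query
            # and strictly increase the number of passes over the query set.
            if passed <= new_passed and len(new_passed) > len(passed):
                mech, passed, violations = candidate, new_passed, new_violations
                accepted = True
                break
        if not accepted:
            break  # nothing left to fix, or no candidate update helped
    return mech, passed


def toy_generate(question: str, mech: Mechanism) -> str:
    """Hypothetical stand-in for an LLM call conditioned on per-stage rules."""
    use_count = "count-aggregate" in mech["skeleton"] and question.startswith("how many")
    use_filter = "keep-age-filter" in mech["filters"] and "over 30" in question
    cols = "COUNT(*)" if use_count else "*"
    where = " WHERE age > 30" if use_filter else ""
    return f"SELECT {cols} FROM singer{where}"


def toy_coverage_judge(question: str, sql: str) -> Optional[Tuple[str, str]]:
    """Hypothetical stand-in for an LLM semantic-coverage judge that blames a stage."""
    if question.startswith("how many") and "COUNT(" not in sql:
        return ("skeleton", "count-aggregate")
    if "over 30" in question and "age > 30" not in sql:
        return ("filters", "keep-age-filter")
    return None


if __name__ == "__main__":
    qs = ["how many singers", "list singers over 30", "how many singers over 30"]
    final, ok = reflect_refine(qs, {"skeleton": [], "filters": []},
                               toy_generate, [toy_coverage_judge])
    print(final)  # {'skeleton': ['count-aggregate'], 'filters': ['keep-age-filter']}
    print(sorted(ok))
```

In this sketch the superset test `passed <= new_passed` is what keeps improvement monotonic: an update that fixes one query but breaks another is rejected. Because feedback lands in a stage's rule list rather than in a single SQL string, one accepted rule can fix several queries at once, and `max_rounds` plays the role of the small refinement budget.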