Pi-SQL: Enhancing Text-to-SQL with Fine-Grained Guidance from Pivot Programming Languages

📅 2025-06-01
🤖 AI Summary
To address the accuracy bottleneck in natural-language-to-SQL generation caused by the semantic gap between text and SQL, this paper proposes Pi-SQL, a two-stage approach that uses high-resource Python programs as an executable pivot. First, a large language model generates Python programs whose code blocks and comments give fine-grained, step-by-step guidance. Second, SQL queries are produced by following each Python program. The final SQL query is selected from candidates generated by different strategies, so that its results match those of the reference Python program and it executes quickly. In the reported experiments, Pi-SQL improves execution accuracy over the best-performing baseline by up to 3.20 and raises the reward-based valid efficiency score by up to 4.55.

📝 Abstract
Text-to-SQL translates natural-language user queries into executable SQL programs, enabling non-experts to interact with complex databases. Existing prompt-based methods craft meticulous text guidelines and examples to facilitate SQL generation, but their accuracy is hindered by the large semantic gap between natural-language text and low-resource SQL programs. In this work, we propose Pi-SQL, which uses high-resource Python programs as a pivot to bridge the natural language query and the SQL program. In particular, Pi-SQL first generates Python programs that provide fine-grained, step-by-step guidelines in their code blocks or comments, and then produces an SQL program following the guidance of each Python program. The final SQL program matches the query results of the reference Python program and, because it is selected from candidates generated by different strategies, achieves superior execution speed, with a reward-based valid efficiency score up to 4.55 higher than the best-performing baseline. Extensive experiments demonstrate the effectiveness of Pi-SQL, which improves the execution accuracy of the best-performing baseline by up to 3.20.
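To make the pivot idea concrete, here is a minimal sketch of what a Python pivot program and the SQL written from it might look like. Everything in it is an assumption made for illustration: the `orders` table, the question, the `pivot_program` function, and the hand-written SQL are hypothetical and are not taken from the paper, where both programs would be generated by a language model.

```python
import sqlite3

# Toy in-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('ann', 10.0), ('bob', 5.0), ('ann', 7.5), ('bob', 4.0), ('cy', 2.0);
""")

QUESTION = "Which customers spent more than 6 in total, and how much?"

def pivot_program(conn):
    """Hypothetical pivot program: each commented step hints at one SQL clause."""
    # Step 1: read the rows that are needed (-> FROM orders)
    rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
    # Step 2: sum the amount per customer (-> GROUP BY customer, SUM(amount))
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    # Step 3: keep only totals above 6 (-> HAVING SUM(amount) > 6)
    kept = [(c, t) for c, t in totals.items() if t > 6]
    # Step 4: order by customer name (-> ORDER BY customer)
    return sorted(kept)

# SQL written by following the four steps above, one clause per step.
PIVOT_GUIDED_SQL = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 6
    ORDER BY customer
"""

python_result = pivot_program(conn)
sql_result = conn.execute(PIVOT_GUIDED_SQL).fetchall()
print(python_result == sql_result)
```

Because the Python program is executable, its output can serve as a reference against which the SQL result is compared, which is the consistency property the abstract describes.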
Problem

Research questions and friction points this paper is trying to address.

Bridging semantic gap between natural language and SQL
Improving accuracy of text-to-SQL conversion
Enhancing SQL generation with Python pivot
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Python as pivot for SQL generation
Generates SQL via Python code guidance
Selects SQL candidates for optimal performance
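The candidate-selection idea can be sketched as follows, under stated assumptions: candidates that fail to execute or whose rows differ from the Python reference result are discarded, and the fastest remaining one is returned. The helper names `run` and `select_sql`, the toy table, and the single wall-clock timing are illustrative choices only; the paper's reward-based valid efficiency score is an evaluation metric and is not what this sketch computes.

```python
import sqlite3
import time

def run(conn, sql):
    """Execute sql; return (rows in canonical order, seconds) or (None, None) on error."""
    start = time.perf_counter()
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None, None
    elapsed = time.perf_counter() - start
    # Sort by repr so row order does not affect the comparison.
    return sorted(rows, key=repr), elapsed

def select_sql(conn, candidates, reference_rows):
    """Keep candidates whose rows equal the reference, then return the fastest."""
    reference = sorted(reference_rows, key=repr)
    consistent = []
    for sql in candidates:
        rows, elapsed = run(conn, sql)
        if rows is not None and rows == reference:
            consistent.append((elapsed, sql))
    if not consistent:
        return None  # no candidate agrees with the reference result
    return min(consistent)[1]

# Hypothetical toy data and candidates.
conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1), (2), (3);")

reference_rows = [(2,), (3,)]  # stands in for the output of a Python pivot program
candidates = [
    "SELECT x FROM t WHERE x > 1",   # consistent with the reference
    "SELECT x FROM t",               # executes, but returns an extra row
    "SELEC x FROM t",                # syntax error, discarded
    "SELECT x FROM t WHERE x >= 2",  # consistent with the reference
]
chosen = select_sql(conn, candidates, reference_rows)
print(chosen)
```

A single timing run is noisy, so a more careful version would repeat each query several times before comparing speeds.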
Yongdong Chi
Shanghai University of Finance and Economics
Hanqing Wang
Shanghai University of Finance and Economics
Zonghan Yang
Tsinghua University
Language agent, Foundation models
Jian Yang
Beihang University
Xiao Yan
Wuhan University
Yun Chen
Shanghai University of Finance and Economics
Guanhua Chen
Southern University of Science and Technology