🤖 AI Summary
Problem: Chain-of-thought prompting improves LLM performance on complex reasoning, but applying such step-by-step reasoning to structured tasks such as text-to-SQL translation remains largely unexplored.
Method: This paper proposes STaR-SQL, a reasoning-driven Chain-of-Thought (CoT) paradigm that reframes SQL generation as a verifiable, multi-step reasoning process. The LLM is prompted to produce detailed rationales for SQL queries, and only rationales whose queries execute to correct results are used for self-iterative fine-tuning. An Outcome-supervised Reward Model (ORM) then acts as a test-time verifier to select among sampled candidates. This is an early systematic application of self-taught CoT to structured semantic parsing: by dedicating additional test-time computation to reasoning, the model behaves as a spontaneous reasoner rather than a merely prompt-responsive agent.
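The self-iterative fine-tuning loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate`, `execute`, and `fine_tune` are hypothetical stand-ins for LLM sampling, database execution, and parameter updates.

```python
# Hedged sketch of one STaR-style self-training round for text-to-SQL.
# All helper callables are hypothetical stubs, not the paper's actual API.

def star_round(examples, generate, execute, fine_tune, model):
    """Sample a rationale + SQL per example, keep only pairs whose SQL
    executes to the gold result, then fine-tune on the kept rationales."""
    kept = []
    for ex in examples:
        rationale, sql = generate(model, ex)       # CoT + candidate query
        if execute(sql, ex["db"]) == ex["gold"]:   # outcome-based check
            kept.append((ex["question"], rationale, sql))
    return fine_tune(model, kept), kept


if __name__ == "__main__":
    # Toy demo: the stub executor returns 3, so only the first example
    # "verifies" against its gold result.
    examples = [
        {"question": "count users", "db": None, "gold": 3},
        {"question": "count orders", "db": None, "gold": 7},
    ]
    generate = lambda m, ex: ("reason over the schema", "SELECT COUNT(*) FROM t")
    execute = lambda sql, db: 3          # stub: pretend execution result
    fine_tune = lambda m, data: m        # stub: no-op update
    model, kept = star_round(examples, generate, execute, fine_tune, "m0")
    print(len(kept))  # → 1
```

The key property is that correctness is checked by executing the query, so no human-written rationales are needed: the model bootstraps from its own verified reasoning paths.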
Contribution/Results: On the Spider benchmark, the approach achieves 86.6% execution accuracy, surpassing a few-shot baseline by 31.6%, a baseline fine-tuned to predict answers directly by 18.0%, and agent-like prompting methods built on GPT-4.
📝 Abstract
Generating step-by-step "chain-of-thought" rationales has proven effective for improving the performance of large language models on complex reasoning tasks. However, applying such techniques to structured tasks, such as text-to-SQL, remains largely unexplored. In this paper, we introduce Self-Taught Reasoner for text-to-SQL (STaR-SQL), a novel approach that reframes SQL query generation as a reasoning-driven process. Our method prompts the LLM to produce detailed reasoning steps for SQL queries and fine-tunes it on rationales that lead to correct outcomes. Unlike traditional methods, STaR-SQL dedicates additional test-time computation to reasoning, thereby positioning LLMs as spontaneous reasoners rather than mere prompt-based agents. To further scale the inference process, we incorporate an outcome-supervised reward model (ORM) as a verifier, which enhances SQL query accuracy. Experimental results on the challenging Spider benchmark demonstrate that STaR-SQL significantly improves text-to-SQL performance, achieving an execution accuracy of 86.6%. This surpasses a few-shot baseline by 31.6% and a baseline fine-tuned to predict answers directly by 18.0%. Additionally, STaR-SQL outperforms agent-like prompting methods that leverage more powerful yet closed-source models such as GPT-4. These findings underscore the potential of reasoning-augmented training for structured tasks and open the door to extending self-improving reasoning models to text-to-SQL generation and beyond.
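The ORM-as-verifier step described in the abstract amounts to best-of-N selection at inference time: sample several rationale + SQL candidates, score each with the reward model, and keep the highest-scoring query. A minimal sketch, assuming a `best_of_n` helper and a toy scorer standing in for the trained verifier:

```python
# Hedged sketch of best-of-N verification with an outcome-supervised
# reward model (ORM). The scorer here is a toy stand-in; in the paper
# the ORM is a learned model trained on execution outcomes.

def best_of_n(candidates, orm_score):
    """Return the (rationale, sql, ...) candidate the verifier scores highest."""
    return max(candidates, key=orm_score)


if __name__ == "__main__":
    # Each candidate: (rationale, sql, toy_verifier_score).
    candidates = [
        ("skip the filter", "SELECT a FROM t", 0.3),
        ("apply the asked-for filter", "SELECT a FROM t WHERE x > 0", 0.9),
    ]
    orm_score = lambda c: c[2]           # stub: read off the stored score
    print(best_of_n(candidates, orm_score)[1])  # → SELECT a FROM t WHERE x > 0
```

This is how extra test-time computation is spent: wider sampling plus a verifier, rather than a single greedy decode.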