A Survey on Employing Large Language Models for Text-to-SQL Tasks

📅 2024-07-21
🏛️ arXiv.org
📈 Citations: 24 (1 influential)
🤖 AI Summary
Current LLM-based Text-to-SQL approaches suffer from the absence of a unified taxonomic framework, limited cross-model comparability, and insufficient generalization and robustness. To address these issues, this paper proposes the first two-dimensional taxonomy for LLM-based Text-to-SQL methods: one dimension classifies techniques into prompt engineering and parameter fine-tuning; the other categorizes objectives as structure-aware parsing, execution-guided generation, and feedback-enhanced refinement. We systematically synthesize empirical results across major benchmarks (including Spider and WikiSQL) and models such as Codex, LLaMA, and the GPT series. Through rigorous literature analysis, methodological abstraction, and cross-method performance comparison, we identify key determinants of prompt design efficacy, delineate the practical boundaries of fine-tuning strategies, and diagnose persistent generalization bottlenecks. Our synthesis distills recurring patterns and evolutionary trends, offering both theoretical foundations and actionable guidelines for developing efficient, robust, and interpretable Text-to-SQL systems.

📝 Abstract
With the development of Large Language Models (LLMs), a wide range of LLM-based Text-to-SQL (Text2SQL) methods has emerged. This survey provides a comprehensive review of LLM-based Text2SQL studies. We first enumerate classic benchmarks and evaluation metrics. For the two mainstream methods, prompt engineering and fine-tuning, we introduce a comprehensive taxonomy and offer practical insights into each subcategory. We then present an overall analysis of these methods and of various models evaluated on well-known datasets, and extract common characteristics. Finally, we discuss the challenges and future directions in this field.
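To make the prompt-engineering branch of the taxonomy concrete, here is a minimal, hypothetical sketch of a zero-shot Text-to-SQL prompt builder. The schema serialization format and instruction wording are illustrative assumptions, not taken from the paper or any specific surveyed method.

```python
# Illustrative zero-shot Text-to-SQL prompt construction, in the style the
# survey groups under "prompt engineering". The serialization format below
# is a hypothetical choice, not the paper's own template.

def build_text2sql_prompt(schema: dict, question: str) -> str:
    """Serialize a database schema and a natural-language question
    into a single prompt string for an LLM."""
    schema_lines = [
        f"Table {table}({', '.join(columns)})"
        for table, columns in schema.items()
    ]
    return (
        "Given the database schema:\n"
        + "\n".join(schema_lines)
        + f"\n\nWrite a SQL query that answers: {question}\nSQL:"
    )

# Toy schema in the spirit of Spider-style benchmarks
prompt = build_text2sql_prompt(
    {
        "singer": ["singer_id", "name", "country"],
        "concert": ["concert_id", "singer_id", "year"],
    },
    "How many concerts were held in 2014?",
)
print(prompt)
```

In practice, surveyed methods vary exactly these choices: how the schema is linearized, whether few-shot exemplars are included, and whether execution results are fed back for refinement.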
Problem

Research questions and friction points this paper is trying to address.

Reviewing LLM-based Text-to-SQL methods comprehensively
Analyzing prompt engineering and fine-tuning approaches
Discussing challenges and future research directions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Survey of LLM-based Text-to-SQL methods
Taxonomy for prompt engineering and fine-tuning
Analysis of models on well-known datasets
👥 Authors
Liang Shi (School of Computer Science, Peking University, China)
Zhengju Tang (Peking University)
Nan Zhang (SINGDATA CLOUD PTE. LTD, USA)
Xiaotong Zhang (SINGDATA CLOUD PTE. LTD, China)
Zhi Yang (Peking University, China)