🤖 AI Summary
Current LLM-based Text-to-SQL approaches suffer from the absence of a unified taxonomic framework, limited cross-model comparability, and insufficient generalization and robustness. To address these issues, this paper proposes the first two-dimensional taxonomy for LLM-based Text-to-SQL methods: one dimension classifies techniques into prompt engineering and parameter fine-tuning; the other categorizes objectives as structure-aware parsing, execution-guided generation, and feedback-enhanced refinement. We systematically synthesize empirical results across major benchmarks—including Spider and WikiSQL—and models such as Codex, LLaMA, and GPT series. Through rigorous literature analysis, methodological abstraction, and cross-method performance comparison, we identify key determinants of prompt design efficacy, delineate the practical boundaries of fine-tuning strategies, and diagnose persistent generalization bottlenecks. Our synthesis distills recurring patterns and evolutionary trends, offering both theoretical foundations and actionable guidelines for developing efficient, robust, and interpretable Text-to-SQL systems.
📝 Abstract
With the development of Large Language Models (LLMs), a wide range of LLM-based Text-to-SQL (Text2SQL) methods have emerged. This survey provides a comprehensive review of LLM-based Text2SQL studies. We first enumerate classic benchmarks and evaluation metrics. For the two mainstream approaches, prompt engineering and fine-tuning, we introduce a comprehensive taxonomy and offer practical insights into each subcategory. We then present an overall analysis of these methods and of various models evaluated on well-known datasets, distilling their key characteristics. Finally, we discuss the challenges and future directions in this field.
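The prompt-engineering branch surveyed here typically serializes the database schema into the model's context before the natural-language question. A minimal sketch of this pattern follows; the schema representation, table, and instruction wording are illustrative assumptions, not a method from any specific paper:

```python
# Minimal sketch of schema-aware prompt construction for Text-to-SQL,
# a common prompt-engineering pattern. The "CREATE TABLE" serialization
# and the example schema below are illustrative assumptions.

def build_text2sql_prompt(schema: dict, question: str) -> str:
    """Serialize the database schema and user question into a zero-shot prompt."""
    lines = ["### SQLite tables:"]
    for table, columns in schema.items():
        lines.append(f"CREATE TABLE {table} ({', '.join(columns)});")
    lines.append(f"### Question: {question}")
    lines.append("### SQL:")
    return "\n".join(lines)

# Hypothetical schema in the style of the Spider benchmark.
schema = {"singer": ["singer_id", "name", "country", "age"]}
prompt = build_text2sql_prompt(schema, "How many singers are from France?")
print(prompt)
```

The resulting string would be sent to an LLM, which is expected to complete the `### SQL:` line; execution-guided and feedback-enhanced variants additionally run the predicted query and feed errors back into the prompt.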