🤖 AI Summary
This paper systematically surveys the natural language-to-SQL (NL2SQL) task as powered by large language models (LLMs), covering its full lifecycle. It establishes a unified analytical framework across four dimensions: model design (schema- and instance-aware modeling), data construction (LLM-driven synthetic data generation), multi-granularity evaluation (spanning syntactic, execution-level, and semantic correctness), and error attribution (fine-grained classification of errors by root cause). The key contributions are threefold: (1) it introduces, for the first time, an integrated full-lifecycle perspective on NL2SQL in the LLM era; (2) it formulates a development guideline that balances practicality and interpretability; and (3) it identifies core challenges, namely insufficient schema-aware reasoning, weak few-shot generalization, and poor robustness in real-world deployments, and maps them into a clear problem taxonomy and technical roadmap for future research.
📝 Abstract
Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL, a.k.a. Text-to-SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of NL2SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of NL2SQL techniques powered by LLMs, covering the entire NL2SQL lifecycle from the following four aspects: (1) Model: NL2SQL translation techniques that not only tackle NL ambiguity and under-specification, but also properly map NL to the database schema and instances; (2) Data: from the collection of training data and data synthesis to address training data scarcity, to NL2SQL benchmarks; (3) Evaluation: evaluating NL2SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing NL2SQL errors to find their root causes and guide the evolution of NL2SQL models. Moreover, we provide a rule of thumb for developing NL2SQL solutions. Finally, we discuss the research challenges and open problems of NL2SQL in the LLM era.
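To make the evaluation aspect concrete, the following is a minimal sketch of an execution-based check, one common way to score NL2SQL output: a predicted query counts as correct if it runs and returns the same rows as the reference query. It is an illustration under stated assumptions, not the survey's own evaluation procedure. The function names (`run_query`, `execution_match`, `make_demo_db`) and the toy `singer` table are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Run sql and return its rows as a multiset, or None if execution fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        # Syntax errors and unknown tables/columns both end up here.
        return None


def execution_match(conn, gold_sql, pred_sql):
    """True if both queries execute and return the same multiset of rows.

    Rows are compared without regard to order, so this sketch does not
    check ORDER BY semantics.
    """
    gold = run_query(conn, gold_sql)
    pred = run_query(conn, pred_sql)
    return gold is not None and pred is not None and gold == pred


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?, ?)",
        [(1, "Ann", 31), (2, "Bo", 25), (3, "Cy", 40)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    gold = "SELECT name FROM singer WHERE age > 30"
    # A differently written but equivalent prediction
    print(execution_match(conn, gold, "SELECT name FROM singer WHERE NOT age <= 30"))
    # A prediction that drops the filter
    print(execution_match(conn, gold, "SELECT name FROM singer"))
```

A check of this kind accepts queries that differ textually but agree on the given database instance, so it can also accept a wrong query that happens to return the same rows on that instance. This is one reason to evaluate at more than one granularity.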