Exploring the Landscape of Text-to-SQL with Large Language Models: Progresses, Challenges and Opportunities

📅 2025-05-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper systematically surveys large language model (LLM)-driven Text-to-SQL techniques to lower the barrier for non-experts accessing relational databases. Methodologically, it employs bibliometric analysis, taxonomic comparison of approaches, meta-analysis of benchmark datasets, and critical cross-benchmark evaluation. It establishes, for the first time, a multidimensional methodology framework that clarifies paradigmatic evolution—from prompt engineering and supervised fine-tuning to reasoning augmentation—and identifies persistent evaluation blind spots. The study distills four major technical branches and five core challenges, proposing a unified conceptual evaluation framework. Furthermore, it introduces a structured knowledge graph and 12 scalable research directions, offering both theoretical guidance and a practical roadmap for the community. (136 words)

Technology Category

Application Category

📝 Abstract
Converting natural language (NL) questions into SQL queries, referred to as Text-to-SQL, has emerged as a pivotal technology for facilitating access to relational databases, especially for users without SQL knowledge. Recent progress in large language models (LLMs) has markedly propelled the field of natural language processing (NLP), opening new avenues to improve text-to-SQL systems. This study presents a systematic review of LLM-based text-to-SQL, focusing on four key aspects: (1) an analysis of the research trends in LLM-based text-to-SQL; (2) an in-depth analysis of existing LLM-based text-to-SQL techniques from diverse perspectives; (3) summarization of existing text-to-SQL datasets and evaluation metrics; and (4) discussion on potential obstacles and avenues for future exploration in this domain. This survey seeks to furnish researchers with an in-depth understanding of LLM-based text-to-SQL, sparking new innovations and advancements in this field.
Problem

Research questions and friction points this paper is trying to address.

Reviewing LLM-based Text-to-SQL research trends and techniques
Summarizing Text-to-SQL datasets and evaluation metrics
Identifying challenges and future directions in Text-to-SQL
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based natural language to SQL conversion
Systematic review of text-to-SQL techniques
Analysis of datasets and evaluation metrics
Y
Yiming Huang
Harbin Institute of Technology (Shenzhen), China
Jiyu Guo
Jiyu Guo
Harbin Institute of Technology, Shenzhen
Data-Centric AI Trustworthy AI Machine Learning
W
Wenxin Mao
Harbin Institute of Technology (Shenzhen), China
C
Cuiyun Gao
Harbin Institute of Technology (Shenzhen), Peng Cheng Laboratory, China
P
Peiyi Han
Harbin Institute of Technology (Shenzhen), Peng Cheng Laboratory, China
Chuanyi Liu
Chuanyi Liu
Pengcheng Laboratory, Harbin Institute of Technology, Shenzhen
Cloud ComputingCloud SecurityPrivacy Enhanced Technologies
Qing Ling
Qing Ling
School of Computer Science and Engineering, Sun Yat-Sen University
Signal ProcessingOptimizationControl