🤖 AI Summary
This paper addresses three core challenges in large language model (LLM)-driven Text-to-SQL: low contextual accuracy, brittle schema linking, and constraints on computational efficiency and data privacy. To tackle these, we systematically survey the technical evolution of Text-to-SQL and, for the first time in the literature, rigorously investigate Graph-based Retrieval-Augmented Generation (Graph RAG) for SQL semantic parsing. We propose a unified analytical framework encompassing benchmarking methodologies, evaluation metrics, and key open challenges. Empirical results demonstrate that Graph RAG significantly improves schema understanding and contextual alignment. Our analysis traces the paradigm shift from rule-based approaches to RAG-enhanced methods, identifying computational efficiency, model robustness, and privacy preservation as the three principal bottlenecks. The work provides both theoretical foundations and practical guidance for developing next-generation Text-to-SQL systems that are trustworthy, interpretable, and highly accurate.
📝 Abstract
Large language models (LLMs), when combined with Retrieval-Augmented Generation (RAG), are substantially advancing the state of the art in translating natural language queries into correct, structured SQL. Unlike previous reviews, this survey provides a comprehensive study of the evolution of LLM-based text-to-SQL systems, from early rule-based models to advanced LLM approaches that incorporate RAG. We discuss benchmarks, evaluation methods, and evaluation metrics. We also uniquely examine the use of Graph RAG for improved contextual accuracy and schema linking in these systems. Finally, we highlight key challenges, including computational efficiency, model robustness, and data privacy, toward improving LLM-based text-to-SQL systems.