🤖 AI Summary
This paper addresses three core challenges in large language model (LLM)-driven Text-to-SQL: low contextual accuracy, brittle schema linking, and constraints on computational efficiency and data privacy. To tackle these, we systematically survey the technical evolution of Text-to-SQL and, for the first time in the literature, rigorously investigate Graph-based Retrieval-Augmented Generation (Graph RAG) for SQL semantic parsing. We propose a unified analytical framework encompassing benchmarking methodologies, evaluation metrics, and key open challenges. Empirical results demonstrate that Graph RAG significantly improves schema understanding and contextual alignment. Our analysis traces the paradigm shift from rule-based approaches to RAG-enhanced methods, identifying computational efficiency, model robustness, and privacy preservation as the three principal bottlenecks. The work provides both theoretical foundations and practical guidance for developing next-generation Text-to-SQL systems that are trustworthy, interpretable, and highly accurate.
📝 Abstract
Large language models (LLMs), when combined with Retrieval-Augmented Generation (RAG), are substantially advancing the state of the art in translating natural language queries into correct, structured SQL. Unlike previous reviews, this survey provides a comprehensive study of the evolution of LLM-based text-to-SQL systems, from early rule-based models to advanced LLM approaches that incorporate RAG. We discuss benchmarks, evaluation methods, and evaluation metrics. We also uniquely examine the use of Graph RAG for improved contextual accuracy and schema linking in these systems. Finally, we highlight key challenges, including computational efficiency, model robustness, and data privacy, toward improving LLM-based text-to-SQL systems.