Are LLMs Overkill for Databases?: A Study on the Finiteness of SQL

📅 2026-03-26

🤖 AI Summary

This study investigates whether large language models (LLMs) are necessary for natural language to SQL (NL2SQL) tasks and challenges the prevailing tendency to overestimate SQL query complexity. Through an empirical analysis of 376 real-world databases, the work shows that SQL query templates follow a power-law-like distribution: just 13% of distinct templates account for 70% of the tested queries. Using large-scale database sampling, SQL template abstraction, frequency statistics, and natural language–SQL alignment, the authors show that query complexity does not increase monotonically with the number of tables and that most queries are highly predictable. These findings point to the advantages of template-based approaches in security, cost-efficiency, and auditability, and call into question the dominant paradigm of relying on complex LLMs for NL2SQL translation.
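The template abstraction and frequency statistics described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the normalization rules (replacing string and numeric literals with `?`, collapsing whitespace, upper-casing) and the helper names `to_template` and `template_coverage` are assumptions. A stricter abstraction could also replace table and column names so that templates become comparable across databases.

```python
import re
from collections import Counter

# Hypothetical normalization rules; the paper's exact abstraction is not given here.
_STRING = re.compile(r"'(?:[^']|'')*'")   # single-quoted literals, '' as escaped quote
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")  # standalone integer or decimal literals
_SPACE = re.compile(r"\s+")

def to_template(sql: str) -> str:
    """Replace literals with '?', collapse whitespace, drop a trailing ';', upper-case."""
    t = _STRING.sub("?", sql)
    t = _NUMBER.sub("?", t)
    t = _SPACE.sub(" ", t).strip().rstrip(";").strip()
    return t.upper()

def template_coverage(queries, target=0.70):
    """Smallest fraction of distinct templates whose queries reach `target` coverage."""
    if not queries:
        raise ValueError("need at least one query")
    counts = Counter(to_template(q) for q in queries)
    total = sum(counts.values())
    covered = 0
    # Take templates from most to least frequent until the target share is reached.
    for k, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return k / len(counts)
    return 1.0

queries = [
    "SELECT name FROM users WHERE id = 1",
    "SELECT name FROM users WHERE id = 42",
    "select name from users where id = 7;",
    "SELECT COUNT(*) FROM orders WHERE status = 'paid'",
]
print(to_template(queries[1]))     # SELECT NAME FROM USERS WHERE ID = ?
print(template_coverage(queries))  # 0.5: one of two templates covers 75% of queries
```

In this toy sample, three of the four queries collapse into one template, which is the kind of concentration the power-law-like distribution describes.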

📝 Abstract

Translating natural language to SQL for data retrieval has become more accessible thanks to code generation LLMs. But how hard is it to generate SQL code? While databases can grow arbitrarily complex, the complexity of queries is bounded by real-life utility and human needs. Using a sample of 376 databases, we show that SQL queries, as translations of natural language questions, are finite in practical complexity. There is no clear monotonic relationship between database table count and SQL query complexity. In their template forms, SQL queries follow a power-law-like frequency distribution in which 70% of our tested queries are covered by just 13% of all template types, indicating that the vast majority of SQL queries are predictable. This suggests that while code generation LLMs can be useful, in the domain of database access they may be operating in a narrow, highly formulaic space where templates could be safer, cheaper, and more auditable.
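To make the safety and auditability argument concrete, here is a hedged sketch of a template-based alternative to free-form SQL generation, using Python's built-in `sqlite3` module. The template registry `TEMPLATES`, the regex-based question matching, and the `answer` helper are hypothetical; a real system would need a far more robust mapping from questions to templates. The point is structural: every executable SQL string is fixed and reviewable in advance, and user-supplied values are only ever passed as bound parameters.

```python
import re
import sqlite3

# Hypothetical registry of vetted templates: a question pattern paired with a
# fixed SQL string that uses '?' placeholders for the extracted values.
TEMPLATES = {
    "orders_by_status": (
        re.compile(r"how many (\w+) orders", re.IGNORECASE),
        "SELECT COUNT(*) FROM orders WHERE status = ?",
    ),
    "user_name_by_id": (
        re.compile(r"name of user (\d+)", re.IGNORECASE),
        "SELECT name FROM users WHERE id = ?",
    ),
}

def answer(conn, question):
    """Run the first template whose pattern matches, binding extracted values."""
    for name, (pattern, sql) in TEMPLATES.items():
        m = pattern.search(question)
        if m:
            # Convert digit strings to ints; values are bound by the driver,
            # never spliced into the SQL text, so they cannot alter its structure.
            params = tuple(int(g) if g.isdigit() else g for g in m.groups())
            return name, conn.execute(sql, params).fetchall()
    return None, []  # no vetted template matched: refuse rather than improvise

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders (status) VALUES ('paid'), ('paid'), ('pending');
""")

print(answer(conn, "How many paid orders are there?"))  # ('orders_by_status', [(2,)])
print(answer(conn, "What is the name of user 2?"))      # ('user_name_by_id', [('Grace',)])
print(answer(conn, "Drop all tables"))                  # (None, [])
```

The design choice is to fail closed: a question that matches no vetted template returns nothing instead of producing unreviewed SQL, which is what makes the set of possible queries auditable.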

Problem

Research questions and friction points this paper is trying to address.

LLMs

SQL generation

query complexity

natural language to SQL

finiteness

Innovation

Methods, ideas, or system contributions that make the work stand out.

SQL query templates

natural language to SQL

code generation LLMs