🤖 AI Summary
Large language models (LLMs) struggle to model the alignment between user intent and database schema in natural-language-to-SQL (NL2SQL) translation, which leads to high error rates. To address this, we propose SGU-SQL, a framework built on a decoupled, stepwise generation paradigm that handles schema grounding and syntax-tree decomposition separately. SGU-SQL combines structure-enhanced query-schema alignment, syntax-tree-guided LLM decoding, and multi-stage structure-aware prompting and fine-tuning to map query semantics onto SQL syntax. Evaluated on two prominent benchmarks, Spider and BIRD, SGU-SQL outperforms 16 state-of-the-art baselines in both execution accuracy and cross-domain generalization. These results support the claim that explicit structural modeling is critical for advancing NL2SQL systems.
📝 Abstract
Generating accurate Structured Query Language (SQL) is a long-standing problem, especially when it requires matching users' semantic queries against structured databases and then generating structured SQL. Existing models typically feed the query and the database schema into an LLM and rely on it to perform semantic-structure matching and generate the SQL. However, such solutions overlook the structural information within user queries and databases, which can be used to improve SQL generation. This oversight can lead to inaccurate or unexecutable SQL. To fully exploit this structure, we propose a structure-to-SQL framework that leverages the inherent structural information to improve the SQL generation of LLMs. Specifically, we introduce our Structure Guided SQL (SGU-SQL) generation model. SGU-SQL first links user queries and databases in a structure-enhanced manner. It then decomposes complicated linked structures with grammar trees to guide the LLM to generate the SQL step by step. Extensive experiments on two benchmark datasets show that SGU-SQL outperforms sixteen SQL generation baselines.
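The two stages described in the abstract, linking the query to the schema and then walking a grammar tree to generate SQL step by step, can be illustrated with a minimal sketch. This is not the SGU-SQL implementation: the token-overlap linker, the `Node` class, the post-order decomposition, the prompt wording, and the example schema are all hypothetical stand-ins chosen only to make the control flow concrete. No LLM is called; the sketch stops at building the per-step prompts.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy SQL grammar tree (hypothetical, not from the paper)."""
    clause: str
    children: list = field(default_factory=list)


def tokens(text):
    """Lowercase alphanumeric tokens; underscores split identifiers apart."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def link_schema(question, schema):
    """Toy linker (assumption): keep tables and columns that share a token
    with the question. The paper's structure-enhanced linking is richer."""
    q = tokens(question)
    linked = {}
    for table, columns in schema.items():
        hits = [c for c in columns if tokens(c) & q]
        if hits or tokens(table) & q:
            linked[table] = hits
    return linked


def decompose(tree):
    """Post-order walk: sub-clauses are emitted before their parent clause."""
    steps = []
    for child in tree.children:
        steps.extend(decompose(child))
    steps.append(tree.clause)
    return steps


def build_prompts(question, linked, tree):
    """One prompt per grammar-tree step; an LLM would consume them in order."""
    hint = "; ".join(f"{t}({', '.join(cols)})" for t, cols in linked.items())
    return [
        f"Step {i}: write the {clause} clause for '{question}' using {hint}"
        for i, clause in enumerate(decompose(tree), start=1)
    ]


# Hypothetical schema and question, for illustration only.
schema = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "venue", "year"],
}
question = "What is the name and age of each singer?"
tree = Node("SELECT", [Node("FROM"), Node("WHERE")])

linked = link_schema(question, schema)
prompts = build_prompts(question, linked, tree)
for p in prompts:
    print(p)
```

In this sketch the linker prunes the unrelated `concert` table before any prompt is built, and the tree walk fixes the order in which clauses are requested. Those are the two roles the abstract assigns to structure, shown here in deliberately simplified form.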