SEA-SQL: Semantic-Enhanced Text-to-SQL with Adaptive Refinement

📅 2024-08-09

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing Text-to-SQL approaches rely on post-hoc SQL correction driven by hand-written, error-derived prompt rules, lack execution verification, and depend on GPT-4 with few-shot prompts, which makes them expensive. To address these limitations, this work proposes SEA-SQL, a semantic-enhanced framework with adaptive refinement that uses zero-shot prompts. First, it constructs a semantically enriched database schema representation to improve schema understanding. Second, it applies a fine-tuned adaptive bias eliminator that requires no hand-crafted rules or GPT-4 intervention. Third, it uses dynamic execution adjustment to make the bias-eliminated SQL query executable. Evaluated on the Spider and BIRD benchmarks, the method achieves state-of-the-art performance in the GPT-3.5 setting at 9%–58% of the generation cost, and is comparable to GPT-4 at only 0.9%–5.3% of the generation cost. The framework thus balances accuracy against generation cost.
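As a rough illustration of what a semantically enriched schema could look like, the sketch below annotates each column with a few sample values before the schema is placed in a prompt. The summary does not say how SEA-SQL builds its schema representation, so the function name `enriched_schema`, the output format, and the toy `singer` table are all assumptions made for this example.

```python
import sqlite3


def enriched_schema(conn, samples=3):
    """Describe each table as 'table(col TYPE [e.g. v1, v2]; ...)'.

    Hypothetical helper: sample values stand in for the "semantic"
    information; the paper's actual representation may differ.
    """
    lines = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()]
    for table in tables:
        cols = []
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _, name, ctype, *_ in info:
            rows = conn.execute(
                f'SELECT DISTINCT "{name}" FROM "{table}" '
                f'WHERE "{name}" IS NOT NULL LIMIT {int(samples)}'
            ).fetchall()
            examples = ", ".join(repr(r[0]) for r in rows)
            cols.append(f"{name} {ctype} [e.g. {examples}]")
        lines.append(f"{table}({'; '.join(cols)})")
    return "\n".join(lines)


# Toy database, invented for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
INSERT INTO singer VALUES (1, 'Ana', 'France'), (2, 'Bo', 'Japan');
""")
schema_text = enriched_schema(conn)
print(schema_text)
```

Showing real values next to each column gives the model a hint about how a question's wording maps onto stored data, for example that countries are stored as full names rather than codes.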

📝 Abstract

Recent advancements in large language models (LLMs) have significantly contributed to the progress of the Text-to-SQL task. A common requirement in many of these works is the post-correction of SQL queries. However, most of this process consists of analyzing error cases to develop prompts with rules that eliminate model bias, and it lacks execution verification for SQL queries. In addition, the prevalent techniques depend primarily on GPT-4 and few-shot prompts, which makes them expensive. To investigate effective, cost-efficient methods for SQL refinement, we introduce Semantic-Enhanced Text-to-SQL with Adaptive Refinement (SEA-SQL), which includes Adaptive Bias Elimination and Dynamic Execution Adjustment and aims to improve performance while minimizing resource expenditure with zero-shot prompts. Specifically, SEA-SQL employs a semantic-enhanced schema to augment database information and optimize SQL queries. During SQL query generation, a fine-tuned adaptive bias eliminator is applied to mitigate inherent biases caused by the LLM. Dynamic execution adjustment is then used to guarantee the executability of the bias-eliminated SQL query. We conduct experiments on the Spider and BIRD datasets to demonstrate the effectiveness of this framework. The results show that SEA-SQL achieves state-of-the-art performance in the GPT-3.5 scenario with 9%-58% of the generation cost. Furthermore, SEA-SQL is comparable to GPT-4 with only 0.9%-5.3% of the generation cost.
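The execution-checking idea can be sketched as a small retry loop: run the candidate query, and if the database rejects it, pass the error message to a rewriting step and try again. This is only a minimal sketch under stated assumptions, not the paper's implementation. The names `execute_with_adjustment` and `toy_rewrite`, the retry limit, and the toy table are hypothetical, and `toy_rewrite` is a hard-coded stand-in for the LLM call that a real system would make.

```python
import sqlite3


def execute_with_adjustment(conn, sql, rewrite, max_rounds=3):
    """Run `sql`; on a database error, ask `rewrite` for a new query and retry.

    Returns (final_sql, rows), with rows set to None if no attempt succeeded.
    """
    for attempt in range(max_rounds + 1):
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            if attempt == max_rounds:
                break
            # Feed the error text back so the rewriter can react to it.
            sql = rewrite(sql, str(exc))
    return sql, None


def toy_rewrite(sql, error):
    """Stand-in for an LLM rewriter: repairs one known bad column name."""
    return sql.replace("singer_name", "name")


# Toy database, invented for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
INSERT INTO singer VALUES (1, 'Ana', 'France'), (2, 'Bo', 'Japan');
""")

bad_sql = "SELECT singer_name FROM singer WHERE country = 'France'"
final_sql, rows = execute_with_adjustment(conn, bad_sql, toy_rewrite)
print(final_sql)
print(rows)
```

Capping the number of rounds keeps the cost bounded: a query that never becomes executable is returned with `None` rows instead of triggering unlimited rewrite calls.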

Problem

Research questions and friction points this paper is trying to address.

Natural Language Processing

SQL Generation

Model Bias

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-Shot Learning

Cost-Efficient SQL Generation

Adaptive Bias Mitigation

🔎 Similar Papers

A Survey on Employing Large Language Models for Text-to-SQL Tasks