🤖 AI Summary
To address insufficient semantic parsing accuracy in Text-to-SQL tasks, this paper proposes a multi-generator collaborative framework. First, multiple SQL generators—diverse in output format and jointly fine-tuned on schema-aware and text-SQL alignment objectives—are constructed to enhance candidate diversity. Second, schema-guided filtering and structured candidate reorganization improve semantic consistency. Finally, a lightweight selection model picks the optimal SQL query from the candidates. Evaluated on BIRD, the method achieves 75.63% execution accuracy (state-of-the-art), and 89.65% on the Spider test set, substantially outperforming existing single-generator and ensemble approaches. Key contributions include: (1) multi-format fine-tuning for controllable generation diversity; (2) a structured candidate reorganization mechanism enforcing syntactic and semantic coherence; and (3) a low-overhead selection optimization paradigm that avoids costly end-to-end retraining while preserving performance.
📝 Abstract
To leverage the advantages of LLMs in addressing challenges in the Text-to-SQL task, we present XiYan-SQL, an innovative framework that effectively generates and utilizes multiple SQL candidates. It consists of three components: 1) a Schema Filter module that filters and obtains multiple relevant schemas; 2) a multi-generator ensemble approach that generates multiple high-quality and diverse SQL queries; 3) a selection model with a candidate reorganization strategy to obtain the optimal SQL query. Specifically, for the multi-generator ensemble, we employ a multi-task fine-tuning strategy to enhance the capabilities of SQL generation models for the intrinsic alignment between SQL and text, and construct multiple generation models with distinct generation styles by fine-tuning across different SQL formats. The experimental results and comprehensive analysis demonstrate the effectiveness and robustness of our framework. Overall, XiYan-SQL achieves a new SOTA performance of 75.63% on the notable BIRD benchmark, surpassing all previous methods. It also attains SOTA performance on the Spider test set with an accuracy of 89.65%.
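The three-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustrative mock, not the authors' implementation: the function and class names are assumptions, the "generators" are stubs standing in for the fine-tuned, style-specific SQL models, and the schema filter and selection model are replaced by trivial placeholders (keyword matching and a shortest-candidate rule) just to show how the stages compose.

```python
# Hypothetical sketch of a XiYan-SQL-style pipeline: schema filtering,
# multi-generator candidate generation, reorganization, then selection.
# All names are illustrative assumptions, not the paper's actual API.
from dataclasses import dataclass


@dataclass
class Candidate:
    sql: str
    generator: str  # which style-specific generator produced it


def schema_filter(question: str, full_schema: dict) -> dict:
    """Stage 1: keep only tables/columns plausibly relevant to the question.
    A naive keyword match stands in for the learned Schema Filter module."""
    q = question.lower()
    return {
        table: cols
        for table, cols in full_schema.items()
        if table.lower() in q or any(c.lower() in q for c in cols)
    }


def generate_candidates(question: str, schema: dict, generators: dict) -> list:
    """Stage 2: each generator (a distinct generation style) emits one candidate."""
    return [Candidate(sql=gen(question, schema), generator=name)
            for name, gen in generators.items()]


def reorganize(cands: list) -> list:
    """Stage 3a: normalize whitespace/case and deduplicate before selection."""
    seen, out = set(), []
    for c in cands:
        key = " ".join(c.sql.split()).lower()
        if key not in seen:
            seen.add(key)
            out.append(c)
    return out


def select_best(cands: list) -> Candidate:
    """Stage 3b: a lightweight selection model scores candidates.
    A shortest-query rule is a trivial placeholder for the learned scorer."""
    return min(cands, key=lambda c: len(c.sql))


# --- toy usage ---
schema = {"orders": ["id", "amount", "customer_id"],
          "customers": ["id", "name"]}
generators = {
    "compact": lambda q, s: "SELECT SUM(amount) FROM orders",
    "verbose": lambda q, s: "SELECT SUM(o.amount) FROM orders AS o",
    "dup":     lambda q, s: "select sum(amount)  from orders",
}
question = "What is the total amount of all orders?"
filtered = schema_filter(question, schema)
cands = reorganize(generate_candidates(question, filtered, generators))
best = select_best(cands)
print(best.sql)
```

Note how the "dup" candidate is collapsed into "compact" by the reorganization step, so the selector only ranks genuinely distinct queries; in the paper, candidate diversity comes from fine-tuning across different SQL formats rather than from lambdas.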