🤖 AI Summary
To address the challenges of deploying large language models (LLMs) in enterprise private environments—due to privacy concerns and high computational costs—and the poor generalization of small language models (SLMs) on Text-to-SQL tasks, this paper proposes a lightweight multi-agent SQL generation framework. Our approach introduces the first SLM-oriented, multi-role agent collaboration mechanism, integrated with an execution-feedback-driven reinforcement learning fine-tuning paradigm to enable dynamic error correction and collaborative optimization during inference. Evaluated on standard benchmarks such as Spider, our method achieves accuracy comparable to mainstream LLMs (e.g., Codex, GPT-3.5), while reducing parameter count by over 90% and enabling efficient single-GPU deployment. Key contributions include: (i) the first execution-feedback-enhanced multi-agent architecture specifically designed for SLMs; and (ii) an end-to-end Text-to-SQL solution that simultaneously ensures privacy compliance, low resource consumption, and high accuracy.
📝 Abstract
Text2SQL, the task of generating SQL queries from natural language text, is a critical challenge in data engineering. Recently, Large Language Models (LLMs) have demonstrated superior performance for this task due to their advanced comprehension and generation capabilities. However, privacy and cost considerations prevent companies from using Text2SQL solutions based on external LLMs offered as a service. Instead, small language models (SLMs) that are openly available and can be hosted in-house are adopted. These SLMs, in turn, lack the generalization capabilities of larger LLMs, which impairs their effectiveness for complex tasks such as Text2SQL. To address these limitations, we propose MATS, a novel Text2SQL framework designed specifically for SLMs. MATS uses a multi-agent mechanism that assigns specialized roles to auxiliary agents, reducing individual workloads and fostering interaction. A training scheme based on reinforcement learning aligns these agents using feedback obtained during execution, thereby maintaining competitive performance despite the limited model size. Evaluation results on benchmark datasets show that MATS, deployed on a single-GPU server, yields accuracy on par with large-scale LLMs while using significantly fewer parameters. Our source code and data are available at https://github.com/thanhdath/mats-sql.
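The execution-feedback idea described above can be illustrated with a minimal sketch: a generator agent proposes SQL, the query is executed against the database, and any execution error is routed back to the generator for another attempt. This is an illustrative toy, not the MATS implementation; the `generate_sql` stub stands in for the SLM agents, and the single-retry correction is a simplification of the paper's multi-role collaboration.

```python
import sqlite3

def generate_sql(question, schema, feedback=None):
    # Stub standing in for an SLM "generator" agent. For the demo it first
    # emits a query with a wrong column name, then corrects it once it sees
    # the execution error.
    if feedback is None:
        return "SELECT nme FROM students"   # deliberate typo: 'nme'
    return "SELECT name FROM students"      # corrected after feedback

def execution_feedback_loop(question, schema, conn, max_rounds=3):
    # Generate -> execute -> feed error back, up to max_rounds attempts.
    feedback = None
    sql = ""
    for _ in range(max_rounds):
        sql = generate_sql(question, schema, feedback)
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows                # executable query: accept it
        except sqlite3.Error as err:
            feedback = str(err)             # error message becomes feedback
    return sql, None                        # gave up after max_rounds

# Tiny in-memory database to exercise the loop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.execute("INSERT INTO students VALUES ('Ada')")
sql, rows = execution_feedback_loop("List student names", "students(name)", conn)
```

In the full framework, the same execution signal also serves as the reward for reinforcement-learning fine-tuning of the agents, rather than only driving inference-time retries as in this sketch.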