MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL

📅 2025-10-29
🤖 AI Summary
Existing Text-to-SQL approaches rely on static execution feedback and cannot correct errors in real time. To address this, we propose a multi-turn tool-integrated reasoning reinforcement learning framework that combines dynamic database interaction with context-aware progressive query refinement, introducing an execution-aware multi-turn reasoning paradigm. We enhance the GRPO algorithm by removing the KL-divergence constraint and adding a trajectory filtering mechanism to improve training stability and policy distribution consistency. Our method achieves 64.4% and 84.6% execution accuracy on BIRD Dev and SPIDER Dev, respectively, substantially outperforming prior state-of-the-art methods. The core contribution lies in being the first to deeply integrate dynamic execution feedback into end-to-end differentiable, error-correctable multi-turn RL training for Text-to-SQL generation.

📝 Abstract
As large language models (LLMs) are increasingly used for Text-to-SQL tasks, reinforcement learning (RL) has become a common method for improving performance. Existing methods rely primarily on static execution feedback, which restricts real-time error correction. Integrating multi-turn tool invocation with dynamic feedback could significantly improve adaptability and robustness, and ultimately model performance. To address these issues, we propose MTIR-SQL, a Multi-turn Tool-Integrated Reasoning reinforcement learning framework for Text-to-SQL. Our approach introduces an execution-aware multi-turn reasoning paradigm that incorporates database execution feedback at each reasoning step, enabling context-sensitive query generation and progressive refinement throughout the reasoning process. The framework extends the GRPO algorithm to accommodate complex multi-turn interaction scenarios. Because MTIR training is unstable and the model distribution can deviate significantly from the initial model, we enhance GRPO by adding a trajectory filtering mechanism and removing the KL loss constraint. Experimental results show that MTIR-SQL, with 4B parameters, achieves 64.4% accuracy on BIRD Dev and 84.6% execution accuracy on SPIDER Dev, significantly outperforming existing approaches.
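The execution-aware multi-turn loop described in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: `execute_sql`, `multi_turn_sql`, `toy_generate`, the turn limit, and the stop-on-first-success rule are all hypothetical, and the trained policy is replaced by a hard-coded stand-in. Only the standard-library `sqlite3` module is used.

```python
import sqlite3
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (sql, observation) pairs from earlier turns


def execute_sql(conn: sqlite3.Connection, sql: str, max_rows: int = 5) -> Tuple[bool, str]:
    """Run a query and return (succeeded, observation text fed back to the model)."""
    try:
        rows = conn.execute(sql).fetchmany(max_rows)
        return True, repr(rows)
    except sqlite3.Error as exc:
        return False, f"ERROR: {exc}"


def multi_turn_sql(
    question: str,
    generate: Callable[[str, History], str],
    conn: sqlite3.Connection,
    max_turns: int = 3,
) -> Tuple[str, History]:
    """Generate SQL, execute it, and feed the result back until success or turn limit."""
    history: History = []
    sql = ""
    for _ in range(max_turns):
        sql = generate(question, history)      # policy sees all earlier feedback
        ok, observation = execute_sql(conn, sql)
        history.append((sql, observation))
        if ok:                                 # assumption: stop at first successful execution
            break
    return sql, history


# Toy database and a stand-in "policy" that makes one mistake, then corrects it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bo", 41)])


def toy_generate(question: str, history: History) -> str:
    if not history:
        return "SELECT nme FROM singer WHERE age > 35"   # misspelled column
    return "SELECT name FROM singer WHERE age > 35"      # corrected after feedback


final_sql, trace = multi_turn_sql("Which singers are older than 35?", toy_generate, conn)
```

In this toy run the first turn returns a database error as its observation, and the second turn uses that feedback to issue a corrected query. In an RL setting, a trace like `trace` would form one trajectory to be scored.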
Problem

Research questions and friction points this paper is trying to address.

Enhancing Text-to-SQL with multi-turn tool integration
Addressing static execution feedback limitations via dynamic reasoning
Improving reinforcement learning stability for complex database queries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-turn tool-integrated reasoning with dynamic feedback
Execution-aware paradigm for progressive query refinement
Enhanced GRPO algorithm with trajectory filtering mechanism
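As a rough illustration of the GRPO changes listed above, the sketch below computes group-relative advantages after filtering trajectories and applies a clipped surrogate with no KL penalty term. The filtering criterion is an assumption: the summary does not say how trajectories are filtered, so a hypothetical `valid` flag stands in for it. `Trajectory`, `group_advantages`, and `clipped_objective` are illustrative names, not the authors' code.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Trajectory:
    reward: float  # scalar reward for the whole multi-turn rollout
    valid: bool    # hypothetical filter flag (e.g. well-formed tool calls)


def group_advantages(group: List[Trajectory], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group, using only kept trajectories."""
    kept = [t for t in group if t.valid]       # trajectory filtering (assumed criterion)
    if len(kept) < 2:
        return []                              # too few samples to form a baseline
    rewards = [t.reward for t in kept]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(ratio: float, advantage: float, clip: float = 0.2) -> float:
    """PPO-style clipped surrogate for one token; no KL penalty is added."""
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)


group = [
    Trajectory(reward=1.0, valid=True),
    Trajectory(reward=0.0, valid=True),
    Trajectory(reward=1.0, valid=False),  # dropped before the baseline is computed
    Trajectory(reward=0.0, valid=True),
]
advs = group_advantages(group)
```

Filtering before normalization means a discarded rollout affects neither the group mean nor the variance, so it contributes no gradient signal. Whether the paper filters at this point or elsewhere in the pipeline is not stated in the summary.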
Authors
Zekun Xu
Amazon
Machine Learning, Statistical Model
Siyu Xia
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Chuhuai Yue
Beijing Institute of Technology; Researcher at Meituan
VLM, LLM post training
Jiajun Chai
Meituan Inc.
Reinforcement Learning, LLMs, Agentic Learning
Mingxue Tian
Shanghai Jiao Tong University
Xiaohan Wang
Meituan
Wei Lin
Meituan
Haoxuan Li
Peking University
Guojun Yin
Meituan, University of Science and Technology of China
Multimodality, Computer Vision, Foundation Models, Deep Learning, Image/Video Processing