MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL

📅 2025-10-29

🤖 AI Summary

Existing Text-to-SQL approaches rely on static execution feedback and cannot correct errors in real time. To address this, we propose a multi-turn tool-integrated reasoning reinforcement learning framework that combines dynamic database interaction with context-aware, progressive query refinement, introducing an execution-aware multi-turn reasoning paradigm. We modify the GRPO algorithm by adding a trajectory filtering mechanism to improve training stability and by removing the KL-divergence constraint, since the policy is expected to drift substantially from the initial model. With 4B parameters, our method achieves 64.4% and 84.6% execution accuracy on BIRD Dev and SPIDER Dev, respectively, substantially outperforming prior state-of-the-art methods. The core contribution is integrating dynamic execution feedback directly into end-to-end, multi-turn RL training, so that the model can detect and correct its own SQL errors during generation.
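Execution accuracy, the metric behind the BIRD and SPIDER numbers, counts a prediction as correct when running it returns the same result as running the gold query. The following is a minimal sketch, assuming a SQLite database and BIRD-style comparison of result rows as sets (Spider's official evaluation treats row order and duplicates differently). The function names and the toy `singer` table are illustrative and not taken from the paper.

```python
import sqlite3


def execution_match(db: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Return True if pred_sql and gold_sql produce the same set of rows."""
    try:
        pred_rows = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute counts as incorrect.
        return False
    gold_rows = db.execute(gold_sql).fetchall()
    # Order- and duplicate-insensitive comparison of result rows.
    return set(pred_rows) == set(gold_rows)


def execution_accuracy(db: sqlite3.Connection, pairs) -> float:
    """Fraction of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(db, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


# Toy usage with an in-memory database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
db.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bo", 25), ("Cy", 41)])
pairs = [
    # Different SQL text, same result rows: counted as correct.
    ("SELECT name FROM singer WHERE age > 28", "SELECT name FROM singer WHERE age >= 30"),
    # Misspelled column raises an error: counted as incorrect.
    ("SELECT nam FROM singer", "SELECT name FROM singer"),
]
print(execution_accuracy(db, pairs))  # 0.5
```

Comparing results rather than SQL strings is what lets two syntactically different but equivalent queries both count as correct.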

📝 Abstract

As large language models (LLMs) are increasingly used in Text-to-SQL tasks, Reinforcement Learning (RL) has become a common method for improving performance. Existing methods primarily rely on static execution feedback, which restricts real-time error correction. Integrating multi-turn tool invocation with dynamic feedback, by contrast, could significantly improve adaptability and robustness, ultimately enhancing model performance. To address these issues, we propose MTIR-SQL, an innovative Multi-turn Tool-Integrated Reasoning reinforcement learning framework for Text-to-SQL. Our approach introduces an execution-aware multi-turn reasoning paradigm that incorporates database execution feedback at each reasoning step, enabling context-sensitive query generation and progressive refinement throughout the reasoning process. The framework extends the GRPO algorithm to accommodate complex multi-turn interaction scenarios. Because MTIR training tends to be unstable and the model distribution can deviate significantly from the initial model, we enhance GRPO by adding a trajectory filtering mechanism and removing the KL loss constraint. Experimental results demonstrate that MTIR-SQL, with 4B parameters, achieves 64.4% execution accuracy on the BIRD Dev set and 84.6% on the SPIDER Dev set, significantly outperforming existing approaches.
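GRPO scores each sampled trajectory against the other trajectories drawn for the same question, so no learned value model is needed. The sketch below shows a group-relative advantage and a clipped surrogate loss with the KL penalty simply left out. It is a simplified illustration under stated assumptions, not the authors' implementation. It uses one sequence-level probability ratio instead of per-token ratios. The rule in `filter_group` (drop malformed rollouts, then drop groups whose rewards are all equal) is a hypothetical stand-in, because the abstract does not specify the paper's filtering criteria. The `Trajectory` class and its `well_formed` flag are likewise illustrative.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    reward: float             # e.g. 1.0 if the final SQL matches the gold result, else 0.0
    logp_new: float           # summed log-prob of the trajectory under the current policy
    logp_old: float           # summed log-prob under the policy that sampled it
    well_formed: bool = True  # hypothetical flag, e.g. finished within the turn budget


def filter_group(group):
    """Hypothetical trajectory filter: keep well-formed rollouts, then drop the
    whole group if fewer than two remain or all rewards are equal (zero advantage)."""
    kept = [t for t in group if t.well_formed]
    if len(kept) < 2 or len({t.reward for t in kept}) == 1:
        return []
    return kept


def group_advantages(group, eps=1e-6):
    """Group-relative advantage: standardize rewards within the group."""
    rewards = [t.reward for t in group]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_loss_no_kl(group, clip_eps=0.2):
    """Clipped surrogate loss averaged over the group, with no KL(pi || pi_ref) term."""
    group = filter_group(group)
    if not group:
        return 0.0  # nothing to learn from this group
    losses = []
    for t, adv in zip(group, group_advantages(group)):
        ratio = math.exp(t.logp_new - t.logp_old)
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        # PPO-style pessimistic bound; negated because optimizers minimize.
        losses.append(-min(ratio * adv, clipped * adv))
    return mean(losses)


group = [Trajectory(1.0, -1.0, -1.0), Trajectory(0.0, -1.0, -1.0),
         Trajectory(0.0, -1.0, -1.0), Trajectory(1.0, -1.0, -1.0)]
print([round(a, 3) for a in group_advantages(group)])  # [1.0, -1.0, -1.0, 1.0]
print(grpo_loss_no_kl([Trajectory(1.0, -1.0, -1.0)] * 3))  # 0.0 (group filtered out)
```

Leaving out the KL term removes the pull toward the reference model, which matches the abstract's rationale that the multi-turn policy is expected to move far from its initialization, while ratio clipping still limits the size of each policy update.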

Problem

Research questions and friction points this paper is trying to address.

Enhancing Text-to-SQL with multi-turn tool integration

Addressing static execution feedback limitations via dynamic reasoning

Improving reinforcement learning stability for complex database queries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-turn tool-integrated reasoning with dynamic feedback

Execution-aware paradigm for progressive query refinement

Enhanced GRPO algorithm with trajectory filtering mechanism
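The multi-turn, execution-aware reasoning loop can be pictured as a rollout in which the model alternates between proposing SQL, reading the database's response, and either revising the query or committing to an answer. The following is a minimal sketch, assuming hypothetical `<sql>`, `<obs>`, and `<answer>` tags and a scripted `scripted_policy` function standing in for the LLM. The paper's actual prompt format and tool interface may differ.

```python
import re
import sqlite3
from typing import Callable, Optional, Tuple


def run_sql(db: sqlite3.Connection, sql: str, max_rows: int = 5) -> str:
    """Execute SQL and turn the outcome into a short text observation."""
    try:
        return f"rows: {db.execute(sql).fetchmany(max_rows)}"
    except sqlite3.Error as exc:
        return f"error: {exc}"


def rollout(policy: Callable[[str], str], db: sqlite3.Connection,
            question: str, max_turns: int = 4) -> Tuple[Optional[str], str]:
    """Alternate between model steps and tool calls until an answer or the turn limit."""
    context = f"Question: {question}\n"
    for _ in range(max_turns):
        step = policy(context)
        context += step + "\n"
        final = re.search(r"<answer>(.*?)</answer>", step, re.S)
        if final:
            return final.group(1).strip(), context
        call = re.search(r"<sql>(.*?)</sql>", step, re.S)
        if call:
            # Execution feedback (rows or error text) becomes part of the next prompt.
            context += f"<obs>{run_sql(db, call.group(1))}</obs>\n"
    return None, context  # turn budget exhausted without a final answer


def scripted_policy(context: str) -> str:
    """Hypothetical stand-in for the LLM: makes a mistake, reads the error, then fixes it."""
    n_obs = context.count("<obs>")
    if n_obs == 0:
        return "<sql>SELECT nam FROM singer WHERE age > 28</sql>"
    if n_obs == 1 and "error:" in context:
        return "<sql>SELECT name FROM singer WHERE age > 28</sql>"
    return "<answer>SELECT name FROM singer WHERE age > 28</answer>"


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
db.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bo", 25), ("Cy", 41)])
answer, transcript = rollout(scripted_policy, db, "Which singers are older than 28?")
print(answer)  # SELECT name FROM singer WHERE age > 28
```

Because the error message is appended to the context, the second turn can repair the misspelled column. During RL training, the final answer would then be scored (for example, by execution match against the gold query) to produce the trajectory's reward.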
