🤖 AI Summary
Large language models (LLMs) applied to text-to-SQL often generate SQL that is syntactically valid but semantically misaligned with user intent. To address this, the paper proposes SQLFixAgent, a consistency-enhanced multi-agent collaboration framework for detecting and repairing such queries. It comprises three agents: (1) SQLReviewer, which applies “rubber duck debugging” to detect semantic mismatches between the SQL and the user query; (2) QueryCrafter, which generates multiple candidate repairs using a fine-tuned SQLTool; and (3) SQLRefiner, the core agent, which uses similar repair retrieval and failure memory reflection to select the final repair from the candidates. Evaluated on five Text-to-SQL benchmarks, the method consistently improves the baseline model, with an execution accuracy gain of over 3% on BIRD, and shows higher token efficiency than other advanced methods.
📝 Abstract
While fine-tuned large language models (LLMs) excel at generating grammatically valid SQL in Text-to-SQL parsing, they often struggle to ensure the semantic accuracy of queries, leading to user confusion and diminished system usability. To tackle this challenge, we introduce SQLFixAgent, a new consistency-enhanced multi-agent collaborative framework designed for detecting and repairing erroneous SQL. Our framework comprises a core agent, SQLRefiner, alongside two auxiliary agents: SQLReviewer and QueryCrafter. The SQLReviewer agent employs the rubber duck debugging method to identify potential semantic mismatches between the SQL and the user query. If an error is detected, the QueryCrafter agent generates multiple SQL statements as candidate repairs using a fine-tuned SQLTool. Subsequently, leveraging similar repair retrieval and failure memory reflection, the SQLRefiner agent selects the most fitting SQL statement from the candidates as the final repair. We evaluated our proposed framework on five Text-to-SQL benchmarks. The experimental results show that our method consistently enhances the performance of the baseline model, achieving an execution accuracy improvement of over 3% on the BIRD benchmark. Our framework also has higher token efficiency than other advanced methods, making it more competitive.
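The control flow described in the abstract (review, generate candidates on a detected mismatch, then select a final repair) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `RepairPipeline`, the three `toy_*` functions, and the representation of failure memory as a plain list of rejected SQL strings are all hypothetical, and the toy heuristics stand in for LLM calls. Similar repair retrieval is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RepairPipeline:
    """Hypothetical wiring of the three agents; all names are illustrative."""
    reviewer: Callable[[str, str], bool]                 # True if SQL matches the question
    crafter: Callable[[str], List[str]]                  # proposes candidate repairs
    refiner: Callable[[str, List[str], List[str]], str]  # picks the final repair
    failure_memory: List[str] = field(default_factory=list)

    def repair(self, question: str, sql: str) -> str:
        # Step 1: the reviewer checks semantic consistency.
        if self.reviewer(question, sql):
            return sql  # no mismatch detected, keep the original SQL
        # Step 2: remember the rejected SQL and ask for candidate repairs.
        self.failure_memory.append(sql)
        candidates = self.crafter(question)
        # Step 3: the refiner selects one candidate, consulting past failures.
        return self.refiner(question, candidates, self.failure_memory)


def toy_reviewer(question: str, sql: str) -> bool:
    # Toy heuristic (assumption): a "how many" question needs a COUNT.
    needs_count = "how many" in question.lower()
    return not (needs_count and "count(" not in sql.lower())


def toy_crafter(question: str) -> List[str]:
    # Stand-in for a fine-tuned generator; returns fixed candidates.
    return ["SELECT name FROM users", "SELECT COUNT(*) FROM users"]


def toy_refiner(question: str, candidates: List[str], failures: List[str]) -> str:
    # Skip candidates that already failed, then prefer one the reviewer accepts.
    remaining = [c for c in candidates if c not in failures]
    for candidate in remaining:
        if toy_reviewer(question, candidate):
            return candidate
    return remaining[0] if remaining else candidates[0]


pipeline = RepairPipeline(toy_reviewer, toy_crafter, toy_refiner)
fixed = pipeline.repair("How many users are there?", "SELECT name FROM users")
print(fixed)
```

In this sketch the reviewer acts as a gate, so SQL that already passes the check is returned without invoking the other two agents. Whether this gating explains the token efficiency reported in the abstract is not stated in the source.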