🤖 AI Summary
Current Text-to-SQL methods fall short of system-level reliability, primarily because they lack structured, verifiable workflow orchestration. This paper proposes a paradigm shift: treating Text-to-SQL as a software engineering problem and building DeepEye-SQL, a framework modeled on the Software Development Life Cycle (SDLC), for reliable SQL generation. The framework comprises four stages: semantic alignment (semantic value retrieval and robust schema linking), N-version parallel generation with diverse reasoning paradigms, toolchain-driven deterministic verification (unit tests plus targeted LLM-guided revision), and confidence-aware pairwise arbitration. Without fine-tuning and using only open-source ~30B LLMs, it achieves 73.5% execution accuracy on BIRD-Dev and 89.8% on Spider-Test, outperforming state-of-the-art methods. These results support the view that principled orchestration, rather than LLM scaling alone, is key to system-level reliability in Text-to-SQL.
📝 Abstract
Large language models (LLMs) have advanced Text-to-SQL, yet existing solutions still fall short of system-level reliability. The limitation lies not merely in individual modules (e.g., schema linking, reasoning, and verification) but, more critically, in the lack of structured orchestration that enforces correctness across the entire workflow. This gap motivates a paradigm shift: treating Text-to-SQL not as free-form language generation but as a software-engineering problem that demands structured, verifiable orchestration. We present DeepEye-SQL, a software-engineering-inspired framework that reframes Text-to-SQL as the development of a small software program, executed through a verifiable process guided by the Software Development Life Cycle (SDLC). DeepEye-SQL integrates four synergistic stages. First, it grounds ambiguous user intent through semantic value retrieval and robust schema linking. Second, it enhances fault tolerance with N-version SQL generation using diverse reasoning paradigms. Third, it ensures deterministic verification via a toolchain of unit tests and targeted LLM-guided revision. Fourth, it introduces confidence-aware selection, which clusters execution results to estimate confidence and then either takes a high-confidence shortcut or runs unbalanced pairwise adjudication in low-confidence cases, yielding a calibrated, quality-gated output. This SDLC-aligned workflow transforms ad hoc query generation into a disciplined engineering process. Using ~30B open-source LLMs without any fine-tuning, DeepEye-SQL achieves 73.5% execution accuracy on BIRD-Dev and 89.8% on Spider-Test, outperforming state-of-the-art solutions. This result highlights that principled orchestration, rather than LLM scaling alone, is key to achieving system-level reliability in Text-to-SQL.
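The confidence-aware selection stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cluster_by_result`, `select_sql`), the `threshold` value of 0.6, and the `judge` callback are all hypothetical, and the low-confidence branch uses a plain sequential pairwise comparison rather than the paper's "unbalanced" adjudication scheme, whose details the abstract does not give. Execution results are assumed to be hashable (for example, a tuple of row tuples).

```python
from collections import defaultdict
from typing import Callable, Hashable, Optional, Sequence


def cluster_by_result(candidates: Sequence[str],
                      results: Sequence[Hashable]) -> list[list[str]]:
    """Group candidate SQL strings whose execution results are identical."""
    clusters: dict[Hashable, list[str]] = defaultdict(list)
    for sql, result in zip(candidates, results):
        clusters[result].append(sql)
    # Largest cluster first; sorting is stable, so ties keep first-seen order.
    return sorted(clusters.values(), key=len, reverse=True)


def select_sql(candidates: Sequence[str],
               results: Sequence[Hashable],
               threshold: float = 0.6,  # hypothetical cutoff, not from the paper
               judge: Optional[Callable[[str, str], str]] = None,
               ) -> tuple[str, float]:
    """Return (chosen SQL, confidence of the largest result cluster)."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    clusters = cluster_by_result(candidates, results)
    # Confidence = share of candidates that agree with the majority result.
    confidence = len(clusters[0]) / len(candidates)

    # High-confidence shortcut: accept the majority cluster's representative.
    if confidence >= threshold or len(clusters) == 1 or judge is None:
        return clusters[0][0], confidence

    # Low confidence: compare cluster representatives pairwise.
    # `judge` stands in for an LLM call that returns the better of two queries.
    best = clusters[0][0]
    for challenger in clusters[1:]:
        best = judge(best, challenger[0])
    return best, confidence
```

For example, if three of four candidates return the same rows, `select_sql` reports a confidence of 0.75 and returns the first of those three without invoking `judge`. When no cluster reaches the threshold, the representatives are compared one pair at a time, so the cost of adjudication grows with the number of distinct results rather than the number of candidates.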