🤖 AI Summary
This work addresses semantic ambiguity and missing constraints in Text-to-SQL systems on complex queries, problems that often stem from insufficient contextual understanding. To tackle them, the paper proposes PV-SQL, an agentic framework that combines active database probing with rule-driven verification. The Probe component iteratively generates exploratory queries that retrieve concrete database records, clarifying value formats, column semantics, and inter-table relationships. The Verify component extracts verifiable conditions and turns them into an executable checklist, which drives iterative refinement of the generated SQL. On the BIRD benchmark, PV-SQL outperforms the best baseline by 5% in execution accuracy and by 20.8% in valid efficiency score (VES), while consuming fewer tokens.
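To make the probing idea concrete, the sketch below issues one kind of exploratory query: it fetches a few distinct values of a column so that its storage format becomes visible, for example whether a state column holds `'CA'` or `'California'`. This is a minimal illustration, not the authors' implementation. It assumes a SQLite database accessed through Python's built-in `sqlite3` module. The function `probe_column_values` and the `schools` table are hypothetical names. In PV-SQL itself, the probing queries are generated iteratively by the agent rather than fixed in advance.

```python
import sqlite3


def quote_ident(name: str) -> str:
    # SQLite identifier quoting: wrap in double quotes and double any embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def probe_column_values(conn: sqlite3.Connection, table: str, column: str, limit: int = 5) -> list:
    """Hypothetical probe: return up to `limit` distinct non-null values of a column,
    sorted so the result is deterministic, to expose how values are actually stored."""
    col = quote_ident(column)
    sql = (
        f"SELECT DISTINCT {col} FROM {quote_ident(table)} "
        f"WHERE {col} IS NOT NULL ORDER BY {col} LIMIT ?"
    )
    return [row[0] for row in conn.execute(sql, (limit,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (name TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO schools VALUES (?, ?)",
        [("A", "CA"), ("B", "NY"), ("C", "CA")],
    )
    # Reveals that states are stored as two-letter codes.
    print(probe_column_values(conn, "schools", "state"))  # ['CA', 'NY']
```

The example sorts the probed values (`ORDER BY`) so that repeated probes return stable results. Without sorting, `DISTINCT ... LIMIT` may return an arbitrary subset.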
📝 Abstract
Text-to-SQL systems often struggle with deep contextual understanding, particularly for complex queries with subtle requirements. We present PV-SQL, an agentic framework that addresses these failures through two complementary components: Probe and Verify. The Probe component iteratively generates probing queries to retrieve concrete records from the database, resolving ambiguities in value formats, column semantics, and inter-table relationships to build richer contextual understanding. The Verify component employs a rule-based method to extract verifiable conditions and construct an executable checklist, enabling iterative SQL refinement that effectively reduces missing constraints. Experiments on the BIRD benchmark show that PV-SQL outperforms the best text-to-SQL baseline by 5% in execution accuracy and by 20.8% in valid efficiency score (VES) while consuming fewer tokens.
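The sketch below illustrates, under stated assumptions, what a rule-based executable checklist might look like. Hand-written regex rules map phrases in the question to checks on the candidate SQL and its execution result. Any failed checks are the feedback that a refinement loop would pass back to the SQL generator; that loop is not shown. The rules, and the names `Check`, `extract_checklist`, and `verify`, are hypothetical illustrations rather than the paper's actual rule set. The sketch assumes a SQLite database via Python's `sqlite3` module.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    description: str
    passes: Callable[[str, list], bool]  # (sql, result_rows) -> bool


def extract_checklist(question: str) -> list:
    """Hypothetical rules mapping surface patterns in the question to checks."""
    checks = []
    m = re.search(r"\btop (\d+)\b", question, re.IGNORECASE)
    if m:
        n = int(m.group(1))
        # "top N" implies a bounded, ordered result.
        checks.append(Check(f"returns at most {n} rows", lambda sql, rows, n=n: len(rows) <= n))
        checks.append(Check("uses ORDER BY", lambda sql, rows: "order by" in sql.lower()))
    # A four-digit year in the question should appear as a literal in the SQL.
    for year in re.findall(r"\b(?:19|20)\d{2}\b", question):
        checks.append(Check(f"filters on year {year}", lambda sql, rows, y=year: y in sql))
    return checks


def verify(conn: sqlite3.Connection, question: str, sql: str) -> list:
    """Execute the candidate SQL and return descriptions of the failed checks."""
    rows = conn.execute(sql).fetchall()
    return [c.description for c in extract_checklist(question) if not c.passes(sql, rows)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, year INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("a", 2020, 5.0), ("b", 2021, 3.0), ("c", 2021, 7.0)],
    )
    q = "List the top 2 products by sales in 2021"
    # The year constraint is missing from this candidate, so that check fails.
    print(verify(conn, q, "SELECT product FROM sales ORDER BY amount DESC LIMIT 2"))
    # ['filters on year 2021']
```

Checks such as "uses ORDER BY" run on the SQL text, while checks such as the row bound run on the execution result. A real rule set would need far more robust matching than substring tests.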