🤖 AI Summary
This work addresses two limitations of existing Text-to-SQL approaches: difficulty with deeply nested or multi-clause queries, and a lack of progressive reasoning. The authors propose LEAF-SQL, a coarse-to-fine tree search framework that models SQL skeleton prediction as a hierarchical exploration process. Built on large language models, it combines a three-level skeleton hierarchy with two cooperating agents: a generation agent that proposes multiple skeleton candidates and an evaluation agent that prunes the search space, balancing structural diversity against search efficiency. On the hidden test set of the BIRD benchmark, the approach achieves an execution accuracy of 71.6%, outperforming leading search-based and skeleton-based methods and supporting its effectiveness on complex SQL queries.
📝 Abstract
Text-to-SQL translates natural language questions into executable SQL queries, enabling intuitive database access for non-experts. While large language models achieve strong performance on Text-to-SQL with prompting, they still struggle with complex queries that involve deeply nested logic or multiple clauses. A widely used approach employs SQL skeletons (intermediate representations of query logic) to streamline generation, but existing methods are limited by their reliance on a single structural hypothesis and their lack of progressive reasoning. To overcome these limitations, we propose LEAF-SQL, a novel framework that reframes skeleton prediction as a coarse-to-fine tree search process. LEAF-SQL enables systematic exploration of diverse structural hypotheses with adaptive refinement. It employs three key techniques: (1) a three-level skeleton hierarchy to guide the search, (2) a Skeleton Formulation Agent to generate diverse candidates, and (3) a Skeleton Evaluation Agent to efficiently prune the search space. This integrated design yields skeleton candidates that are both structurally diverse and granularity-adaptive, providing a stronger foundation for SQL generation. Extensive experiments show that LEAF-SQL consistently improves the performance of various LLM backbones. On the official hidden test set of the challenging BIRD benchmark, our method achieves 71.6% execution accuracy, outperforming leading search-based and skeleton-based methods and affirming its effectiveness for complex queries.
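The coarse-to-fine search described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify what each of the three levels contains, how candidates are scored, or how pruning is done. The sketch therefore assumes a simple top-k pruning rule, and the names `coarse_to_fine_search`, `toy_formulate`, `toy_evaluate`, and `CUES` are hypothetical stand-ins, with rule-based toy functions in place of the LLM-driven Skeleton Formulation and Skeleton Evaluation Agents.

```python
from typing import Callable, List, Tuple

# Hypothetical agent signatures (assumptions, not the paper's API).
Formulate = Callable[[str, str, int], List[str]]  # (question, parent, level) -> candidates
Evaluate = Callable[[str, str], float]            # (question, skeleton) -> score


def coarse_to_fine_search(question: str, formulate: Formulate, evaluate: Evaluate,
                          levels: int = 3, keep: int = 2) -> List[Tuple[float, str]]:
    """Expand skeletons level by level, keeping the `keep` best at each level."""
    frontier = [""]  # root of the search tree: an empty skeleton
    scored: List[Tuple[float, str]] = []
    for level in range(1, levels + 1):
        # Formulation step: every surviving parent proposes refined candidates.
        scored = [(evaluate(question, cand), cand)
                  for parent in frontier
                  for cand in formulate(question, parent, level)]
        # Evaluation step: prune to the top-scoring candidates (assumed top-k rule).
        scored.sort(key=lambda pair: pair[0], reverse=True)
        scored = scored[:keep]
        frontier = [cand for _, cand in scored]
    return scored


# Toy cue table: a clause keyword is rewarded if its cue phrase is in the question.
CUES = {"WHERE": "older than", "GROUP BY": "for each",
        "ORDER BY": "highest", "LIMIT": "top"}


def toy_formulate(question: str, parent: str, level: int) -> List[str]:
    """Rule-based stand-in for the formulation agent; level contents are invented."""
    if level == 1:
        return ["SELECT _ FROM _", "SELECT _ FROM _ WHERE _"]
    if level == 2:
        return [parent + " GROUP BY _", parent + " ORDER BY _"]
    return [parent, parent + " LIMIT _"]


def toy_evaluate(question: str, skeleton: str) -> float:
    """Rule-based stand-in for the evaluation agent."""
    q = question.lower()
    score = 0.0
    for keyword, cue in CUES.items():
        if keyword in skeleton:
            score += 1.0 if cue in q else -0.5
    return score


if __name__ == "__main__":
    q = "List the top 3 students with the highest score who are older than 18."
    for score, skeleton in coarse_to_fine_search(q, toy_formulate, toy_evaluate):
        print(score, skeleton)
```

Keeping more than one candidate per level is what separates this from a single-hypothesis skeleton method: a coarse skeleton that scores lower early on can still survive and be refined at a later level.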