🤖 AI Summary
This work addresses the problem of selecting the correct SQL query in natural-language-to-SQL (NL2SQL) translation when only a few candidate queries are available and no consistency score can be used. The authors propose a data-aware candidate selection method that, for the first time, combines small separating instances with data provenance to build an efficient selection mechanism that does not depend on consistency scores. By analyzing the semantic relationship between the input question and the underlying database schema, the approach ranks a small set of candidate SQL queries. Experiments on a subset of BIRD-DEV show that the method significantly outperforms three baselines, particularly when only two or three candidates are provided.
📝 Abstract
We propose a data-aware candidate selection method for NL2SQL translation based on separating instances and provenance. We implement this approach and evaluate it against three natural baselines on a subset of BIRD-DEV. Experiments show that our method significantly outperforms the baselines when only two or three candidates are given and no consistency score is available. The code of our prototype is available at https://github.com/staskikotx/SISelection
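
The abstract does not define "separating instance". Assuming it means a small database instance on which two candidate queries return different results, the idea can be illustrated with the minimal sketch below. The table `emp`, the two candidate queries, the hand-written instances, and all function names are hypothetical and are not taken from the authors' prototype. The provenance component of the method is not modeled here.

```python
import sqlite3

def run_query(rows, sql):
    """Run `sql` on a fresh in-memory table emp(name, dept, salary) holding `rows`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
        conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
        # Sort so that the comparison ignores row order.
        return sorted(conn.execute(sql).fetchall())
    finally:
        conn.close()

def separates(rows, sql_a, sql_b):
    """True if the two candidates return different results on this instance."""
    return run_query(rows, sql_a) != run_query(rows, sql_b)

def find_separating_instance(instances, sql_a, sql_b):
    """Return the first instance that tells the candidates apart, or None."""
    for rows in instances:
        if separates(rows, sql_a, sql_b):
            return rows
    return None

# Two hypothetical candidates for "Which employees earn at least 100?"
CANDIDATE_A = "SELECT name FROM emp WHERE salary > 100"
CANDIDATE_B = "SELECT name FROM emp WHERE salary >= 100"

# Hypothetical small instances (assumption: supplied by hand for illustration).
INSTANCES = [
    [("ann", "hr", 150)],  # both candidates return ann: not separating
    [("bob", "it", 100)],  # only CANDIDATE_B returns bob: separating
]

witness = find_separating_instance(INSTANCES, CANDIDATE_A, CANDIDATE_B)
print(witness)
```

Once such an instance is found, the differing results on it can be inspected to decide which candidate matches the intent of the question. How the authors construct instances and make that decision is described in the paper and the linked repository, not in this sketch.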