Data-aware candidate selection in NL2SQL translation via small separating instances

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of selecting the correct SQL query in natural language to SQL (NL2SQL) translation when only a few candidate queries are available and no consistency score can be used. The authors propose a data-aware candidate selection method that combines small separating instances with data provenance, so that filtering does not depend on consistency scores. By analyzing the semantic relationship between the input question and the underlying database schema, the approach ranks a small set of candidate SQL queries. Experiments on a subset of BIRD-DEV show that the method significantly outperforms three natural baselines when only two or three candidates are provided.

📝 Abstract

We propose a data-aware candidate selection method for NL2SQL translation based on separating instances and provenance. We implement this approach and evaluate it against three natural baselines on a subset of BIRD-DEV. Experiments show that our method significantly outperforms baselines when only two or three candidates are given and no consistency score is available. The code of our prototype can be found at https://github.com/staskikotx/SISelection
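
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea of a separating instance: a small database on which two candidate queries return different answers. It is not the authors' implementation. The `employee` schema, the two candidate queries, and the helper names `run` and `separates` are all hypothetical and chosen purely for illustration.

```python
import sqlite3


def run(rows, sql):
    """Evaluate sql on a fresh in-memory database holding the given rows.

    The employee schema is a made-up example, not taken from BIRD-DEV.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    result = sorted(conn.execute(sql).fetchall())
    conn.close()
    return result


def separates(rows, sql_a, sql_b):
    """Return True if the two queries give different results on this instance."""
    return run(rows, sql_a) != run(rows, sql_b)


# Two hypothetical candidates for "Which employees earn more than 50000?"
CANDIDATE_A = "SELECT name FROM employee WHERE salary > 50000"
CANDIDATE_B = "SELECT name FROM employee WHERE salary >= 50000"

# A one-row instance sitting on the boundary value tells the candidates apart.
tiny_instance = [("Ann", "R&D", 50000)]
# An instance far from the boundary does not.
non_separating = [("Bob", "Ops", 70000)]

print(separates(tiny_instance, CANDIDATE_A, CANDIDATE_B))
print(separates(non_separating, CANDIDATE_A, CANDIDATE_B))
```

On the one-row boundary instance the two candidates disagree, so inspecting that single row is enough to decide which query matches the intent of the question. How the paper constructs such instances and combines them with provenance is not described in this summary.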

Problem

Research questions and friction points this paper is trying to address.

NL2SQL

candidate selection

data-aware

separating instances

provenance

Innovation

Methods, ideas, or system contributions that make the work stand out.

data-aware candidate selection

NL2SQL

separating instances