Data-aware candidate selection in NL2SQL translation via small separating instances

📅 2026-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of selecting the correct SQL query in natural language to SQL (NL2SQL) translation when only a few candidate queries are available and no consistency score can be used. The authors propose a data-aware candidate selection method that combines small separating instances with data provenance, so that candidates can be filtered without relying on consistency scores. Evaluated on a subset of BIRD-DEV against three natural baselines, the prototype significantly outperforms them when only two or three candidates are given.
📝 Abstract
We propose a data-aware candidate selection method for NL2SQL translation based on separating instances and provenance. We implement this approach and evaluate it against three natural baselines on a subset of BIRD-DEV. Experiments show that our method significantly outperforms baselines when only two or three candidates are given and no consistency score is available. The code of our prototype can be found at https://github.com/staskikotx/SISelection
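To make the term "separating instance" concrete, here is a minimal Python sketch. It assumes the usual meaning of the term: a small database on which two candidate queries return different results, so that inspecting the results can tell the candidates apart. The table, the two candidate queries, and the helper names `run_query` and `separates` are hypothetical and chosen only for illustration. The abstract does not say how the authors construct such instances or how provenance is used, so this should not be read as the paper's implementation.

```python
import sqlite3


def run_query(setup_sql, query):
    """Build a fresh in-memory database from setup_sql and run query on it.

    Rows are sorted so that two results are compared as bags of rows,
    ignoring row order (an illustrative simplification).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)
        return sorted(conn.execute(query).fetchall())
    finally:
        conn.close()


def separates(setup_sql, query_a, query_b):
    """Return True if the instance makes the two queries disagree."""
    return run_query(setup_sql, query_a) != run_query(setup_sql, query_b)


# Hypothetical tiny instance: customer 10 placed two orders.
TINY_INSTANCE = """
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO orders VALUES (1, 10), (2, 10), (3, 20);
"""

# Two hypothetical candidates for "How many customers placed an order?"
CANDIDATE_A = "SELECT COUNT(*) FROM orders"
CANDIDATE_B = "SELECT COUNT(DISTINCT customer_id) FROM orders"

result_a = run_query(TINY_INSTANCE, CANDIDATE_A)
result_b = run_query(TINY_INSTANCE, CANDIDATE_B)
is_separating = separates(TINY_INSTANCE, CANDIDATE_A, CANDIDATE_B)

print(result_a, result_b, is_separating)
```

On a database where every customer has exactly one order the two candidates agree, so that database would not separate them. The three-row instance above does, because one customer appears twice.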
Problem

Research questions and friction points this paper is trying to address.

NL2SQL
candidate selection
data-aware
separating instances
provenance
Innovation

Methods, ideas, or system contributions that make the work stand out.

data-aware candidate selection
NL2SQL
separating instances
provenance
BIRD-DEV