Boundary-Aware NL2SQL: Integrating Reliability through Hybrid Reward and Data Synthesis

📅 2026-01-15
🤖 AI Summary
This work addresses a critical limitation of natural language to SQL (NL2SQL) systems: without sufficient boundary awareness, they cannot reliably reject out-of-domain or semantically ambiguous queries. To this end, we propose BAR-SQL, a framework that constructs enterprise-scale corpora enriched with boundary cases through seed mutation synthesis and integrates knowledge-anchored reasoning to generate interpretable chains of thought. BAR-SQL is trained in two stages, supervised fine-tuning followed by Group Relative Policy Optimization (GRPO), and uses a task-conditioned hybrid reward mechanism that jointly optimizes SQL execution accuracy and the semantic precision of rejections. Our approach pioneers the explicit incorporation of boundary awareness into the NL2SQL generation pipeline and introduces Ent-SQL-Bench, the first benchmark evaluating both generation and rejection capabilities. On this benchmark, BAR-SQL achieves a state-of-the-art average accuracy of 91.48%, outperforming advanced closed-source models such as Claude 4.5 Sonnet and GPT-5.
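
As a rough illustration of the GRPO stage mentioned in the summary, the sketch below computes group-relative advantages: several responses are sampled for one prompt, and each reward is normalized against the mean and standard deviation of its own group. The function name, the use of the population standard deviation, and the `eps` constant are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    `rewards` holds the scalar rewards of all responses sampled for a
    single prompt. `eps` (an assumed constant) avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses for one question: two rewarded, two not.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses that score above their group average receive a positive advantage and the others a negative one, so no separate value model is needed to provide a baseline.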

📝 Abstract
In this paper, we present BAR-SQL (Boundary-Aware Reliable NL2SQL), a unified training framework that embeds reliability and boundary awareness directly into the generation process. We introduce a Seed Mutation data synthesis paradigm that constructs a representative enterprise corpus, explicitly encompassing multi-step analytical queries alongside boundary cases including ambiguity and schema limitations. To ensure interpretability, we employ Knowledge-Grounded Reasoning Synthesis, which produces Chain-of-Thought traces explicitly anchored in schema metadata and business rules. The model is trained in two stages: Supervised Fine-Tuning (SFT) followed by reinforcement learning via Group Relative Policy Optimization (GRPO). We design a Task-Conditioned Hybrid Reward mechanism that simultaneously optimizes SQL execution accuracy (leveraging Abstract Syntax Tree analysis and dense result matching) and semantic precision in abstention responses. To evaluate reliability alongside generation accuracy, we construct and release Ent-SQL-Bench, which jointly assesses SQL precision and boundary-aware abstention across ambiguous and unanswerable queries. Experimental results on this benchmark demonstrate that BAR-SQL achieves 91.48% average accuracy, outperforming leading proprietary models, including Claude 4.5 Sonnet and GPT-5, in both SQL generation quality and boundary-aware abstention capability. The source code and benchmark are available anonymously at: https://github.com/TianSongS/BAR-SQL.
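
The idea of a task-conditioned reward can be sketched as a function that dispatches on the sample type: SQL samples are scored by executing the query and comparing result sets, while abstention samples are scored by comparing the response text to a reference. This is a minimal sketch under stated assumptions, not the authors' implementation. All names (`hybrid_reward`, `sql_reward`, `abstention_reward`, `result_overlap`) are hypothetical, the Jaccard overlap stands in for the paper's dense result matching, a lexical similarity ratio stands in for semantic precision, and the Abstract Syntax Tree analysis is omitted entirely.

```python
import sqlite3
from difflib import SequenceMatcher


def result_overlap(pred_rows, gold_rows):
    """Partial credit for result sets: Jaccard overlap of distinct rows.

    Assumption: this stands in for the paper's dense result matching.
    """
    pred, gold = set(pred_rows), set(gold_rows)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)


def sql_reward(conn, pred_sql, gold_sql):
    """Execute both queries and score the predicted result set."""
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # queries that fail to execute earn no reward
    return result_overlap(pred_rows, gold_rows)


def abstention_reward(pred_text, gold_text):
    """Lexical similarity as a crude proxy for semantic precision."""
    return SequenceMatcher(None, pred_text.lower(), gold_text.lower()).ratio()


def hybrid_reward(task_type, prediction, reference, conn=None):
    """Dispatch to the reward that matches the sample's task type."""
    if task_type == "sql":
        return sql_reward(conn, prediction, reference)
    if task_type == "abstain":
        return abstention_reward(prediction, reference)
    raise ValueError(f"unknown task type: {task_type}")


# Toy database for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10), ("south", 20)])

gold = "SELECT region FROM sales"
full = hybrid_reward("sql", "SELECT region FROM sales", gold, conn)
partial = hybrid_reward("sql", "SELECT region FROM sales WHERE amount > 15",
                        gold, conn)
```

Conditioning on the task type keeps the two objectives from interfering: a refusal is never scored by query execution, and a SQL answer is never scored by text similarity. A real system would likely replace the lexical ratio with an embedding-based or model-based semantic score.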
Problem

Research questions and friction points this paper is trying to address.

NL2SQL
boundary awareness
reliability
abstention
ambiguous queries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Boundary-Aware NL2SQL
Hybrid Reward
Data Synthesis
Knowledge-Grounded Reasoning
Reliable Abstention
Songsong Tian
Li Auto Inc., Beijing, China
Kongsheng Zhuo
Li Auto Inc., Beijing, China
Zhendong Wang
University of Science and Technology of China (USTC)
Computer Vision, Deep Learning, Generative Model, AIGC
Rong Shen
Li Auto Inc., Beijing, China
Shengtao Zhang
Li Auto Inc., Beijing, China
Yong Wu
Li Auto Inc., Beijing, China