🤖 AI Summary
Existing natural language to SQL (NL2SQL) approaches often struggle to generate semantically correct and executable queries in multi-dialect database environments due to dialect coupling, semantic degradation, or cross-dialect interference. This work proposes Dial, a novel framework that decouples intent understanding from dialect-specific syntax by introducing dialect-aware logical query planning, a hierarchical intent knowledge base (HINT-KB), and an execution-driven semantic validation loop. These components collectively enable precise translation from natural language to dialect-specific SQL. Evaluated on the newly constructed DS-NL2SQL benchmark, Dial outperforms state-of-the-art methods by 10.25% in translation accuracy and achieves a 15.77% improvement in dialect feature coverage.
📝 Abstract
Enterprises commonly deploy heterogeneous database systems, each with its own SQL dialect that differs in syntax rules, built-in functions, and execution constraints. However, most existing NL2SQL methods assume a single dialect (e.g., SQLite) and struggle to produce queries that are both semantically correct and executable on the target engine. Prompt-based approaches tightly couple intent reasoning with dialect syntax, rule-based translators often degrade native operators into generic constructs, and multi-dialect fine-tuning suffers from cross-dialect interference. In this paper, we present Dial, a knowledge-grounded framework for dialect-specific NL2SQL. Dial introduces three components: (1) a Dialect-Aware Logical Query Planning module that converts natural language into a dialect-aware logical query plan via operator-level intent decomposition and divergence-aware specification; (2) HINT-KB, a hierarchical intent-aware knowledge base that organizes dialect knowledge into (i) a canonical syntax reference, (ii) a declarative function repository, and (iii) a procedural constraint repository; and (3) an execution-driven debugging and semantic verification loop that separates syntactic recovery from logic auditing to prevent semantic drift. We construct DS-NL2SQL, a benchmark covering six major database systems with 2,218 dialect-specific test cases. Experimental results show that Dial consistently improves translation accuracy by 10.25% and dialect feature coverage by 15.77% over state-of-the-art baselines. The code is available at https://github.com/weAIDB/Dial.
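To make the separation of intent from dialect syntax concrete, here is a minimal Python sketch. It is not taken from the Dial repository. The names `TopNPlan`, `ROW_LIMIT_TEMPLATES`, `render`, and `execute_and_check` are hypothetical, the dialects listed are illustrative and not necessarily the six covered by DS-NL2SQL, and the single lookup table is a toy stand-in for a dialect knowledge base. The row-count check is only a crude placeholder for semantic verification.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class TopNPlan:
    """Dialect-neutral intent (hypothetical): the n largest values of a column."""
    table: str
    column: str
    n: int


_LIMIT_STYLE = "SELECT {column} FROM {table} ORDER BY {column} DESC LIMIT {n}"

# Toy stand-in for a dialect knowledge base: one row-limiting template per
# dialect. The same intent needs different syntax on different engines.
ROW_LIMIT_TEMPLATES = {
    "sqlite": _LIMIT_STYLE,
    "postgresql": _LIMIT_STYLE,
    "mysql": _LIMIT_STYLE,
    "sqlserver": "SELECT TOP ({n}) {column} FROM {table} ORDER BY {column} DESC",
    # FETCH FIRST is available in Oracle 12c and later.
    "oracle": (
        "SELECT {column} FROM {table} ORDER BY {column} DESC "
        "FETCH FIRST {n} ROWS ONLY"
    ),
}


def render(plan: TopNPlan, dialect: str) -> str:
    """Turn the dialect-neutral plan into SQL text for one target dialect."""
    try:
        template = ROW_LIMIT_TEMPLATES[dialect]
    except KeyError:
        raise ValueError(f"no dialect knowledge for {dialect!r}") from None
    return template.format(table=plan.table, column=plan.column, n=plan.n)


def execute_and_check(sql: str, conn: sqlite3.Connection, expected_rows: int):
    """Run the query and report two separate signals.

    Returns (executable, semantically_ok). The first flag covers syntax and
    runtime errors; the second is a deliberately simple audit of the result
    (row count only), assumed here purely for illustration.
    """
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return False, False
    return True, len(rows) == expected_rows
```

A short usage example, running against an in-memory SQLite database because it ships with Python:

```python
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(10,), (30,), (20,)])

plan = TopNPlan(table="orders", column="amount", n=2)
print(render(plan, "sqlite"))
print(render(plan, "oracle"))
print(execute_and_check(render(plan, "sqlite"), conn, expected_rows=2))
```

Keeping the two flags apart mirrors the idea in the abstract that repairing a query so it runs and auditing whether it still answers the question are different steps. A query rendered for the wrong dialect fails the first check before any question of meaning arises.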