RISE: Rule-Driven SQL Dialect Translation via Query Reduction

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two problems in SQL dialect translation when migrating applications between relational database management systems (RDBMSs): traditional rule-based tools require substantial manual effort to support new dialects, and large language models (LLMs) struggle with lengthy, complex queries. The authors propose RISE, an approach that combines dialect-aware query reduction with LLMs. RISE first reduces an input query to a simplified query that keeps only the SQL elements relevant to the dialect being translated. It then uses an LLM to translate the simplified query, automatically extracts a generalizable translation rule from the original/translated pair, and applies that rule back to the original complex query. Because the LLM never has to process the lengthy original query directly, RISE improves translation accuracy on complex inputs. Evaluated on the TPC-DS and SQLProcBench benchmarks, RISE achieves accuracies of 97.98% and 100%, respectively, outperforming traditional rule-based tools and LLM-based approaches by average improvements of 24.62% on TPC-DS and 238.41% on SQLProcBench.
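To make the reduction step concrete, here is a minimal Python sketch. It assumes the dialect element is a single function call (Oracle's `NVL` is used as the example) and approximates "removing dialect-irrelevant elements" by keeping only that call inside a bare `SELECT`. The helper names `find_call` and `reduce_query` are hypothetical, and RISE's actual reduction operates on SQL elements in ways this string-level sketch does not capture.

```python
import re
from typing import Optional


def find_call(sql: str, func: str) -> Optional[str]:
    """Return the first `func(...)` call in `sql`, matched with balanced parentheses.

    Simplification: parentheses inside string literals are not handled.
    """
    m = re.search(rf"\b{re.escape(func)}\s*\(", sql, flags=re.IGNORECASE)
    if m is None:
        return None
    depth = 0
    for i in range(m.end() - 1, len(sql)):  # start at the opening "("
        if sql[i] == "(":
            depth += 1
        elif sql[i] == ")":
            depth -= 1
            if depth == 0:
                return sql[m.start(): i + 1]
    return None  # unbalanced parentheses


def reduce_query(q_c: str, dialect_func: str) -> Optional[str]:
    """Hypothetical reduction: keep only the dialect-bearing call and drop
    everything else (other projections, joins, filters, grouping)."""
    call = find_call(q_c, dialect_func)
    return None if call is None else f"SELECT {call}"


if __name__ == "__main__":
    q_c = """SELECT c.name, SUM(NVL(o.discount, 0)) AS total_discount
FROM customers c JOIN orders o ON o.cust_id = c.id
WHERE o.status = 'OPEN'
GROUP BY c.name"""
    print(reduce_query(q_c, "NVL"))  # SELECT NVL(o.discount, 0)
```

The simplified query is short enough that an LLM only has to reason about the dialect feature itself, not the surrounding joins and aggregation.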

📝 Abstract
Translating SQL dialects across different relational database management systems (RDBMSs) is crucial for migrating RDBMS-based applications to the cloud. Traditional SQL dialect translation tools rely on manually crafted rules, which require significant manual effort to support new RDBMSs and dialects. Although large language models (LLMs) can assist in translating SQL dialects, they often struggle with lengthy and complex SQL queries. In this paper, we propose RISE, a novel LLM-based SQL dialect translation approach that can accurately handle lengthy and complex SQL queries. Given a complex source query $Q_c$ that contains a SQL dialect $d$, we first employ a dialect-aware query reduction technique to derive a simplified query $Q_s$ by removing $d$-irrelevant SQL elements from $Q_c$. Subsequently, we utilize LLMs to translate $Q_s$ into $Q_{s'}$, and automatically extract the translation rule $r_d$ for dialect $d$ based on the relationship between $Q_s$ and $Q_{s'}$. By applying $r_d$ to $Q_c$, we can effectively translate the dialect $d$ within $Q_c$, thereby bypassing the complexity of the source query $Q_c$. We evaluate RISE on two real-world benchmarks, TPC-DS and SQLProcBench, comparing its translation accuracy against both traditional rule-based tools and LLM-based approaches. RISE achieves accuracies of 97.98% on TPC-DS and 100% on SQLProcBench, outperforming the baselines by average improvements of 24.62% and 238.41%, respectively.
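The rule-extraction and rule-application steps can be sketched in the same hypothetical style. Here a rule $r_d$ is modeled as a function name, an arity, and a target template with `$1..$n` placeholders, derived by locating the arguments of the dialect call from $Q_s$ inside the translated $Q_{s'}$. This templating scheme, the names `Rule`, `extract_rule` and `apply_rule`, and the hard-coded stand-in for the LLM's output are assumptions for illustration, not the paper's implementation. Oracle's `NVL2(a, b, c)` rewritten to a standard `CASE` expression serves as the example dialect.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    func: str      # source-dialect function the rule rewrites, e.g. "NVL2"
    arity: int     # number of arguments the rule expects
    template: str  # target-dialect expression with $1..$n argument placeholders


def find_call(sql: str, func: str, start: int = 0) -> Optional[Tuple[int, int]]:
    """(start, end) span of the first balanced `func(...)` call at or after `start`.
    String literals containing parentheses or commas are not handled."""
    m = re.compile(rf"\b{re.escape(func)}\s*\(", re.IGNORECASE).search(sql, start)
    if m is None:
        return None
    depth = 0
    for i in range(m.end() - 1, len(sql)):
        if sql[i] == "(":
            depth += 1
        elif sql[i] == ")":
            depth -= 1
            if depth == 0:
                return m.start(), i + 1
    return None


def split_args(call: str) -> List[str]:
    """Split a call's argument list at top-level commas."""
    inner = call[call.index("(") + 1: -1]
    if not inner.strip():
        return []
    args, depth, cur = [], 0, []
    for ch in inner:
        if ch == "," and depth == 0:
            args.append("".join(cur).strip())
            cur = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        cur.append(ch)
    args.append("".join(cur).strip())
    return args


def extract_rule(q_s: str, q_s_prime: str, func: str) -> Rule:
    """Generalize one (Q_s, Q_s') pair into a rule by abstracting the call's
    arguments into placeholders. Assumes both queries have the form SELECT <expr>."""
    span = find_call(q_s, func)
    if span is None:
        raise ValueError(f"{func} not found in simplified query")
    args = split_args(q_s[span[0]:span[1]])
    template = re.sub(r"^\s*SELECT\s+", "", q_s_prime, flags=re.IGNORECASE)
    placeholder = {}
    for i, a in enumerate(args, start=1):
        placeholder.setdefault(a, f"${i}")
    if placeholder:
        alts = "|".join(re.escape(a) for a in sorted(placeholder, key=len, reverse=True))
        # Single regex pass, so an inserted "$n" is never matched again.
        template = re.sub(rf"(?<![\w.])(?:{alts})(?![\w.])",
                          lambda m: placeholder[m.group(0)], template)
    return Rule(func, len(args), template)


def apply_rule(rule: Rule, q_c: str) -> str:
    """Rewrite every call of rule.func in the complex query Q_c."""
    if find_call(rule.template, rule.func) is not None:
        raise ValueError("template re-introduces the source function")
    out, pos = q_c, 0
    while (span := find_call(out, rule.func, pos)) is not None:
        s, e = span
        args = split_args(out[s:e])
        if len(args) != rule.arity:
            pos = s + 1  # skip this call but keep scanning inside it
            continue
        expr = re.sub(r"\$(\d+)", lambda m: args[int(m.group(1)) - 1], rule.template)
        out = out[:s] + expr + out[e:]
        pos = s  # rescan: arguments may contain nested calls
    return out


if __name__ == "__main__":
    q_s = "SELECT NVL2(o.ship_date, 'SHIPPED', 'PENDING')"
    # Hard-coded stand-in for the LLM's translation of Q_s (no model is called).
    q_s_prime = "SELECT CASE WHEN o.ship_date IS NOT NULL THEN 'SHIPPED' ELSE 'PENDING' END"
    rule = extract_rule(q_s, q_s_prime, "NVL2")
    print(rule.template)  # CASE WHEN $1 IS NOT NULL THEN $2 ELSE $3 END
    q_c = ("SELECT c.name, NVL2(o.ship_date, 'Y', 'N') AS shipped "
           "FROM orders o JOIN customers c ON c.id = o.cust_id WHERE o.total > 100")
    print(apply_rule(rule, q_c))
```

Because the rule is applied by parsing each call's arguments in $Q_c$ rather than by re-translating $Q_c$, the length of the surrounding query does not affect the rewrite, and nested occurrences are handled by rescanning after each substitution.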

Problem

Research questions and friction points this paper is trying to address.

SQL dialect translation
relational database management systems
large language models
query complexity
cloud migration

Innovation

Methods, ideas, or system contributions that make the work stand out.

SQL dialect translation
query reduction
large language models
rule extraction
RDBMS migration