🤖 AI Summary
Cross-database SQL dialect translation must cope with syntactic heterogeneity and semantic shift. Existing approaches (rule-based systems and direct LLM generation) suffer from high maintenance overhead, elevated error rates on complex queries, and unreliable function mapping and semantic equivalence. This paper proposes a hybrid translation framework that combines rule-based methods with large language models (LLMs). It introduces three key innovations: (1) a functionality-driven query segmentation mechanism, (2) a cross-dialect syntactic embedding model, and (3) an adaptive local-to-global translation strategy. By unifying syntax-aware parsing, LLM fine-tuning, modular query decomposition, and multi-level semantic validation, the framework targets both syntactic correctness and semantic equivalence. It supports mainstream dialect translations (e.g., PostgreSQL → MySQL) and offers interfaces via a web console, CLI, and PyPI package. Experiments demonstrate a 62% reduction in error rate on complex queries, improving robustness and translation accuracy.
📝 Abstract
Dialect translation plays a key role in enabling seamless interaction across heterogeneous database systems. However, translating SQL queries between different dialects (e.g., from PostgreSQL to MySQL) remains a challenging task due to syntactic discrepancies and subtle semantic variations. Existing approaches, including manual rewriting, rule-based systems, and large language model (LLM)-based techniques, often involve high maintenance effort (e.g., crafting custom translation rules) or produce unreliable results (e.g., LLMs generating non-existent functions), especially when handling complex queries. In this demonstration, we present CrackSQL, the first hybrid SQL dialect translation system that combines rule-based and LLM-based methods to overcome these limitations. CrackSQL leverages the adaptability of LLMs to minimize manual intervention, while enhancing translation accuracy by segmenting lengthy, complex SQL via functionality-based query processing. To further improve robustness, it incorporates a novel cross-dialect syntax embedding model for precise syntax alignment, as well as an adaptive local-to-global translation strategy that effectively resolves interdependent query operations. CrackSQL supports three translation modes and offers multiple deployment and access options, including a web console interface, a PyPI package, and a command-line prompt, facilitating adoption across a variety of real-world use cases.
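
The hybrid idea described above (segment the query, apply rules where they exist, fall back to an LLM, and widen from local to global context on failure) can be illustrated with a minimal sketch. This is not CrackSQL's implementation or API: the rule table `RULES`, the validator list `SOURCE_ONLY`, the helper names, and the stub `fake_llm` are all hypothetical, the segmentation handles only flat queries, and the embedding model is omitted entirely.

```python
import re
from typing import Callable, List

# Assumed toy rule table: PostgreSQL function -> MySQL equivalent.
RULES = {"RANDOM": "RAND", "STRPOS": "INSTR"}

# Assumed set of functions missing from the target dialect (toy validator).
SOURCE_ONLY = {"RANDOM", "STRPOS", "BTRIM"}

# Zero-width split points in front of top-level clause keywords.
CLAUSE = re.compile(r"(?i)\b(?=(?:SELECT|FROM|WHERE|GROUP BY|ORDER BY|LIMIT)\b)")
# An identifier immediately followed by an opening parenthesis.
FUNC = re.compile(r"\b([A-Za-z_]+)(?=\s*\()")


def segment(sql: str) -> List[str]:
    """Split a flat query into clause-level segments (no subquery handling)."""
    return [part for part in CLAUSE.split(sql) if part.strip()]


def apply_rules(text: str) -> str:
    """Rewrite function names that have a known rule; leave the rest alone."""
    return FUNC.sub(lambda m: RULES.get(m.group(1).upper(), m.group(1)), text)


def is_valid(text: str) -> bool:
    """Toy check: no source-only function call remains in the text."""
    return not any(m.group(1).upper() in SOURCE_ONLY for m in FUNC.finditer(text))


def translate(sql: str, llm: Callable[[str], str]) -> str:
    """Rules first, then the LLM on one segment, then the LLM on the whole query."""
    pieces = []
    for seg in segment(sql):
        candidate = apply_rules(seg)
        if not is_valid(candidate):
            candidate = llm(candidate)  # local retry: only this segment
        pieces.append(candidate)
    result = "".join(pieces)
    if not is_valid(result):
        result = llm(sql)  # global retry: the full query as context
    if not is_valid(result):
        raise ValueError("no valid translation found")
    return result


def fake_llm(text: str) -> str:
    """Stand-in for a model call (assumption): only knows BTRIM -> TRIM."""
    return re.sub(r"(?i)\bBTRIM(?=\s*\()", "TRIM", text)


if __name__ == "__main__":
    print(translate("SELECT STRPOS(name, 'a') FROM users ORDER BY RANDOM() LIMIT 5", fake_llm))
    print(translate("SELECT BTRIM(name) FROM users", fake_llm))
```

In this sketch the rules handle `STRPOS` and `RANDOM` without any model call, while `BTRIM` has no rule and is routed to the stub. A real system would replace the name-based validator with checks against the target dialect's grammar and semantics.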