Query Rewriting via LLMs

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: SQL query rewriting aims to produce equivalent queries that run faster and are easier to understand, but rule-based rewriters are limited in scope and hard to update in production, while LLM-based rewriters are prone to semantic and syntactic errors. Method: This paper proposes an LLM-driven, database-aware rewriting framework that combines an ensemble of basic prompts, database-sensitive prompts built on redundancy removal and selectivity-based rewriting rules, and rewrite paths guided by LLM token probabilities. Statistical and logic-based tools guard against errors produced by the model. Contribution/Results: The framework sits between purely rule-based and purely LLM-based approaches. On TPC-DS, it constructs productive rewrites (>1.5× speedup) for two-thirds of the query suite, four times the coverage of the state of the art, and the geometric mean of its estimated execution speedups is an order of magnitude higher than the state of the art. The techniques are implemented in the LITHE system and evaluated on complex analytic queries from multiple benchmarks on contemporary database platforms.

📝 Abstract
Query rewriting is a classical technique for transforming complex declarative SQL queries into "lean" equivalents that are conducive to (a) faster execution from a performance perspective, and (b) better understanding from a developer perspective. The rewriting is typically achieved via transformation rules, but these rules are limited in scope and difficult to update in a production system. In recent times, LLM-based techniques have also been mooted, but they are prone to both semantic and syntactic errors. We investigate here how the remarkable cognitive capabilities of LLMs can be leveraged for performant query rewriting while incorporating safeguards and optimizations to ensure correctness and efficiency. Our study shows that these goals can be progressively achieved through incorporation of (a) an ensemble suite of basic prompts, (b) database-sensitive prompts via redundancy removal and selectivity-based rewriting rules, and (c) LLM token probability-guided rewrite paths. Further, a suite of statistical and logic-based tools can be used to guard against errors produced by the model. We have implemented the above LLM-infused techniques in the LITHE system, and evaluated complex analytic queries from multiple benchmarks on contemporary database platforms. The results show significant improvements over SOTA rewriting techniques. For instance, on TPC-DS, LITHE constructed productive (>1.5x speedup) rewrites for two-thirds of the query suite, delivering four times more coverage than SOTA. Further, the geometric mean of its estimated execution speedups was an order-of-magnitude jump over SOTA performance. In essence, LITHE offers a potent and robust LLM-based intermediary between enterprise applications and database engines.
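
The two headline metrics in the abstract, the share of queries with a productive rewrite (>1.5x speedup) and the geometric mean of speedups, are simple to compute. The sketch below is illustrative only: the function names and the sample speedup values are assumptions made for this example, not code or data from LITHE.

```python
import math


def productive_coverage(speedups, threshold=1.5):
    """Fraction of queries whose speedup strictly exceeds the threshold."""
    if not speedups:
        raise ValueError("speedups must be non-empty")
    return sum(1 for s in speedups if s > threshold) / len(speedups)


def geometric_mean(speedups):
    """Geometric mean, computed via logarithms for numerical stability."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be non-empty and positive")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical per-query speedups (original cost / rewritten cost).
# These numbers are made up for illustration; they are not from the paper.
example_speedups = [1.0, 2.0, 4.0, 8.0, 1.2, 16.0]

print(f"coverage: {productive_coverage(example_speedups):.2f}")
print(f"geometric mean speedup: {geometric_mean(example_speedups):.2f}")
```

The geometric mean is the usual choice for averaging ratios such as speedups, because a single very large speedup does not dominate it the way it would dominate an arithmetic mean.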
Problem

Research questions and friction points this paper is trying to address.

Leveraging LLMs for SQL query rewriting
Ensuring correctness and efficiency in rewrites
Improving performance over state-of-the-art techniques
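
One simple way to guard a rewrite against semantic errors is to run the original and rewritten queries on a test database and compare their results. The sketch below does this with Python's built-in `sqlite3` module; it is an assumption-laden illustration, not the verification procedure used in LITHE. The `sales` table, its data, and the `same_results` helper are hypothetical. Matching results on sample data cannot prove equivalence; the check can only refute a bad rewrite.

```python
import sqlite3
from collections import Counter


def same_results(conn, original_sql, rewritten_sql):
    """Compare both queries' outputs as multisets of rows (order-insensitive)."""
    original_rows = Counter(conn.execute(original_sql).fetchall())
    rewritten_rows = Counter(conn.execute(rewritten_sql).fetchall())
    return original_rows == rewritten_rows


# Hypothetical toy schema and data, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, amount INTEGER);
    INSERT INTO sales (store, amount)
    VALUES ('a', 10), ('a', 30), ('b', 100), ('b', 300);
""")

# Original: correlated subquery evaluated per outer row.
original = """
    SELECT s.store, s.amount FROM sales s
    WHERE s.amount > (SELECT AVG(s2.amount) FROM sales s2
                      WHERE s2.store = s.store)
"""

# Candidate rewrite: decorrelated into a join against per-store averages.
rewritten = """
    SELECT s.store, s.amount FROM sales s
    JOIN (SELECT store, AVG(amount) AS avg_amount
          FROM sales GROUP BY store) t ON t.store = s.store
    WHERE s.amount > t.avg_amount
"""

# Faulty rewrite: compares against the global average instead.
wrong = """
    SELECT store, amount FROM sales
    WHERE amount > (SELECT AVG(amount) FROM sales)
"""

print(same_results(conn, original, rewritten))
print(same_results(conn, original, wrong))
```

Comparing rows as multisets rather than lists avoids false mismatches when neither query specifies an `ORDER BY`, while still catching rewrites that add or drop duplicate rows.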
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM ensemble prompts for query rewriting
Database-sensitive redundancy removal techniques
Token probability-guided rewrite paths
Sriram Dharwada
Indian Institute of Science
J. Haritsa
Indian Institute of Science
Harish Doraiswamy
Microsoft Research India
Scientific Visualization, Computational Topology, GPU Algorithms, Databases