Query Rewriting via Large Language Models

Date: 2024-03-14
Venue: arXiv.org
Citations: 18
Influential: 3
AI Summary
To address the poor generalizability and weak verifiability of existing approaches to rewriting poorly written SQL queries, this paper proposes GenRewrite, the first end-to-end LLM-driven query rewriting system. Methodologically, it introduces (1) Natural Language Rewrite Rules (NLR2s), which represent rewriting knowledge and transfer it across query patterns; (2) a counterexample-guided iterative correction technique that fixes syntactic and semantic errors in rewritten queries; and (3) tight integration of SQL syntactic/semantic constraints with LLM reasoning. Evaluated on 99 complex queries from the TPC benchmarks, GenRewrite speeds up 22 of them by more than 2×, which is 2.5× to 3.2× higher coverage than state-of-the-art traditional query rewriting and 2.1× higher than the out-of-the-box LLM baseline.

πŸ“ Abstract
Query rewriting is one of the most effective techniques for coping with poorly written queries before passing them down to the query optimizer. Manual rewriting is not scalable, as it is error-prone and requires deep expertise. Similarly, traditional query rewriting algorithms can only handle a small subset of queries: rule-based techniques do not generalize to new query patterns and synthesis-based techniques cannot handle complex queries. Fortunately, the rise of Large Language Models (LLMs), equipped with broad general knowledge and advanced reasoning capabilities, has created hopes for solving some of these previously open problems. In this paper, we present GenRewrite, the first holistic system that leverages LLMs for query rewriting. We introduce the notion of Natural Language Rewrite Rules (NLR2s), and use them both as hints to the LLM and as a means of transferring knowledge from rewriting one query to another, so that the system becomes smarter and more effective over time. We present a novel counterexample-guided technique that iteratively corrects the syntactic and semantic errors in the rewritten query, significantly reducing the LLM costs and the manual effort required for verification. GenRewrite speeds up 22 out of 99 TPC queries (the most complex public benchmark) by more than 2x, which is 2.5x to 3.2x higher coverage than state-of-the-art traditional query rewriting and 2.1x higher than the out-of-the-box LLM baseline.
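The abstract says NLR2s serve both as hints to the LLM and as a way to carry knowledge from one rewritten query to the next. The Python sketch below illustrates that idea only. The names `RuleStore`, `record`, `top_hints`, and `build_prompt`, the prompt wording, and the ranking by usage count are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class RuleStore:
    """Hypothetical store of natural-language rewrite rules (NLR2s)."""

    # Maps rule text -> number of rewrites the rule has helped with so far.
    rules: dict[str, int] = field(default_factory=dict)

    def record(self, rule: str) -> None:
        """Remember that a rule contributed to a successful rewrite."""
        self.rules[rule] = self.rules.get(rule, 0) + 1

    def top_hints(self, k: int = 3) -> list[str]:
        """Return the k most frequently useful rules (ties broken by text)."""
        ranked = sorted(self.rules.items(), key=lambda kv: (-kv[1], kv[0]))
        return [rule for rule, _ in ranked[:k]]


def build_prompt(query: str, hints: list[str]) -> str:
    """Assemble an LLM prompt that passes stored rules as hints (assumed format)."""
    lines = ["Rewrite the SQL query below into an equivalent, faster query."]
    if hints:
        lines.append("Rewrite rules that helped on earlier queries:")
        lines += [f"- {hint}" for hint in hints]
    lines += ["Query:", query]
    return "\n".join(lines)
```

In this sketch, a rule learned while rewriting one query is recorded once and then shows up as a hint in the prompt for later queries, which is one simple way the "smarter over time" behavior could be realized.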
Problem

Research questions and friction points this paper is trying to address.

Automating query rewriting to replace manual and rule-based methods
Addressing limitations of traditional algorithms with complex query patterns
Reducing syntactic and semantic errors in rewritten queries efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging Large Language Models for query rewriting
Introducing Natural Language Rewrite Rules for knowledge transfer
Using a counterexample-guided technique to correct query errors
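The counterexample-guided correction idea listed above can be sketched as a loop: run the original query and the candidate rewrite on test data, and feed any mismatch or execution error back to the rewriter. This is a minimal Python sketch under stated assumptions, not the paper's algorithm. It uses SQLite through the standard `sqlite3` module as a stand-in database, and `propose` is a hypothetical callback standing in for the LLM call. Matching results on test databases only fails to refute equivalence; it does not prove it.

```python
import sqlite3
from typing import Callable, Optional


def run(conn: sqlite3.Connection, sql: str) -> list:
    """Execute a query and return its rows in a canonical order."""
    # Sorting by repr ignores row order and tolerates NULLs in the rows.
    return sorted(conn.execute(sql).fetchall(), key=repr)


def find_counterexample(
    original: str, candidate: str, test_dbs: list
) -> Optional[str]:
    """Describe the first test database on which the candidate misbehaves."""
    for i, conn in enumerate(test_dbs):
        expected = run(conn, original)
        try:
            actual = run(conn, candidate)
        except sqlite3.Error as exc:  # e.g. a syntax error in the rewrite
            return f"test db {i}: candidate failed with error: {exc}"
        if expected != actual:
            return f"test db {i}: expected {expected}, got {actual}"
    return None


def correct_rewrite(
    original: str,
    propose: Callable[[str, Optional[str]], str],  # hypothetical LLM wrapper
    test_dbs: list,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for rewrites until one survives all tests or the budget runs out."""
    feedback = None
    for _ in range(max_rounds):
        candidate = propose(original, feedback)
        feedback = find_counterexample(original, candidate, test_dbs)
        if feedback is None:
            return candidate  # no counterexample found on the test data
    return None  # give up and keep the original query
```

To try the loop without an LLM, `propose` can be any function that takes the original query and the latest feedback string (or `None` on the first round) and returns a candidate SQL string. Bounding the loop with `max_rounds` caps the number of model calls per query.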
Jie Liu
University of Michigan, Ann Arbor, Michigan, USA
Barzan Mozafari
University of Michigan, Ann Arbor, Michigan, USA