🤖 AI Summary
Traditional query optimizers, constrained by heuristic search strategies and inaccurate cost models, struggle to produce optimal execution plans in complex search spaces. This paper introduces LLMOpt, an end-to-end, non-heuristic, LLM-driven query optimization framework comprising two synergistic modules: LLMOpt(G), which leverages large language models to directly generate diverse, high-quality execution plans, and LLMOpt(S), which applies learning-to-rank over lists of candidate plans to identify the globally best one. By combining offline fine-tuning, prompt engineering, and semantics-aware ranking, LLMOpt delivers transferable, semantics-informed optimization. Evaluated on the JOB, JOB-EXT, and Stack benchmarks, LLMOpt significantly outperforms PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) attains state-of-the-art plan quality while maintaining competitive inference efficiency, indicating strong practical deployability.
📝 Abstract
Query optimization is a critical task in database systems, focused on determining the most efficient way to execute a query from an enormous set of possible strategies. Traditional approaches rely on heuristic search methods and cost predictions, but these often struggle with the complexity of the search space and inaccuracies in performance estimation, leading to suboptimal plan choices. This paper presents LLMOpt, a novel framework that leverages Large Language Models (LLMs) to address these challenges through two innovative components: (1) LLM for Plan Candidate Generation (LLMOpt(G)), which eliminates heuristic search by utilizing the reasoning abilities of LLMs to directly generate high-quality query plans, and (2) LLM for Plan Candidate Selection (LLMOpt(S)), a list-wise cost model that compares candidates globally to enhance selection accuracy. To adapt LLMs for query optimization, we propose fine-tuning pre-trained models using optimization data collected offline. Experimental results on the JOB, JOB-EXT, and Stack benchmarks show that LLMOpt(G) and LLMOpt(S) outperform state-of-the-art methods, including PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) achieves the best practical performance, striking a balance between plan quality and inference efficiency.
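The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: the function names, the hint-string plan representation, and the stand-in models are assumptions for exposition, not the paper's actual API. Stage 1 (LLMOpt(G)) has a fine-tuned LLM propose candidate plans directly, replacing heuristic plan-space search; Stage 2 (LLMOpt(S)) scores the whole candidate list jointly instead of estimating each plan's cost in isolation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PlanCandidate:
    hint: str          # illustrative: a pg_hint_plan-style hint string
    est_score: float   # score assigned by the list-wise selector

def generate_candidates(sql: str,
                        llm_generate: Callable[[str], List[str]]) -> List[PlanCandidate]:
    """Stage 1 (LLMOpt(G)): an LLM fine-tuned on offline optimization data
    proposes several execution plans for the query directly."""
    prompt = f"Generate candidate execution plans for:\n{sql}"
    return [PlanCandidate(hint=h, est_score=0.0) for h in llm_generate(prompt)]

def select_plan(candidates: List[PlanCandidate],
                listwise_rank: Callable[[List[str]], List[float]]) -> PlanCandidate:
    """Stage 2 (LLMOpt(S)): a list-wise model compares all candidates
    globally and the highest-scoring plan is chosen."""
    scores = listwise_rank([c.hint for c in candidates])
    for c, s in zip(candidates, scores):
        c.est_score = s
    return max(candidates, key=lambda c: c.est_score)

# Toy stand-ins for the two fine-tuned models (assumed, for demonstration):
fake_llm = lambda prompt: ["HashJoin(a b)", "NestLoop(a b)", "MergeJoin(a b)"]
fake_ranker = lambda hints: [0.2, 0.9, 0.5]  # scores produced over the full list

best = select_plan(generate_candidates("SELECT ...", fake_llm), fake_ranker)
print(best.hint)  # NestLoop(a b)
```

The key design point mirrored here is that the selector sees the entire candidate list at once, so its scores are comparative rather than absolute cost estimates, which is what the abstract credits for improved selection accuracy.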