A Query Optimization Method Utilizing Large Language Models

📅 2025-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional query optimizers, constrained by heuristic search strategies and inaccurate cost models, struggle to produce optimal execution plans in complex search spaces. This paper introduces LLMOpt—the first end-to-end, non-heuristic, LLM-driven query optimization framework—comprising two synergistic modules: LLMOpt(G), which leverages large language models to directly generate diverse, high-quality execution plans; and LLMOpt(S), which employs learning-to-rank over plan lists to identify the globally optimal plan. By integrating offline fine-tuning, prompt engineering, and semantic-aware ranking modeling, LLMOpt achieves transferable, semantics-informed optimization. Evaluated on the JOB, JOB-EXT, and Stack benchmarks, LLMOpt significantly outperforms PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) attains state-of-the-art plan quality while maintaining competitive inference efficiency, demonstrating strong practical deployability.

📝 Abstract
Query optimization is a critical task in database systems, focused on determining the most efficient way to execute a query from an enormous set of possible strategies. Traditional approaches rely on heuristic search methods and cost predictions, but these often struggle with the complexity of the search space and inaccuracies in performance estimation, leading to suboptimal plan choices. This paper presents LLMOpt, a novel framework that leverages Large Language Models (LLMs) to address these challenges through two innovative components: (1) LLM for Plan Candidate Generation (LLMOpt(G)), which eliminates heuristic search by utilizing the reasoning abilities of LLMs to directly generate high-quality query plans, and (2) LLM for Plan Candidate Selection (LLMOpt(S)), a list-wise cost model that compares candidates globally to enhance selection accuracy. To adapt LLMs for query optimization, we propose fine-tuning pre-trained models using optimization data collected offline. Experimental results on the JOB, JOB-EXT, and Stack benchmarks show that LLMOpt(G) and LLMOpt(S) outperform state-of-the-art methods, including PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) achieves the best practical performance, striking a balance between plan quality and inference efficiency.
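The abstract's two-stage design (LLMOpt(G) generates candidate plans, LLMOpt(S) selects among them list-wise) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Plan` representation, the `generate_plans` stub, and its fixed scores are hypothetical stand-ins, not the paper's actual implementation, which fine-tunes real LLMs on offline optimization data.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """A candidate execution plan together with the cost model's score for it."""
    hints: str          # e.g. join-order / operator hints fed back to the DBMS
    model_score: float  # list-wise cost model's relative score (lower = cheaper)

def generate_plans(sql: str, k: int = 4) -> list[Plan]:
    """Stand-in for LLMOpt(G): an LLM would emit k diverse candidate plans.
    Here we fabricate k candidates with fixed scores purely for illustration."""
    return [Plan(hints=f"candidate-{i} for {sql!r}", model_score=float(k - i))
            for i in range(k)]

def select_plan(plans: list[Plan]) -> Plan:
    """Stand-in for LLMOpt(S): a list-wise cost model scores the whole
    candidate list and the plan ranked cheapest is chosen."""
    return min(plans, key=lambda p: p.model_score)

best = select_plan(generate_plans("SELECT * FROM t1 JOIN t2 ON t1.id = t2.id"))
```

The point of the sketch is the interface, not the internals: generation replaces heuristic plan-space search, and selection compares candidates against each other rather than estimating each plan's cost in isolation.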
Problem

Research questions and friction points this paper is trying to address.

Optimizes query execution strategies in databases
Addresses complexity and inaccuracies in traditional methods
Uses LLMs for plan generation and selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Large Language Models for query optimization
Generates query plans using LLM reasoning abilities
Enhances selection accuracy with a list-wise cost model
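The list-wise idea in the last bullet, scoring the candidate set jointly instead of each plan in isolation, is often formulated as a softmax over per-plan scores. The sketch below uses that common listwise-ranking formulation as an assumption; the paper's exact model is not given in this summary.

```python
import math

def listwise_probs(scores: list[float]) -> list[float]:
    """Softmax over per-plan scores: each plan's probability of being the best
    plan depends on every other candidate in the list, unlike point-wise cost
    models that predict each plan's cost independently."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate plans; higher score = more likely to be the best plan.
probs = listwise_probs([2.0, 1.0, 0.5])
```

Because the normalization runs over the whole list, adding or removing a candidate shifts every other plan's probability, which is exactly the "global comparison" property the bullet refers to.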
Zhiming Yao
1School of Information, Renmin University of China, Beijing, China, 2Key Laboratory of Data Engineering and Knowledge Engineering, MOE, China
Haoyang Li
1School of Information, Renmin University of China, Beijing, China, 2Key Laboratory of Data Engineering and Knowledge Engineering, MOE, China
Jing Zhang
1School of Information, Renmin University of China, Beijing, China, 3Engineering Research Center of Database and Business Intelligence, MOE, China
Cuiping Li
Renmin University of China
Database; big data analysis and mining
Hong Chen
1School of Information, Renmin University of China, Beijing, China, 3Engineering Research Center of Database and Business Intelligence, MOE, China