🤖 AI Summary
Traditional query optimizers, constrained by heuristic search strategies and inaccurate cost models, struggle to produce optimal execution plans in complex search spaces. This paper introduces LLMOpt, an end-to-end, non-heuristic, LLM-driven query optimization framework comprising two synergistic modules: LLMOpt(G), which leverages large language models to directly generate diverse, high-quality execution plans, and LLMOpt(S), which applies learning-to-rank over lists of candidate plans to identify the globally best one. By combining offline fine-tuning, prompt engineering, and semantics-aware ranking, LLMOpt delivers transferable, semantics-informed optimization. Evaluated on the JOB, JOB-EXT, and Stack benchmarks, LLMOpt significantly outperforms PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) attains state-of-the-art plan quality while maintaining competitive inference efficiency, indicating strong practical deployability.
📝 Abstract
Query optimization is a critical task in database systems, focused on determining the most efficient way to execute a query from an enormous set of possible strategies. Traditional approaches rely on heuristic search methods and cost predictions, but these often struggle with the complexity of the search space and inaccuracies in performance estimation, leading to suboptimal plan choices. This paper presents LLMOpt, a novel framework that leverages Large Language Models (LLMs) to address these challenges through two innovative components: (1) LLM for Plan Candidate Generation (LLMOpt(G)), which eliminates heuristic search by utilizing the reasoning abilities of LLMs to directly generate high-quality query plans, and (2) LLM for Plan Candidate Selection (LLMOpt(S)), a list-wise cost model that compares candidates globally to enhance selection accuracy. To adapt LLMs for query optimization, we propose fine-tuning pre-trained models using optimization data collected offline. Experimental results on the JOB, JOB-EXT, and Stack benchmarks show that LLMOpt(G) and LLMOpt(S) outperform state-of-the-art methods, including PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) achieves the best practical performance, striking a balance between plan quality and inference efficiency.
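The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: the function names, the hint-string plan representation, and the stand-in models are assumptions for exposition, not the paper's actual API. Stage 1 (LLMOpt(G)) has a fine-tuned LLM propose candidate plans directly, replacing heuristic plan-space search; Stage 2 (LLMOpt(S)) scores the whole candidate list jointly instead of estimating each plan's cost in isolation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PlanCandidate:
    hint: str          # illustrative: a pg_hint_plan-style hint string
    est_score: float   # score assigned by the list-wise selector

def generate_candidates(sql: str,
                        llm_generate: Callable[[str], List[str]]) -> List[PlanCandidate]:
    """Stage 1 (LLMOpt(G)): an LLM fine-tuned on offline optimization data
    proposes several execution plans for the query directly."""
    prompt = f"Generate candidate execution plans for:\n{sql}"
    return [PlanCandidate(hint=h, est_score=0.0) for h in llm_generate(prompt)]

def select_plan(candidates: List[PlanCandidate],
                listwise_rank: Callable[[List[str]], List[float]]) -> PlanCandidate:
    """Stage 2 (LLMOpt(S)): a list-wise model compares all candidates
    globally and the highest-scoring plan is chosen."""
    scores = listwise_rank([c.hint for c in candidates])
    for c, s in zip(candidates, scores):
        c.est_score = s
    return max(candidates, key=lambda c: c.est_score)

# Toy stand-ins for the two fine-tuned models (assumed, for demonstration):
fake_llm = lambda prompt: ["HashJoin(a b)", "NestLoop(a b)", "MergeJoin(a b)"]
fake_ranker = lambda hints: [0.2, 0.9, 0.5]  # scores produced over the full list

best = select_plan(generate_candidates("SELECT ...", fake_llm), fake_ranker)
print(best.hint)  # NestLoop(a b)
```

The key design point mirrored here is that the selector sees the entire candidate list at once, so its scores are comparative rather than absolute cost estimates, which is what the abstract credits for improved selection accuracy.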