🤖 AI Summary
This work systematically evaluates seven state-of-the-art learned cost models (LCMs) on three core query optimization tasks (join ordering, access path selection, and physical operator selection) and benchmarks them against a traditional cost model. The LCMs span regression models including XGBoost, DNNs, and GNNs, combined with fine-grained query feature engineering, and are assessed across multiple benchmarks with a reproducible evaluation framework based on executed query performance. The results show that although most LCMs achieve high prediction accuracy, the plans they select and the resulting end-to-end query latency remain significantly worse than those of the traditional model. The study is the first to identify and formalize this “accuracy–optimality gap”, the dissociation between prediction fidelity and optimization effectiveness, and attributes it to error propagation, objective misalignment, and poor generalization. It also proposes a standardized evaluation protocol and reports empirically validated, reproducible findings, offering foundational insights and concrete directions for the practical deployment of LCMs.
📝 Abstract
Traditionally, query optimizers rely on cost models to choose the best execution plan from several candidates, making precise cost estimates critical for efficient query execution. In recent years, cost models based on machine learning have been proposed to overcome the weaknesses of traditional cost models. While these models have been shown to provide better prediction accuracy, only limited efforts have been made to investigate how well Learned Cost Models (LCMs) actually perform in query optimization and how they affect overall query performance. In this paper, we address this gap with a systematic study that evaluates LCMs on three core query optimization tasks: join ordering, access path selection, and physical operator selection. We compare seven state-of-the-art LCMs to a traditional cost model and, surprisingly, find that the traditional model often still outperforms the LCMs on these tasks. We conclude by highlighting major takeaways and recommendations to guide future research toward making LCMs more effective for query optimization.
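To make the distinction between prediction accuracy and plan quality concrete, the following is a minimal Python sketch, not the paper's evaluation code. All plan names, runtimes, and estimates are hypothetical values chosen for illustration, and the use of Q-error (the factor by which an estimate is off) as the accuracy metric is an assumption. It shows how a cost model with a lower median error can still select a slower plan than a model whose estimates are uniformly off but preserve the ranking of the candidates.

```python
from statistics import median


def q_error(estimate: float, actual: float) -> float:
    """Q-error: factor by which the estimate is off (1.0 means exact)."""
    return max(estimate / actual, actual / estimate)


def median_q_error(estimates: dict, actual: dict) -> float:
    """Median Q-error of a cost model over all candidate plans."""
    return median(q_error(estimates[p], actual[p]) for p in actual)


def pick_plan(estimates: dict) -> str:
    """The optimizer picks the candidate with the lowest estimated cost."""
    return min(estimates, key=estimates.get)


# Hypothetical true runtimes (seconds) of three candidate plans for one query.
actual = {"plan_a": 10.0, "plan_b": 12.0, "plan_c": 100.0}

# Hypothetical model 1: very small errors, but the two best plans are swapped.
accurate_model = {"plan_a": 11.5, "plan_b": 10.5, "plan_c": 100.0}

# Hypothetical model 2: every estimate is off by 3x, but the ranking is preserved.
rank_preserving_model = {"plan_a": 30.0, "plan_b": 36.0, "plan_c": 300.0}

for name, model in [("accurate", accurate_model),
                    ("rank-preserving", rank_preserving_model)]:
    chosen = pick_plan(model)
    print(f"{name}: median q-error = {median_q_error(model, actual):.2f}, "
          f"chosen = {chosen}, runtime = {actual[chosen]:.1f}s")
```

In this toy setup the first model has the lower median Q-error yet selects `plan_b`, which runs slower than the `plan_a` chosen by the second model. The optimizer only needs the relative order of candidate plans to be right, which is why accuracy metrics alone may not predict end-to-end query performance.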