🤖 AI Summary
Traditional cost-based query optimizers frequently produce suboptimal execution plans due to heuristic rules and inaccurate cost models. Existing learned query optimizers (LQOs) rely on pairwise ranking, suffering from ranking inconsistency and poor generalization. This paper proposes CARPO, a context-aware listwise learning-to-rank framework built on Transformer architectures that models plan evaluation as a holistic ordinal decision problem. To enhance robustness, CARPO integrates out-of-distribution (OOD) detection with a top-k fallback mechanism. Evaluated on TPC-H, CARPO achieves a Top-1 accuracy of 74.54%, far surpassing Lero's 3.63%, and reduces total query execution time to 3719.16 ms, an 83.5% reduction relative to PostgreSQL. These results demonstrate CARPO's superior accuracy, ranking consistency, and deployment reliability.
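The core idea, scoring a whole candidate-plan set jointly rather than comparing plans pairwise, can be sketched as a ListNet-style listwise loss. This is a minimal illustration, not CARPO's exact training objective: the latency scaling and the toy scores/latencies below are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def listwise_loss(scores, latencies):
    """ListNet-style listwise loss: cross-entropy between the model's
    score distribution over a candidate-plan set and a target
    distribution that puts more mass on faster plans. Standardizing
    latencies before the softmax is an illustrative choice."""
    target = softmax(-latencies / latencies.std())  # faster plan -> more mass
    pred = softmax(scores)
    return float(-np.sum(target * np.log(pred + 1e-12)))

# Hypothetical candidate set: model scores and measured latencies (ms).
scores = np.array([2.1, 0.3, -1.0, 0.5])             # plan 0 ranked best
latencies = np.array([120.0, 900.0, 2400.0, 700.0])  # plan 0 is fastest
loss = listwise_loss(scores, latencies)
```

Because the loss compares full distributions over the candidate set, a scoring model trained this way is penalized for any misordering in the list, not just for flipping one pair, which is what distinguishes the listwise setup from pairwise rankers such as Lero.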
📝 Abstract
Efficient data processing is increasingly vital, with query optimizers playing a fundamental role in translating SQL queries into optimal execution plans. Traditional cost-based optimizers, however, often generate suboptimal plans due to flawed heuristics and inaccurate cost models, leading to the emergence of Learned Query Optimizers (LQOs). To address challenges in existing LQOs, such as the inconsistency and suboptimality inherent in pairwise ranking methods, we introduce CARPO, a generic framework leveraging listwise learning-to-rank for context-aware query plan optimization. CARPO distinctively employs a Transformer-based model for holistic evaluation of candidate plan sets and integrates a robust hybrid decision mechanism, featuring Out-of-Distribution (OOD) detection with a top-k fallback strategy to ensure reliability. Furthermore, CARPO can be seamlessly integrated with existing plan embedding techniques, demonstrating strong adaptability. Comprehensive experiments on TPC-H and STATS benchmarks demonstrate that CARPO significantly outperforms both native PostgreSQL and Lero, achieving a Top-1 Rate of 74.54% on the TPC-H benchmark compared to Lero's 3.63%, and reducing total execution time to 3719.16 ms compared to PostgreSQL's 22577.87 ms.
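The hybrid decision mechanism described above can be sketched as a simple policy: trust the model's top-1 plan when the query looks in-distribution, and otherwise restrict the choice to the model's top-k candidates. The exact CARPO policy may differ; the `ood_score` signal, the threshold `tau`, and the use of native cost estimates to break the fallback tie are all assumptions made for illustration.

```python
import numpy as np

def select_plan(scores, native_costs, ood_score, k=3, tau=0.7):
    """Illustrative hybrid decision rule (not CARPO's exact policy):
    if the OOD detector deems the query in-distribution, return the
    model's top-1 plan; otherwise fall back to the cheapest plan, by
    the native optimizer's cost estimate, among the model's top-k."""
    ranked = np.argsort(-scores)        # best-first by model score
    if ood_score < tau:                 # in-distribution: take top-1
        return int(ranked[0])
    topk = ranked[:k]                   # OOD: restrict to top-k candidates
    return int(topk[np.argmin(native_costs[topk])])  # cheapest of the k

scores = np.array([3.0, 1.0, 0.5, 0.2])
native_costs = np.array([100.0, 50.0, 10.0, 200.0])
plan_in = select_plan(scores, native_costs, ood_score=0.2)   # -> plan 0
plan_ood = select_plan(scores, native_costs, ood_score=0.9)  # -> plan 2
```

The point of the fallback is deployment reliability: on queries unlike the training distribution, the learned ranking's fine-grained order is not trusted, but the model's coarse top-k filter still prunes obviously bad plans before a conventional signal makes the final pick.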