Reqo: A Robust and Explainable Query Optimization Cost Model

📅 2025-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address three key challenges in learned query optimizers (inaccurate cost estimation, poor robustness, and lack of decision interpretability), this paper proposes Reqo, a robust and explainable tree-structured cost model. Methodologically: (1) it introduces a hybrid encoding architecture in which Bidirectional Graph Neural Networks (Bi-GNN) are aggregated by Gated Recurrent Units (GRUs) to produce representations of tree-structured query plans; (2) it develops an uncertainty-aware learning-to-rank cost model that quantifies uncertainty with approximate probabilistic ML and learns from pairwise plan comparisons; and (3) it presents what the authors describe as the first explainability technique designed specifically for learning-based cost models, attributing the predicted cost to subgraphs of the query plan. According to the abstract, the model outperforms state-of-the-art approaches in accuracy, robustness, and explainability of cost estimation.
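As a rough illustration of the "bidirectional" idea in point (1), the sketch below runs one bottom-up pass and one top-down pass over a toy plan tree. It is not the paper's architecture: plain sums stand in for learned GNN layers, the GRU aggregation step is omitted, and `PlanNode`, `bottom_up`, `top_down`, and `encode` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PlanNode:
    """Hypothetical plan-tree node carrying one scalar feature."""
    name: str
    feature: float
    children: List["PlanNode"] = field(default_factory=list)
    up: float = 0.0    # bottom-up state: summarizes the subtree below
    down: float = 0.0  # top-down state: summarizes the ancestors above


def bottom_up(node: PlanNode) -> float:
    """Child-to-parent pass: a node absorbs the states of its children."""
    node.up = node.feature + sum(bottom_up(c) for c in node.children)
    return node.up


def top_down(node: PlanNode, ancestor_state: float = 0.0) -> None:
    """Parent-to-child pass: a node receives context from its ancestors."""
    node.down = ancestor_state
    for child in node.children:
        top_down(child, ancestor_state + node.feature)


def encode(root: PlanNode) -> Dict[str, Tuple[float, float]]:
    """Run both passes and return (up, down) per node name."""
    bottom_up(root)
    top_down(root)
    states: Dict[str, Tuple[float, float]] = {}

    def visit(node: PlanNode) -> None:
        states[node.name] = (node.up, node.down)
        for child in node.children:
            visit(child)

    visit(root)
    return states


# Toy plan: a join over two scans (feature values are made up).
root = PlanNode("join", 1.0, [PlanNode("scan_a", 2.0), PlanNode("scan_b", 3.0)])
states = encode(root)
```

In this toy setup each node ends up with two views of the plan: what lies beneath it and what sits above it. A learned model would replace the sums with trainable message functions.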

📝 Abstract
In recent years, there has been a growing interest in using machine learning (ML) in query optimization to select more efficient plans. Existing learning-based query optimizers use certain model architectures to convert tree-structured query plans into representations suitable for downstream ML tasks. As the design of these architectures significantly impacts cost estimation, we propose a tree model architecture based on Bidirectional Graph Neural Networks (Bi-GNN) aggregated by Gated Recurrent Units (GRUs) to achieve more accurate cost estimates. The inherent uncertainty of data and model parameters also leads to inaccurate cost estimates, resulting in suboptimal plans and less robust query performance. To address this, we implement a novel learning-to-rank cost model that effectively quantifies the uncertainty in cost estimates using approximate probabilistic ML. This model adaptively integrates quantified uncertainty with estimated costs and learns from comparing pairwise plans, achieving more robust performance. In addition, we propose the first explainability technique specifically designed for learning-based cost models. This technique explains the contribution of any subgraphs in the query plan to the final predicted cost, which can be integrated and trained with any learning-based cost model to significantly boost the model's explainability. By incorporating these innovations, we propose a cost model for a Robust and Explainable Query Optimizer, Reqo, that improves the accuracy, robustness, and explainability of cost estimation, outperforming state-of-the-art approaches in all three dimensions.
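The abstract's combination of uncertainty and pairwise ranking can be pictured with a minimal sketch. The abstract does not give Reqo's actual formulation, so everything here is an assumption made for illustration: the fixed `risk_weight` (the paper describes an adaptive integration), the logistic pairwise loss, and the function names.

```python
import math
from itertools import combinations
from typing import List


def risk_adjusted_score(mean_cost: float, std_cost: float,
                        risk_weight: float = 1.0) -> float:
    """Assumed scoring rule: penalize plans whose estimate is uncertain."""
    return mean_cost + risk_weight * std_cost


def pairwise_ranking_loss(scores: List[float], true_costs: List[float]) -> float:
    """Mean logistic loss over plan pairs; lower score should mean cheaper plan."""
    total, pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if true_costs[i] == true_costs[j]:
            continue  # ties carry no ranking signal
        better, worse = (i, j) if true_costs[i] < true_costs[j] else (j, i)
        margin = scores[worse] - scores[better]
        # Numerically stable log(1 + exp(-margin)).
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        pairs += 1
    return total / pairs if pairs else 0.0


def select_plan(means: List[float], stds: List[float],
                risk_weight: float = 1.0) -> int:
    """Return the index of the plan with the lowest risk-adjusted score."""
    scores = [risk_adjusted_score(m, s, risk_weight) for m, s in zip(means, stds)]
    return min(range(len(scores)), key=scores.__getitem__)


# Two candidate plans (made-up numbers): plan 0 looks cheaper on average,
# but its estimate is far less certain than plan 1's.
means, stds = [10.0, 12.0], [5.0, 0.5]
chosen = select_plan(means, stds)
scores = [risk_adjusted_score(m, s) for m, s in zip(means, stds)]
loss = pairwise_ranking_loss(scores, true_costs=[20.0, 12.0])
```

With these numbers the uncertain plan is passed over in favor of the one with the tighter estimate, which is the kind of robustness the abstract is after. Setting `risk_weight=0.0` falls back to ranking by mean cost alone.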
Problem

Research questions and friction points this paper is trying to address.

Machine Learning
Database Query Optimization
Cost Estimation Uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional Graph Neural Networks
Gated Recurrent Units
Explainable Cost Model