🤖 AI Summary
Problem: Yannakakis’ algorithm exhibits unstable performance and poor adaptability in query optimization, particularly under dynamic workloads: it speeds up some queries but not others. Method: This paper takes a machine learning–driven approach to optimizer decision-making by framing the choice of whether to apply Yannakakis’ algorithm as a binary classification task. Using structural features of queries, statistical metadata, and cost model estimates, it trains supervised learning models (specifically XGBoost and Random Forest) to make this selection adaptively for each query. Contribution/Results: Unlike conventional static heuristics or hard-coded rules, the proposed method is portable across database management systems. Experiments on multiple benchmarks and several DBMSs show an average 23.7% reduction in query latency (p < 0.01), supporting both the effectiveness and the generalizability of ML-driven optimization policy decisions.
📝 Abstract
Query optimization has played a central role in database research for decades. However, more often than not, the proposed optimization techniques lead to a performance improvement in some, but not in all, situations. Therefore, we urgently need a methodology for designing a decision procedure that decides, for a given query, whether the optimization technique should be applied. In this work, we propose such a methodology, focusing on Yannakakis-style query evaluation as our optimization technique of interest. More specifically, we formulate this decision problem as an algorithm selection problem and present a machine learning-based approach to solving it. Empirical results with several benchmarks on a variety of database systems show that our approach indeed leads to a statistically significant performance improvement.
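To make the algorithm selection framing concrete, the following is a minimal sketch, not the authors' implementation. It labels each training query by whichever strategy ran faster and fits a one-feature threshold rule (a decision stump), used here as a dependency-free stand-in for the tree-ensemble models a real system would train. The two features, the timings, and all function names are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QueryRun:
    """One training query, timed under both evaluation strategies."""
    features: List[float]  # hypothetical: [number of joins, log10 of estimated intermediate size]
    baseline_ms: float     # runtime with the DBMS's default plan
    yannakakis_ms: float   # runtime with Yannakakis-style evaluation


def label(run: QueryRun) -> int:
    """Class 1 means Yannakakis-style evaluation was faster on this query."""
    return 1 if run.yannakakis_ms < run.baseline_ms else 0


def predict(stump: Tuple[int, float, int], features: List[float]) -> int:
    """Return `above` if the chosen feature exceeds the threshold, else the other class."""
    index, threshold, above = stump
    return above if features[index] > threshold else 1 - above


def train_stump(runs: List[QueryRun]) -> Tuple[int, float, int]:
    """Exhaustively pick the (feature, threshold, class-above) triple with fewest training errors."""
    best_errors, best_stump = None, None
    for index in range(len(runs[0].features)):
        for threshold in sorted({r.features[index] for r in runs}):
            for above in (1, 0):
                stump = (index, threshold, above)
                errors = sum(predict(stump, r.features) != label(r) for r in runs)
                if best_errors is None or errors < best_errors:
                    best_errors, best_stump = errors, stump
    return best_stump


def should_apply_yannakakis(stump: Tuple[int, float, int], features: List[float]) -> bool:
    """Per-query decision: apply the optimization only when class 1 is predicted."""
    return predict(stump, features) == 1


# Made-up timings for illustration only; these are not measurements from the paper.
training_runs = [
    QueryRun([2, 3.0], baseline_ms=10, yannakakis_ms=14),
    QueryRun([3, 3.5], baseline_ms=12, yannakakis_ms=15),
    QueryRun([4, 4.0], baseline_ms=40, yannakakis_ms=45),
    QueryRun([4, 5.8], baseline_ms=700, yannakakis_ms=250),
    QueryRun([5, 6.0], baseline_ms=900, yannakakis_ms=200),
    QueryRun([6, 6.5], baseline_ms=1500, yannakakis_ms=300),
]

stump = train_stump(training_runs)
print("learned rule (feature index, threshold, class above):", stump)
print("apply to large query:", should_apply_yannakakis(stump, [5, 6.2]))
print("apply to small query:", should_apply_yannakakis(stump, [2, 3.2]))
```

The decision is made per query before execution, so the learned rule only needs features that are available at planning time. Swapping the stump for a stronger classifier changes `train_stump` and `predict` but leaves the labeling and the decision interface unchanged.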