🤖 AI Summary
To address the challenge of automatic SQL query engine selection in lakehouse-integrated, multi-engine environments, this paper proposes a prior-knowledge-free cross-engine query routing method. The approach centers on a unified multi-task cost model trained on hint-optimized logical query plans, with an architecture that jointly predicts costs across multiple engines and instance configurations and supports zero-shot and few-shot adaptation, eliminating the need for engine-specific modeling. By combining query-plan encoding, multi-task cost prediction, and hint-guided logical optimization, the method improves prediction accuracy and generalization: the average Q-error decreases by up to 12.6%, and total workload runtime is reduced by up to 25.2% in the zero-shot setting and 30.4% in the few-shot setting relative to random routing. The framework mitigates the selection complexity that arises as new engines or workloads are introduced.
📝 Abstract
Lakehouse systems enable the same data to be queried with multiple execution engines. However, selecting the engine best suited to run a SQL query still requires a priori knowledge of the query's computational requirements and each engine's capabilities, a complex and manual task that only becomes more difficult with the emergence of new engines and workloads. In this paper, we address this limitation by proposing a cross-engine optimizer that automates engine selection for diverse SQL queries through a learned cost model. A logical query plan, optimized with hints, serves as the input for cost prediction and routing. Cost prediction is formulated as a multi-task learning problem, and the model architecture uses multiple predictor heads corresponding to different engines and provisionings. This eliminates the need to train engine-specific models and allows new engines to be added flexibly at a minimal fine-tuning cost. Results on various databases and engines show that using a query-optimized logical plan for cost estimation decreases the average Q-error by up to 12.6% compared to using unoptimized plans as input. Moreover, the proposed cross-engine optimizer reduces the total workload runtime by up to 25.2% in a zero-shot setting and 30.4% in a few-shot setting when compared to random routing.
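The multi-head architecture described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a shared encoder maps a query-plan feature vector to a latent representation, and one small head per engine (or provisioning) predicts that engine's cost from the shared latent, so a new engine adds only a new head. The class and function names, the encoder shape, and the engine labels used in the usage example are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class MultiEngineCostModel:
    """Sketch of a multi-task cost model: a shared plan encoder
    plus one linear predictor head per engine/provisioning."""

    def __init__(self, n_features, hidden, engines):
        # Shared encoder weights (stands in for a learned plan encoder).
        self.W_enc = rng.normal(0.0, 0.1, (n_features, hidden))
        # One lightweight head per engine.
        self.heads = {e: rng.normal(0.0, 0.1, hidden) for e in engines}

    def add_engine(self, name):
        # A new engine reuses the shared encoder; only its head is new,
        # which is why fine-tuning cost stays minimal.
        self.heads[name] = rng.normal(0.0, 0.1, self.W_enc.shape[1])

    def predict(self, plan_features):
        # Encode the (hint-optimized) plan features once, then score
        # every engine from the same shared representation.
        z = np.tanh(plan_features @ self.W_enc)
        return {e: float(z @ w) for e, w in self.heads.items()}

def route(model, plan_features):
    """Route the query to the engine with the lowest predicted cost."""
    costs = model.predict(plan_features)
    return min(costs, key=costs.get)
```

In use, a router would featurize the optimized logical plan, call `predict` once, and dispatch to the cheapest engine, e.g. `route(model, features)` after `model.add_engine("new_engine")` for a freshly onboarded engine.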