🤖 AI Summary
Traditional online query optimizers, constrained by tight optimization-time budgets, cannot afford to explore high-quality plan spaces for frequently repeated analytical queries. Method: this paper proposes an execution-feedback-driven optimization framework for offline scenarios that treats actual query execution as a core optimization primitive, establishing a closed-loop search. It learns a variational autoencoder (VAE) embedding of query plans and runs Bayesian optimization in the resulting latent space, enabling accurate global plan search from few executed samples. Contribution/Results: across multiple benchmarks, the approach significantly outperforms both PostgreSQL's default optimizer and state-of-the-art reinforcement learning-based methods, reducing average end-to-end query latency by over 35%. The gains are largest when a query is repeated thousands of times, where the upfront optimization cost amortizes, demonstrating the scalability and robustness of workload-aware offline optimization.
📝 Abstract
Analytics database workloads often contain queries that are executed repeatedly. Existing optimization techniques generally prioritize keeping optimization cost low, normally well below the time it takes to execute a single instance of a query. If a given query is going to be executed thousands of times, could it be worth investing significantly more optimization time? In contrast to traditional online query optimizers, we propose an offline query optimizer that searches a wide variety of plans and incorporates query execution as a primitive. Our offline query optimizer combines variational auto-encoders with Bayesian optimization to find optimized plans for a given query. We compare our technique against the best plans attainable with PostgreSQL and with recent RL-based systems across several datasets, and show that our technique finds faster query plans.
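The closed-loop search described above can be sketched in miniature. The sketch below is an illustration, not the paper's implementation: the VAE decoder and real plan execution are replaced by a synthetic latency function over a 2-D latent space, and the Gaussian-process surrogate with an expected-improvement acquisition is hand-rolled in NumPy. All names (`measure_latency`, `gp_posterior`, `expected_improvement`) are hypothetical stand-ins.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)

def measure_latency(z):
    # Stand-in for "decode latent vector z into a query plan via the VAE
    # decoder, execute it, and measure latency" (synthetic, for illustration).
    return float(np.sum((z - 0.3) ** 2) + 0.05 * np.sin(10 * z[0]))

def rbf_kernel(A, B, length=0.5):
    # Squared-exponential kernel between row vectors of A and B.
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d / (2 * length**2))

def gp_posterior(X, y, Xs, noise=1e-6):
    # GP regression: posterior mean and stddev at candidate points Xs.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf_kernel(Xs, Xs) - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    # EI for minimization: expected reduction below the best latency so far.
    z = (best - mu) / sigma
    Phi = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))
    phi = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return (best - mu) * Phi + sigma * phi

# Closed loop: propose a latent point, "execute" it, update the surrogate.
dim = 2
X = rng.uniform(-1, 1, (5, dim))            # initial random latent samples
y = np.array([measure_latency(z) for z in X])
for _ in range(25):
    cand = rng.uniform(-1, 1, (256, dim))   # candidate latent vectors
    mu, sigma = gp_posterior(X, y, cand)
    z_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, z_next])
    y = np.append(y, measure_latency(z_next))

best_latent = X[np.argmin(y)]               # best "plan" found by the search
```

The key design point mirrored here is that each loop iteration pays for one real execution, so the acquisition function must make every executed sample count; in the paper's setting, `best_latent` would be decoded back into a concrete query plan.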