🤖 AI Summary
Join ordering in database query optimization is an NP-hard problem, and conventional combinatorial search methods suffer from high computational complexity and poor scalability. This paper takes a first step towards gradient-based join ordering: it continuously relaxes the discrete space of join plans into a soft adjacency matrix representing a superposition of plans, uses a Gumbel-Softmax parameterization together with differentiable plan-validity constraints to enable gradient-based search over that space, and employs a learned graph neural network as the differentiable cost model. Experiments on two graph datasets show that the resulting plans have comparable, and in some cases lower, cost than those found by traditional discrete local search methods. Empirically, runtime scales linearly with query size, in contrast to the quadratic or exponential runtimes of classical approaches.
📝 Abstract
Join ordering is the NP-hard problem of selecting the most efficient sequence in which to evaluate joins (conjunctive, binary operators) in a database query. As the performance of query execution critically depends on this choice, join ordering lies at the core of query optimization. Traditional approaches cast this problem as a discrete combinatorial search over binary trees guided by a cost model, but they often suffer from high computational complexity and limited scalability. We show that, when the cost model is differentiable, query plans can be continuously relaxed into a soft adjacency matrix representing a superposition of plans. This continuous relaxation, together with a Gumbel-Softmax parameterization of the adjacency matrix and differentiable constraints enforcing plan validity, enables gradient-based search for plans within this relaxed space. Using a learned Graph Neural Network as the cost model, we demonstrate that this gradient-based approach can find plans of comparable and even lower cost than those found by traditional discrete local search methods on two different graph datasets. Furthermore, we empirically show that the runtime of this approach scales linearly with query size, in contrast to the quadratic or exponential runtimes of classical approaches. We believe this first step towards gradient-based join ordering can lead to more effective and efficient query optimizers in the future.
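To make the relaxation concrete, the following is a minimal sketch of how a Gumbel-Softmax parameterization can turn a matrix of logits into a soft adjacency matrix. It is an illustration under stated assumptions, not the paper's implementation: the encoding (each row is a plan node choosing one parent among the candidate columns), the function names, and the example logits are all hypothetical, and the sketch omits the differentiable validity constraints and the learned cost model that the abstract describes.

```python
import math
import random


def gumbel_softmax_row(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: -log(-log(U)), U in (0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_adjacency(logits, temperature, seed=0):
    """Apply the relaxation row by row; every row is a distribution over parents."""
    rng = random.Random(seed)
    return [gumbel_softmax_row(row, temperature, rng) for row in logits]


# Hypothetical example (assumed encoding): 3 child nodes, 2 candidate parents.
logits = [[2.0, 0.1], [0.3, 1.5], [0.0, 0.0]]
soft = soft_adjacency(logits, temperature=1.0)    # smooth mixture of plans
sharp = soft_adjacency(logits, temperature=0.05)  # rows move towards one-hot
```

With the same noise, lowering the temperature pushes each row towards a one-hot vector, that is, towards a single discrete plan, while higher temperatures keep a smoother mixture. In a gradient-based search of the kind the abstract outlines, the logits would be the parameters updated by backpropagating through a differentiable cost model.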