🤖 AI Summary
The performance of SQL query plans can vary by orders of magnitude depending on the join order the optimizer picks, and existing robust query processing approaches lack theoretical guarantees of join-order robustness. This paper revisits the Predicate Transfer technique from a robustness perspective and proposes Robust Predicate Transfer (RPT), built on two new algorithms, LargestRoot and SafeSubjoin, which together make the performance of acyclic queries provably robust against arbitrary join orders. RPT combines predicate transfer over the join graph with these two algorithms and is fully integrated into DuckDB. Evaluated on all queries of the TPC-H, JOB, and TPC-DS benchmarks, RPT keeps the ratio between the slowest and fastest execution time across random join orders at no more than 1.6× for any single acyclic query (close to 1× for most queries) and improves end-to-end performance by 1.5× (per-query geometric mean).
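The core mechanism of predicate transfer, passing join-key filters across the join graph before any join runs, can be illustrated with a small sketch. This is a simplified, hypothetical illustration, not the paper's implementation: it assumes the join graph is itself a tree (an acyclic query), uses exact Python sets where Predicate Transfer uses Bloom filters, and roots the tree at the largest relation in the spirit of LargestRoot via a plain BFS (the paper's actual tree construction and SafeSubjoin are not reproduced). The helper names `build_tree`, `semijoin`, and `predicate_transfer` and the toy tables are made up for illustration. A forward pass moves filters from the leaves to the root, and a backward pass moves them from the root back to the leaves; with exact filters on an acyclic join tree, this two-pass schedule is Yannakakis-style full reduction, so no dangling tuples remain.

```python
from collections import defaultdict, deque

# Toy setting (hypothetical): a relation is a list of dict rows, and a join edge
# (rel_a, col_a, rel_b, col_b) encodes the equality rel_a.col_a = rel_b.col_b.
# Assumes the join graph is itself a tree (an acyclic query).


def build_tree(relations, edges):
    """Root the join tree at the largest relation (plain BFS; not the paper's exact LargestRoot)."""
    adj = defaultdict(list)
    for a, col_a, b, col_b in edges:
        adj[a].append((b, col_a, col_b))
        adj[b].append((a, col_b, col_a))
    root = max(relations, key=lambda r: len(relations[r]))
    parent = {root: None}  # child -> (parent_rel, parent_col, child_col)
    order = [root]         # BFS order: every parent precedes its children
    queue = deque([root])
    while queue:
        rel = queue.popleft()
        for nbr, rel_col, nbr_col in adj[rel]:
            if nbr not in parent:
                parent[nbr] = (rel, rel_col, nbr_col)
                order.append(nbr)
                queue.append(nbr)
    return root, parent, order


def semijoin(relations, target, target_col, source, source_col):
    """Keep only target rows whose join key appears in source (exact stand-in for a Bloom filter)."""
    keys = {row[source_col] for row in relations[source]}
    relations[target] = [row for row in relations[target] if row[target_col] in keys]


def predicate_transfer(relations, edges):
    root, parent, order = build_tree(relations, edges)
    # Forward pass (leaves -> root): reversed BFS order reduces each child before it filters its parent.
    for child in reversed(order[1:]):
        p, p_col, c_col = parent[child]
        semijoin(relations, p, p_col, child, c_col)
    # Backward pass (root -> leaves): BFS order reduces each parent before it filters its child.
    for child in order[1:]:
        p, p_col, c_col = parent[child]
        semijoin(relations, child, c_col, p, p_col)
    return root


relations = {
    "fact": [{"id": i, "d": i % 4, "c": i % 3} for i in range(8)],
    "dim_d": [{"d": 0}, {"d": 1}],
    "dim_c": [{"c": 0}],
}
edges = [("fact", "d", "dim_d", "d"), ("fact", "c", "dim_c", "c")]
root = predicate_transfer(relations, edges)
print(root, {name: len(rows) for name, rows in relations.items()})
# -> fact {'fact': 1, 'dim_d': 1, 'dim_c': 1}
```

In this toy run the backward pass matters: the `dim_d` row with `d = 1` survives the forward pass and is removed only when the already-reduced `fact` table sends its keys back. Because every relation is fully reduced before joining, any subsequent join order works on the same small inputs, which is the intuition behind join-order robustness.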
📝 Abstract
Join order optimization is critical to achieving good query performance. Despite decades of research and practice, modern query optimizers can still generate inferior join plans that are orders of magnitude slower than optimal. Existing research on robust query processing often lacks theoretical guarantees of join-order robustness while sacrificing query performance. In this paper, we rediscover the recent Predicate Transfer technique from a robustness point of view. We introduce two new algorithms, LargestRoot and SafeSubjoin, and then propose Robust Predicate Transfer (RPT), which is provably robust against arbitrary join orders of an acyclic query. We integrated RPT into DuckDB, a state-of-the-art analytical database, and evaluated it on all queries in the TPC-H, JOB, and TPC-DS benchmarks. Our experimental results show that RPT improves join-order robustness by orders of magnitude compared to the baseline. With RPT, across random join orders, the largest ratio between the maximum and minimum execution time for any single acyclic query is only 1.6x, and the ratio is close to 1 for most evaluated queries. Meanwhile, applying RPT also improves end-to-end query performance by 1.5x (per-query geometric mean). We hope that this work sheds light on solving the practical join ordering problem.
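The two headline metrics above can be made concrete with a short sketch. The helper names are hypothetical and the timings are made-up placeholders, not measurements from the paper: for each query, robustness is the ratio of the slowest to the fastest execution time over the sampled join orders (the reported 1.6x is the largest such ratio over acyclic queries), and the end-to-end improvement is the geometric mean of per-query baseline-to-RPT latency ratios.

```python
from statistics import geometric_mean


def robustness_ratio(times_by_order):
    """Slowest / fastest execution time of one query over the sampled join orders."""
    return max(times_by_order) / min(times_by_order)


def geomean_speedup(baseline_times, rpt_times):
    """Per-query geometric mean of baseline-to-RPT latency ratios."""
    return geometric_mean([b / r for b, r in zip(baseline_times, rpt_times)])


# Placeholder timings in seconds (made up, NOT results from the paper).
times_per_query = {"q1": [1.0, 1.2, 1.6], "q2": [2.0, 2.1, 2.2]}
worst_ratio = max(robustness_ratio(t) for t in times_per_query.values())
speedup = geomean_speedup(baseline_times=[4.0, 1.0], rpt_times=[1.0, 1.0])
print(worst_ratio, round(speedup, 3))
# -> 1.6 2.0
```

The geometric mean is the usual choice for averaging speedup ratios because it treats a 2x speedup and a 2x slowdown symmetrically: their geometric mean is exactly 1.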