Instance-Optimal Acyclic Join Processing Without Regret: Engineering the Yannakakis Algorithm in Column Stores

📅 2024-11-06
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Naive implementations of Yannakakis' algorithm for acyclic join queries carry large hidden constant factors, which has limited its adoption in practice. For interpreted engines such as column stores, this paper proposes Shredded Yannakakis (SYA). SYA formalizes Lookup and Expand plans in Nested Semijoin Algebra (NSA) and characterizes the expressions that can be evaluated instance-optimally as the 2-phase ones, in which no shrinking operator is applied after an unnest (expand). Starting from a binary join plan, SYA translates it into a 2-phase NSA plan and evaluates it with a shredding technique. On well-behaved binary join plans, SYA is shown to be robust (it never produces large intermediate results) and without regret (it is never worse than the binary join plan under a suitable cost model). On a suite of 1,849 queries, SYA improves performance for 85.3% of them, with speedups of up to 62.5x, and remains competitive on the rest.

📝 Abstract
Acyclic join queries can be evaluated instance-optimally using Yannakakis' algorithm, which avoids needlessly large intermediate results through semijoin passes. Recent work proposes to address the significant hidden constant factors arising from a naive implementation of Yannakakis by decomposing the hash join operator into two suboperators, called Lookup and Expand. In this paper, we present a novel method for integrating Lookup and Expand plans in interpreted environments, like column stores, formalizing them using Nested Semijoin Algebra (NSA) and implementing them through a shredding approach. We characterize the class of NSA expressions that can be evaluated instance-optimally as those that are 2-phase: no 'shrinking' operator is applied after an unnest (i.e., expand). We introduce Shredded Yannakakis (SYA), an evaluation algorithm for acyclic joins that, starting from a binary join plan, transforms it into a 2-phase NSA plan, and then evaluates it through the shredding technique. We show that SYA is provably robust (i.e., never produces large intermediate results) and without regret (i.e., is never worse than the binary join plan under a suitable cost model) on the class of well-behaved binary join plans. Our experiments on a suite of 1,849 queries show that SYA improves performance for 85.3% of the queries with speedups up to 62.5x, while remaining competitive on the other queries. We hope this approach offers a fresh perspective on Yannakakis' algorithm, helping system engineers better understand its practical benefits and facilitating its adoption into a broader spectrum of query engines.
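
The semijoin passes mentioned in the abstract can be illustrated with a minimal sketch of the classical Yannakakis algorithm on a three-relation chain query. This is a textbook-style illustration, not the paper's SYA implementation; the relation names `R`, `S`, `T`, the attribute names, and the helper functions are all assumptions made for the example.

```python
from collections import defaultdict

def semijoin(left, right, left_key, right_key):
    """Keep only the rows of `left` that have at least one match in `right`."""
    keys = {row[right_key] for row in right}
    return [row for row in left if row[left_key] in keys]

def hash_join(left, right, left_key, right_key):
    """Plain hash join: build an index on `right`, probe it with `left`."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]

def yannakakis_chain(R, S, T):
    """Evaluate R(a,b) JOIN S(b,c) JOIN T(c,d) using the join tree R - S - T, rooted at R."""
    # Bottom-up semijoin pass: from the leaf T towards the root R.
    S1 = semijoin(S, T, "c", "c")
    R1 = semijoin(R, S1, "b", "b")
    # Top-down semijoin pass: from the root R back towards the leaf T.
    S2 = semijoin(S1, R1, "b", "b")
    T2 = semijoin(T, S2, "c", "c")
    # Join phase: every surviving row now contributes to at least one output row.
    return hash_join(hash_join(R1, S2, "b", "b"), T2, "c", "c")

# Hypothetical toy data.
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
S = [{"b": 10, "c": 100}, {"b": 20, "c": 200}, {"b": 30, "c": 100}]
T = [{"c": 100, "d": "x"}, {"c": 100, "d": "y"}]

result = yannakakis_chain(R, S, T)
naive = hash_join(hash_join(R, S, "b", "b"), T, "c", "c")
```

On this data, the semijoin passes discard the rows of `R` and `S` that cannot reach the output before any join runs, and `result` contains the same rows as the plain binary join plan `naive`. The extra passes are also where the hidden constant factors discussed in the abstract come from: each relation is scanned and hashed more than once.
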
Problem

Research questions and friction points this paper is trying to address.

Evaluating acyclic join queries efficiently with Yannakakis' algorithm
Reducing the hidden constant factors of Yannakakis' algorithm via Lookup and Expand
Ensuring robust and regret-free join processing in column stores
Innovation

Methods, ideas, or system contributions that make the work stand out.

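Builds on the decomposition of the hash join into Lookup and Expand suboperators
Formalizes Lookup and Expand plans using Nested Semijoin Algebra (NSA)
Introduces Shredded Yannakakis (SYA), which turns a binary join plan into a 2-phase NSA plan and evaluates it by shredding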
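
To make the Lookup and Expand split concrete, here is a rough row-at-a-time sketch, assuming a toy query in which `R` joins both `S` and `T` on attribute `b`. It is not the paper's shredded, columnar implementation; the function names `build`, `lookup`, `expand` and the slot names `_S`, `_T` are hypothetical. The point is the 2-phase ordering: all shrinking steps (lookups) run before any unnest (expand).

```python
from collections import defaultdict

def build(rows, key):
    """Build a hash index: key value -> list of matching rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index

def lookup(rows, index, key, slot):
    """Attach the matching bucket as a nested value and drop rows without a match.
    The output never has more rows than the input."""
    out = []
    for row in rows:
        bucket = index.get(row[key])
        if bucket:
            out.append({**row, slot: bucket})
    return out

def expand(rows, slot):
    """Unnest: emit one output row per element of the nested bucket in `slot`."""
    out = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k != slot}
        for match in row[slot]:
            out.append({**rest, **match})
    return out

# Hypothetical toy data: b=1 has many matches in S but none in T.
R = [{"a": 0, "b": 1}, {"a": 1, "b": 1}, {"a": 2, "b": 2}]
S = [{"b": 1, "c": j} for j in range(4)] + [{"b": 2, "c": 9}]
T = [{"b": 2, "d": "x"}, {"b": 2, "d": "y"}]

# Phase 1: lookups only, so the row count can only shrink.
after_s = lookup(R, build(S, "b"), "b", "_S")
after_t = lookup(after_s, build(T, "b"), "b", "_T")
# Phase 2: expands only, so every row produced here survives to the output.
result = expand(expand(after_t, "_S"), "_T")

# For comparison, a binary plan that expands S before filtering by T.
binary_intermediate = expand(lookup(R, build(S, "b"), "b", "_S"), "_S")
```

In this example the 2-phase order never holds more than three rows before the final expand, whereas the binary plan materializes nine rows for `R` joined with `S`, most of which are later discarded by `T`.
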