🤖 AI Summary
This paper addresses the hash-probe overhead of join evaluation in databases and the overhead imposed by existing linear-time join algorithms such as Yannakakis's algorithm. It proposes TreeTracker Join (TTJ), a pipelined join algorithm that runs in linear time on acyclic queries. Its core contributions are: (1) a backjumping mechanism: upon a hash lookup failure, TTJ resets execution to the binding of the tuple that caused the failure and removes that tuple from its relation; (2) "tree convolution", a hypergraph decomposition that iteratively identifies and contracts acyclic subgraphs of the query hypergraph, which extends the theoretical results to cyclic queries while avoiding the redundant calculations associated with tree decomposition; and (3) a formal proof that, measured by the number of hash probes, TTJ matches or outperforms binary hash join on the same plan, independently of the plan and of acyclicity. Experiments on the TPC-H, Join Order Benchmark (JOB), and Star Schema Benchmark (SSB) workloads show TTJ significantly outperforms state-of-the-art linear-time join algorithms, reducing hash probes by up to 47%, accelerating end-to-end execution, and lowering memory consumption.
📝 Abstract
We present a novel linear-time acyclic join algorithm, TreeTracker Join (TTJ). The algorithm can be understood as the pipelined binary hash join with a simple twist: upon a hash lookup failure, TTJ resets execution to the binding of the tuple causing the failure, and removes the offending tuple from its relation. Compared to the best known linear-time acyclic join algorithm, Yannakakis's algorithm, TTJ has the same asymptotic complexity while imposing lower overhead. Further, we prove that when query performance is measured by the number of hash probes, TTJ matches or outperforms binary hash join on the same plan. This property holds independently of the plan and independently of acyclicity. We extend our theoretical results to cyclic queries by introducing a new hypergraph decomposition method called tree convolution. Tree convolution iteratively identifies and contracts acyclic subgraphs of the query hypergraph. The method avoids redundant calculations associated with tree decomposition and may be of independent interest. Experiments on TPC-H, the Join Order Benchmark, and the Star Schema Benchmark show favorable results.
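The "simple twist" described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`build_index`, `ttj`), the representation of tuples as dictionaries, and the `parents` list encoding which earlier relation binds each probe key are all assumptions made for illustration. The sketch also assumes an acyclic query in which every probe key is fully bound by a single earlier relation in the left-deep plan.

```python
from collections import defaultdict

def build_index(rows, key_attrs):
    """Hash a relation (a list of dicts) on the given join attributes."""
    index = defaultdict(list)
    for row in rows:
        index[tuple(row[a] for a in key_attrs)].append(row)
    return index

def ttj(relations, keys, parents):
    """Sketch of a TTJ-style left-deep pipelined hash join.

    relations[0] is scanned; relations[i] (i >= 1) is probed on keys[i].
    parents[i] is the plan position of the relation that binds keys[i]
    (a hypothetical encoding of the join tree; parents[0] is None).
    Returns (result_tuples, number_of_hash_probes).
    """
    indexes = [None] + [build_index(r, k)
                        for r, k in zip(relations[1:], keys[1:])]
    out, probes = [], 0

    def join(t, i):
        nonlocal probes
        if i == len(relations):
            out.append(dict(t))        # all relations matched: emit a result
            return None
        probes += 1
        bucket = indexes[i].get(tuple(t[a] for a in keys[i]), [])
        if not bucket:
            return parents[i]          # lookup failure: backjump to the parent
        for m in list(bucket):         # iterate a copy, since we may delete
            target = join({**t, **m}, i + 1)
            if target == i:
                bucket.remove(m)       # m caused a failure downstream: drop it
            elif target is not None:
                return target          # keep unwinding toward the culprit
        return None

    for row in relations[0]:
        # A backjump to position 0 simply moves on to the next scanned row.
        join(dict(row), 1)
    return out, probes

# Usage: R(a, b) joined with S(a, c) and T(b, d); T's key b is bound by R.
R = [{"a": 1, "b": 10}, {"a": 1, "b": 20}]
S = [{"a": 1, "c": "x"}, {"a": 1, "c": "y"}, {"a": 1, "c": "z"}]
T = [{"b": 20, "d": "ok"}]
results, probes = ttj([R, S, T], [(), ("a",), ("b",)], [None, 0, 0])
print(len(results), probes)  # prints "3 6"
```

On this input a plain pipelined binary hash join on the same plan would issue 8 probes: the first row of R matches three rows of S, and each of those three combinations probes T and fails. The sketch issues 6, because the first failed probe into T jumps straight back to R and skips the remaining S matches. When the parent is a hashed relation rather than the scanned one, the offending tuple is also removed from its hash bucket, so later rows never reach it again.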