🤖 AI Summary
Inaccurate cardinality estimates for multi-predicate, multi-join queries often lead cost-based optimizers to suboptimal execution plans. This work extends OmniSketch, a probabilistic data structure that combines count-min sketches with K-minwise hashing, to estimate multi-join cardinalities without assuming uniformity or independence. The extension adds a join estimator, makes sketches interoperable across tables, and provides an algorithm for processing α-acyclic join graphs. Compared with DuckDB, OmniSketch-enhanced optimization on SSB-skew reduces intermediate result sizes by up to 1,077× and execution time by up to 3.19×. On JOB-light, only occasional individual queries improve; estimates there largely suffer from a loss of witnesses caused by unfavorable join graph shapes and large numbers of unique values in foreign key columns.
📝 Abstract
Join ordering is a key factor in query performance, yet traditional cost-based optimizers often produce sub-optimal plans due to inaccurate cardinality estimates in multi-predicate, multi-join queries. Existing alternatives such as learning-based optimizers and adaptive query processing improve accuracy but can suffer from high training costs, poor generalization, or integration challenges. We present an extension of OmniSketch, a probabilistic data structure combining count-min sketches and K-minwise hashing, that enables multi-join cardinality estimation without assuming uniformity or independence. Our approach introduces the OmniSketch join estimator, ensures sketch interoperability across tables, and provides an algorithm to process alpha-acyclic join graphs. Our experiments on SSB-skew and JOB-light show that OmniSketch-enhanced cost-based optimization can improve estimation accuracy and plan quality compared to DuckDB. For SSB-skew, intermediate result sizes decrease by up to 1,077x and execution time decreases by up to 3.19x. For JOB-light, OmniSketch join cardinality estimation shows occasional individual improvements but largely suffers from a loss of witnesses due to unfavorable join graph shapes and large numbers of unique values in foreign key columns.
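To make the combination of count-min sketches and K-minwise hashing more concrete, the following is a minimal single-table sketch in Python, not the authors' implementation. It assumes one count-min sketch per attribute whose cells each keep the k record ids with the smallest hash values, and a record-id hash shared by all sketches so that samples can be intersected. The names `SampledCountMin` and `estimate_conjunction`, the parameters `depth`, `width`, and `k`, and the simple threshold-based scaling are all illustrative assumptions; the join estimator and the alpha-acyclic join graph algorithm are not covered here.

```python
import hashlib


def _h(data: str) -> int:
    """Deterministic 64-bit hash of a string (illustrative choice of hash)."""
    digest = hashlib.blake2b(data.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class SampledCountMin:
    """Count-min sketch for one attribute; every cell also keeps a
    bottom-k sample of record ids (hypothetical, simplified layout)."""

    def __init__(self, depth=3, width=64, k=16):
        self.depth, self.width, self.k = depth, width, k
        self.counts = [[0] * width for _ in range(depth)]
        # Each sample maps record id -> hash in [0, 1); only the k smallest stay.
        self.samples = [[{} for _ in range(width)] for _ in range(depth)]

    def _col(self, row, value):
        return _h(f"{row}:{value}") % self.width

    def add(self, record_id, value):
        # Assumes each record id is added at most once per sketch.
        rid_hash = _h(f"rid:{record_id}") / 2**64  # shared across all sketches
        for row in range(self.depth):
            col = self._col(row, value)
            self.counts[row][col] += 1
            sample = self.samples[row][col]
            sample[record_id] = rid_hash
            if len(sample) > self.k:
                # Evict the record id with the largest hash value.
                del sample[max(sample, key=sample.get)]

    def cells(self, value):
        """Yield (count, sample) for the cell that `value` maps to in each row."""
        for row in range(self.depth):
            col = self._col(row, value)
            yield self.counts[row][col], self.samples[row][col]


def estimate_conjunction(sketches, predicates):
    """Estimate how many records satisfy all equality predicates by
    intersecting samples instead of multiplying per-attribute selectivities."""
    tau = 1.0      # smallest sampling threshold among the touched cells
    common = None  # record ids present in every touched sample
    for attr, value in predicates.items():
        for count, sample in sketches[attr].cells(value):
            if count > len(sample):  # saturated cell: only hashes <= max are kept
                tau = min(tau, max(sample.values()))
            ids = set(sample)
            common = ids if common is None else common & ids
    if not common:
        return 0.0
    # Survivors all have hash <= tau, so scale the count by the sampling rate.
    return len(common) / tau


# Usage with 60 synthetic records and two correlated-by-construction attributes.
rows = [(i, "red" if i % 2 == 0 else "blue", "EU" if i % 3 == 0 else "US")
        for i in range(60)]
sketches = {"color": SampledCountMin(), "region": SampledCountMin()}
for rid, color, region in rows:
    sketches["color"].add(rid, color)
    sketches["region"].add(rid, region)
print(estimate_conjunction(sketches, {"color": "red", "region": "EU"}))
```

Because the estimate comes from record ids that actually appear in every relevant sample, it needs no independence assumption between attributes. The price is that a small `k` or many distinct values can leave the intersection empty, which gives a rough intuition for the loss of witnesses reported for JOB-light.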