Join Cardinality Estimation with OmniSketches

📅 2025-08-25
🤖 AI Summary
Inaccurate cardinality estimates for multi-predicate, multi-join queries often lead cost-based optimizers to suboptimal join orders. This work extends OmniSketch, a probabilistic data structure that combines count-min sketches with K-minwise hashing, to estimate multi-join cardinalities without assuming uniformity or independence. It introduces an OmniSketch join estimator, makes sketches interoperable across tables, and gives an algorithm for processing α-acyclic join graphs. Compared with DuckDB on SSB-skew, the approach reduces intermediate result sizes by up to 1,077× and execution time by up to 3.19×. On JOB-light, only individual queries improve occasionally; most estimates suffer from a loss of witnesses caused by unfavorable join graph shapes and large numbers of unique values in foreign key columns.

📝 Abstract
Join ordering is a key factor in query performance, yet traditional cost-based optimizers often produce sub-optimal plans due to inaccurate cardinality estimates in multi-predicate, multi-join queries. Existing alternatives such as learning-based optimizers and adaptive query processing improve accuracy but can suffer from high training costs, poor generalization, or integration challenges. We present an extension of OmniSketch - a probabilistic data structure combining count-min sketches and K-minwise hashing - to enable multi-join cardinality estimation without assuming uniformity and independence. Our approach introduces the OmniSketch join estimator, ensures sketch interoperability across tables, and provides an algorithm to process alpha-acyclic join graphs. Our experiments on SSB-skew and JOB-light show that OmniSketch-enhanced cost-based optimization can improve estimation accuracy and plan quality compared to DuckDB. For SSB-skew, we show intermediate result decreases up to 1,077x and execution time decreases up to 3.19x. For JOB-light, OmniSketch join cardinality estimation shows occasional individual improvements but largely suffers from a loss of witnesses due to unfavorable join graph shapes and large numbers of unique values in foreign key columns.
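The abstract describes OmniSketch as combining count-min sketches with K-minwise hashing. The following is a minimal Python sketch of that general idea under simplifying assumptions; it is not the paper's data structure or estimator. Each count-min cell keeps a counter plus the k smallest record-id hashes, and a conjunctive predicate is estimated by intersecting the samples of the matching cells. The names `AttributeSketch` and `estimate_conjunction`, the parameters `depth`, `width`, and `k`, the toy table, and the threshold-based scaling are all illustrative assumptions.

```python
import hashlib

HASH_SPACE = 2 ** 64


def _h(value, seed):
    """Deterministic 64-bit hash of (seed, value)."""
    data = f"{seed}:{value!r}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class AttributeSketch:
    """Count-min style grid for one attribute (hypothetical, simplified).

    Every cell stores a counter and a K-minwise sample: the k smallest
    hashes of the record ids that were mapped to the cell.
    """

    def __init__(self, depth=3, width=64, k=32):
        self.depth, self.width, self.k = depth, width, k
        self.count = [[0] * width for _ in range(depth)]
        self.sample = [[set() for _ in range(width)] for _ in range(depth)]

    def add(self, value, rid):
        rid_hash = _h(rid, "rid")
        for row in range(self.depth):
            col = _h(value, row) % self.width
            self.count[row][col] += 1
            cell = self.sample[row][col]
            cell.add(rid_hash)
            if len(cell) > self.k:
                cell.remove(max(cell))  # keep only the k smallest hashes

    def cells(self, value):
        """Return the (counter, sample) pair of each row's cell for `value`."""
        result = []
        for row in range(self.depth):
            col = _h(value, row) % self.width
            result.append((self.count[row][col], self.sample[row][col]))
        return result


def estimate_conjunction(predicates):
    """Estimate how many records satisfy all (sketch, value) predicates.

    Assumption: a simple threshold estimator. Hashes up to `tau` are fully
    retained in every sample involved, so the surviving intersection is
    scaled by the inverse of the covered fraction of the hash space.
    """
    tau = HASH_SPACE - 1
    samples = []
    for sketch, value in predicates:
        for count, sample in sketch.cells(value):
            if count > sketch.k:  # this sample was truncated
                tau = min(tau, max(sample))
            samples.append(sample)
    common = set.intersection(*samples)
    hits = sum(1 for h in common if h <= tau)
    return hits * HASH_SPACE / (tau + 1)


# Toy table with two attributes; record ids are the integers 0..19.
color_sketch, size_sketch = AttributeSketch(), AttributeSketch()
for rid in range(20):
    color_sketch.add("red" if rid % 2 == 0 else "blue", rid)
    size_sketch.add("S" if rid % 3 == 0 else "L", rid)

est = estimate_conjunction([(color_sketch, "red"), (size_sketch, "S")])
print(est)
```

Because the estimate comes from intersecting record-id samples rather than multiplying per-attribute selectivities, it does not rely on the attributes being independent. When every sample involved is truncated and the true intersection is small, few or no common ids ("witnesses") may survive, which is the failure mode the abstract reports for JOB-light.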
Problem

Research questions and friction points this paper is trying to address.

Estimating multi-join query cardinality accurately
Overcoming uniformity and independence assumptions in optimization
Improving query plan quality through sketch-based estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends the OmniSketch probabilistic data structure to multi-join cardinality estimation
Ensures sketch interoperability across multiple tables
Provides an algorithm for processing alpha-acyclic join graphs
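
This summary does not describe the paper's algorithm for alpha-acyclic join graphs. As background only, the Python sketch below shows the standard GYO reduction, a common test for whether a join hypergraph is alpha-acyclic: repeatedly drop attributes that occur in a single relation and relations whose attributes are contained in another relation. The function name `is_alpha_acyclic` and the schema and column names in the examples are hypothetical.

```python
def is_alpha_acyclic(relations):
    """GYO reduction on a join hypergraph.

    `relations` maps a relation name to its join attributes. The hypergraph
    is alpha-acyclic if the reduction removes every attribute.
    """
    edges = {name: set(attrs) for name, attrs in relations.items()}
    changed = True
    while changed:
        changed = False
        # Rule 1: remove attributes that occur in exactly one relation.
        for attrs in edges.values():
            lonely = {
                a for a in attrs
                if sum(a in other for other in edges.values()) == 1
            }
            if lonely:
                attrs -= lonely
                changed = True
        # Rule 2: remove a relation whose attributes are contained in another.
        for name in list(edges):
            attrs = edges[name]
            if any(attrs <= other for o, other in edges.items() if o != name):
                del edges[name]
                changed = True
    return all(not attrs for attrs in edges.values())


# Star-shaped join graph (hypothetical, unified column names).
star = {
    "lineorder": ["custkey", "suppkey", "partkey", "orderdate"],
    "customer": ["custkey"],
    "supplier": ["suppkey"],
    "part": ["partkey"],
    "date": ["orderdate"],
}
# Three relations joined pairwise in a cycle.
triangle = {"R": ["a", "b"], "S": ["b", "c"], "T": ["a", "c"]}
# The same cycle becomes alpha-acyclic once one relation covers all attributes.
covered = dict(triangle, U=["a", "b", "c"])

print(is_alpha_acyclic(star), is_alpha_acyclic(triangle), is_alpha_acyclic(covered))
```

The third example illustrates why alpha-acyclicity is a hypergraph property: adding a relation can turn a cyclic join graph into an acyclic one.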