Join Cardinality Estimation with OmniSketches

📅 2025-08-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Inaccurate cardinality estimates for multi-predicate, multi-join queries often lead cost-based optimizers to choose suboptimal execution plans. This paper extends OmniSketch, a probabilistic data structure that combines count-min sketches with K-minwise hashing, to estimate multi-join cardinalities without assuming uniformity or independence. The extension adds a join estimator, makes sketches interoperable across tables, and provides an algorithm for processing α-acyclic join graphs. Compared with DuckDB on SSB-skew, intermediate results shrink by up to 1,077× and execution time drops by up to 3.19×. On JOB-light, individual queries occasionally improve, but the estimates largely suffer from a loss of witnesses caused by unfavorable join graph shapes and large numbers of unique values in foreign key columns.

📝 Abstract

Join ordering is a key factor in query performance, yet traditional cost-based optimizers often produce sub-optimal plans due to inaccurate cardinality estimates in multi-predicate, multi-join queries. Existing alternatives such as learning-based optimizers and adaptive query processing improve accuracy but can suffer from high training costs, poor generalization, or integration challenges. We present an extension of OmniSketch, a probabilistic data structure combining count-min sketches and K-minwise hashing, to enable multi-join cardinality estimation without assuming uniformity and independence. Our approach introduces the OmniSketch join estimator, ensures sketch interoperability across tables, and provides an algorithm to process alpha-acyclic join graphs. Our experiments on SSB-skew and JOB-light show that OmniSketch-enhanced cost-based optimization can improve estimation accuracy and plan quality compared to DuckDB. For SSB-skew, intermediate result sizes decrease by up to 1,077x and execution time decreases by up to 3.19x. For JOB-light, OmniSketch join cardinality estimation shows occasional individual improvements but largely suffers from a loss of witnesses due to unfavorable join graph shapes and large numbers of unique values in foreign key columns.
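To make the "count-min sketches plus K-minwise hashing" combination concrete, here is a toy single-table sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the class names (`Cell`, `ToyOmniSketch`), the parameters `depth`, `width`, and `k`, and the scaling rule in `estimate` are all choices made for this example. Each attribute gets a count-min style grid of cells, and each cell keeps a counter plus the `k` smallest hashes of the record ids it has seen. A conjunctive predicate is estimated by intersecting those samples and scaling up by the largest cell's sampling ratio.

```python
import hashlib

def _h(*parts) -> int:
    """Deterministic 64-bit hash of the given parts."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

class Cell:
    """One sketch cell: a counter plus the k smallest record-id hashes."""
    def __init__(self, k):
        self.k = k
        self.count = 0
        self.sample = set()

    def add(self, rid_hash):
        self.count += 1
        self.sample.add(rid_hash)
        if len(self.sample) > self.k:
            self.sample.remove(max(self.sample))  # keep only the k smallest

class ToyOmniSketch:
    """Hypothetical, simplified sketch: one count-min style grid per attribute."""
    def __init__(self, attrs, depth=3, width=64, k=256):
        self.depth, self.width, self.k = depth, width, k
        self.cells = {a: [[Cell(k) for _ in range(width)] for _ in range(depth)]
                      for a in attrs}

    def _cells_for(self, attr, value):
        # One cell per row, chosen by a row-specific hash of the value.
        return [self.cells[attr][r][_h(attr, r, value) % self.width]
                for r in range(self.depth)]

    def add(self, rid, record):
        rid_hash = _h("rid", rid)
        for attr, value in record.items():
            for cell in self._cells_for(attr, value):
                cell.add(rid_hash)

    def estimate(self, predicates):
        cells = [c for attr, value in predicates.items()
                 for c in self._cells_for(attr, value)]
        common = set.intersection(*(c.sample for c in cells))
        n_max = max(c.count for c in cells)
        # Assumed scaling rule: the fullest cell has the lowest sampling rate.
        return max(1.0, n_max / self.k) * len(common)

# Two perfectly correlated columns: 200 rows, 20 of which match both predicates.
rows = [{"city": f"c{i % 10}", "zip": f"z{i % 10}"} for i in range(200)]
sk = ToyOmniSketch(["city", "zip"])
for rid, row in enumerate(rows):
    sk.add(rid, row)
est = sk.estimate({"city": "c3", "zip": "z3"})
```

With per-column selectivities of 0.1 each, an independence-based estimate would be 200 × 0.1 × 0.1 = 2 rows, although 20 rows actually match. Because no cell here holds more than `k` records, the samples are complete and the sketch can only overestimate through hash collisions, never miss a matching row. The multi-table join estimator described in the abstract goes beyond this single-table toy.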

Problem

Research questions and friction points this paper is trying to address.

Estimating multi-join query cardinality accurately

Overcoming uniformity and independence assumptions in optimization

Improving query plan quality through sketch-based estimation
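The effect of the uniformity assumption on a join estimate can be shown with a small worked example. The snippet below is only an illustration: the function names and the synthetic key distributions are hypothetical, and the "textbook" formula is the classic |R| · |S| / max(distinct keys) rule rather than any specific optimizer's code.

```python
from collections import Counter

def textbook_join_estimate(r_keys, s_keys):
    """Classic uniformity-based estimate: |R| * |S| / max(distinct keys)."""
    ndv = max(len(set(r_keys)), len(set(s_keys)))
    return len(r_keys) * len(s_keys) / ndv

def exact_join_size(r_keys, s_keys):
    """Exact equi-join size from per-key frequencies."""
    r_freq, s_freq = Counter(r_keys), Counter(s_keys)
    return sum(cnt * s_freq[key] for key, cnt in r_freq.items())

# Skewed fact table: key 0 accounts for 900 of 1000 rows.
fact_keys = [0] * 900 + list(range(1, 101))
# Dimension rows that survive a filter; the hot key 0 is among them.
dim_keys = [0, 1, 2, 3, 4]

estimate = textbook_join_estimate(fact_keys, dim_keys)  # 5000 / 101
actual = exact_join_size(fact_keys, dim_keys)           # 900 + 4
```

Here the uniform estimate is roughly 49.5 rows while the true join produces 904, an underestimate of more than an order of magnitude. Errors of this kind compound across joins, which is why skewed benchmarks such as SSB-skew stress cardinality estimators.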

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends the OmniSketch probabilistic data structure with a join estimator

Ensures sketch interoperability across multiple tables

Provides an algorithm for processing alpha-acyclic join graphs
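For readers unfamiliar with the term, alpha-acyclicity of a join graph can be tested with the standard GYO reduction. The sketch below is a generic check and not the paper's join processing algorithm: the function name and the example schemas are hypothetical. Each relation is treated as a hyperedge over its join attributes.

```python
def is_alpha_acyclic(relations: dict[str, set[str]]) -> bool:
    """GYO reduction: True if the join hypergraph is alpha-acyclic."""
    edges = {name: set(attrs) for name, attrs in relations.items()}
    changed = True
    while changed and edges:
        changed = False
        # Rule 1: drop attributes that occur in exactly one relation.
        counts = {}
        for attrs in edges.values():
            for a in attrs:
                counts[a] = counts.get(a, 0) + 1
        for attrs in edges.values():
            lonely = {a for a in attrs if counts[a] == 1}
            if lonely:
                attrs -= lonely
                changed = True
        # Rule 2: drop one relation whose attributes are covered by another.
        for name in list(edges):
            if any(other != name and edges[name] <= edges[other]
                   for other in edges):
                del edges[name]
                changed = True
                break
    # Acyclic iff everything collapses into at most one (empty) hyperedge.
    return len(edges) <= 1

# Star-shaped join graph (illustrative attribute names).
star = {
    "lineorder": {"custkey", "partkey", "suppkey", "orderdate"},
    "customer": {"custkey"},
    "part": {"partkey"},
    "supplier": {"suppkey"},
    "date": {"orderdate"},
}
# Triangle query: R(a,b), S(b,c), T(a,c) is the classic cyclic case.
triangle = {"R": {"a", "b"}, "S": {"b", "c"}, "T": {"a", "c"}}

star_ok = is_alpha_acyclic(star)
triangle_ok = is_alpha_acyclic(triangle)
```

The star schema reduces completely, so `star_ok` is `True`, while the triangle has no removable attribute or covered relation and `triangle_ok` is `False`.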
