CoLSE: A Lightweight and Robust Hybrid Learned Model for Single-Table Cardinality Estimation using Joint CDF

📅 2025-12-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Single-table cardinality estimation (CE) must trade off accuracy, inference efficiency, and memory footprint. CoLSE addresses this with a copula-neural hybrid model: a novel algorithm based on copula theory models the joint cumulative distribution function (joint CDF) to capture multivariate dependencies over the queried intervals, and a lightweight neural network corrects the residual estimation errors. In the reported experiments, CoLSE achieves a favorable trade-off among estimation accuracy, training time, inference latency, and model size compared with existing state-of-the-art methods, which makes it a practical candidate for deployment in query optimizers.

📝 Abstract
Cardinality estimation (CE), the task of predicting the result size of queries, is a critical component of query optimization. Accurate estimates are essential for generating efficient query execution plans. Recently, machine learning techniques have been applied to CE, broadly categorized into query-driven and data-driven approaches. Data-driven methods learn the joint distribution of data, while query-driven methods construct regression models that map query features to cardinalities. Ideally, a CE technique should strike a balance among three key factors: accuracy, efficiency, and memory footprint. However, existing state-of-the-art models often fail to achieve this balance. To address this, we propose CoLSE, a hybrid learned approach for single-table cardinality estimation. CoLSE directly models the joint probability over queried intervals using a novel algorithm based on copula theory and integrates a lightweight neural network to correct residual estimation errors. Experimental results show that CoLSE achieves a favorable trade-off among accuracy, training time, inference latency, and model size, outperforming existing state-of-the-art methods.
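To make "joint probability over queried intervals" concrete, the standard identities that any copula-based estimator builds on can be written for two attributes as follows. This is textbook copula theory (Sklar's theorem plus inclusion-exclusion on the joint CDF), not necessarily the exact algorithm used in CoLSE; the symbols F_1, F_2, C, N and the half-open intervals are assumptions chosen for illustration.

```latex
% Sklar's theorem: the joint CDF factors into the marginal CDFs F_1, F_2 and a copula C
F(x_1, x_2) = C\bigl(F_1(x_1),\, F_2(x_2)\bigr)

% Probability that a row satisfies a two-attribute range predicate
\begin{aligned}
P(a_1 < X_1 \le b_1,\; a_2 < X_2 \le b_2)
  &= F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)
\end{aligned}

% Estimated cardinality for a table with N rows
\widehat{\mathrm{card}} = N \cdot P(a_1 < X_1 \le b_1,\; a_2 < X_2 \le b_2)
```

For d attributes the same inclusion-exclusion sum has 2^d terms, one per corner of the query box.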
Problem

Research questions and friction points this paper is trying to address.

Develops a hybrid model for cardinality estimation balancing accuracy and efficiency
Addresses limitations of existing methods in single-table query size prediction
Integrates copula theory with neural networks to correct estimation errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid model using copula theory for joint probability estimation
Lightweight neural network corrects residual estimation errors
Balances accuracy, efficiency, and memory footprint in cardinality estimation
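
The hybrid idea listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CoLSE implementation: it uses empirical marginal CDFs, a Clayton copula with a hand-picked parameter `theta` (the summary does not say which copula family CoLSE uses), half-open ranges `(lo, hi]`, and a hypothetical `residual_fn` callable standing in for the lightweight neural network. All function names are made up for this sketch.

```python
import bisect


def empirical_cdf(sorted_values, x):
    """Fraction of column values <= x (marginal CDF estimated from the data)."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) for theta > 0; an assumed family, chosen for its closed form."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def box_selectivity(u_lo, u_hi, v_lo, v_hi, theta):
    """P(lo_a < A <= hi_a, lo_b < B <= hi_b) by inclusion-exclusion on the joint CDF."""
    p = (clayton_copula(u_hi, v_hi, theta)
         - clayton_copula(u_lo, v_hi, theta)
         - clayton_copula(u_hi, v_lo, theta)
         + clayton_copula(u_lo, v_lo, theta))
    return min(1.0, max(0.0, p))  # clamp away floating-point noise


def estimate_cardinality(col_a, col_b, range_a, range_b, theta, residual_fn=None):
    """Copula-based estimate, optionally adjusted by a learned residual model.

    residual_fn is a hypothetical stand-in for the neural corrector: it takes
    the two ranges and the copula estimate and returns an additive correction.
    """
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    u_lo, u_hi = (empirical_cdf(sorted_a, x) for x in range_a)
    v_lo, v_hi = (empirical_cdf(sorted_b, x) for x in range_b)
    estimate = len(col_a) * box_selectivity(u_lo, u_hi, v_lo, v_hi, theta)
    if residual_fn is not None:
        estimate += residual_fn(range_a, range_b, estimate)
    return max(0.0, estimate)


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [10, 20, 30, 40]
    unbounded = float("-inf")
    # WHERE a <= 2 AND b <= 40
    print(estimate_cardinality(a, b, (unbounded, 2), (unbounded, 40), theta=2.0))
```

In a real system the sorted columns would be replaced by compact marginal summaries built once at training time, and `theta` (or the parameters of whichever copula is used) would be fitted to the data rather than supplied by hand.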