🤖 AI Summary
Single-table cardinality estimation (CE) faces a trade-off among accuracy, inference efficiency, and memory footprint. To address this, we propose CoLSE, a hybrid copula-neural model. It uses a novel copula-based algorithm to build a probabilistic framework on the joint cumulative distribution function (joint CDF), directly modeling the joint probability over queried intervals, and adds a lightweight neural network that corrects residual estimation errors. In the reported experiments, CoLSE achieves a favorable trade-off among estimation accuracy, training time, inference latency, and model size, outperforming existing state-of-the-art methods and offering a more practical, deployable option for modern query optimizers.
📝 Abstract
Cardinality estimation (CE), the task of predicting the result size of queries, is a critical component of query optimization. Accurate estimates are essential for generating efficient query execution plans. Recently, machine learning techniques have been applied to CE, broadly categorized into query-driven and data-driven approaches. Data-driven methods learn the joint distribution of data, while query-driven methods construct regression models that map query features to cardinalities. Ideally, a CE technique should strike a balance among three key factors: accuracy, efficiency, and memory footprint. However, existing state-of-the-art models often fail to achieve this balance.
To address this, we propose CoLSE, a hybrid learned approach for single-table cardinality estimation. CoLSE directly models the joint probability over queried intervals using a novel algorithm based on copula theory and integrates a lightweight neural network to correct residual estimation errors. Experimental results show that CoLSE achieves a favorable trade-off among accuracy, training time, inference latency, and model size, outperforming existing state-of-the-art methods.
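To make the idea of modeling joint probability over queried intervals concrete, the following is a minimal sketch of how a copula turns per-column CDFs into a range selectivity for two columns. It is not CoLSE's algorithm: the Clayton copula, the empirical marginals, the parameter `theta`, and every function name here are assumptions chosen for illustration, and the `residual` argument merely stands in for the output of a correction network.

```python
from bisect import bisect_right


def empirical_cdf(sorted_values):
    """Return F(x) = fraction of values <= x for a pre-sorted column."""
    n = len(sorted_values)
    return lambda x: bisect_right(sorted_values, x) / n


def clayton_copula(u, v, theta=2.0):
    """Clayton copula C(u, v) for theta > 0 (illustrative choice only)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def range_selectivity(cdf_x, cdf_y, x_lo, x_hi, y_lo, y_hi,
                      copula=clayton_copula):
    """P(x_lo < X <= x_hi, y_lo < Y <= y_hi) via inclusion-exclusion
    on the joint CDF H(x, y) = C(F_X(x), F_Y(y))."""
    u_lo, u_hi = cdf_x(x_lo), cdf_x(x_hi)
    v_lo, v_hi = cdf_y(y_lo), cdf_y(y_hi)
    p = (copula(u_hi, v_hi) - copula(u_lo, v_hi)
         - copula(u_hi, v_lo) + copula(u_lo, v_lo))
    return min(1.0, max(0.0, p))  # guard against rounding drift


def estimate_cardinality(n_rows, selectivity, residual=0.0):
    """Scale selectivity to a row count; `residual` is a hypothetical
    placeholder for a learned correction term."""
    return max(0.0, n_rows * selectivity + residual)


if __name__ == "__main__":
    xs = sorted(range(1, 11))  # toy column X
    ys = sorted(range(1, 11))  # toy column Y
    sel = range_selectivity(empirical_cdf(xs), empirical_cdf(ys),
                            2, 5, 3, 8)
    print(estimate_cardinality(len(xs), sel))
```

The four-term inclusion-exclusion is the standard way to obtain a rectangle probability from a bivariate joint CDF; with more predicates the number of corner terms grows as 2^d, which is one reason a practical estimator needs a more careful algorithm than this sketch.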