CUBE: A Cardinality Estimator Based on Neural CDF

📅 2025-12-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Cardinality estimation remains a critical bottleneck in database query optimization. Data-driven approaches improve accuracy, but they suffer from high inference latency, poor scalability with dimensionality, and unstable estimates caused by sampling or numerical integration, all of which undermine the predictability of query performance. This paper introduces Neural CDF, a modeling paradigm that directly learns a multidimensional cumulative distribution function (CDF) via end-to-end supervised learning. Range-query cardinalities are then computed in closed form from the learned CDF, with no sampling or numerical integration. The authors report nearly constant inference latency as dimensionality grows and inference over 10× faster than state-of-the-art data-driven estimators, while maintaining high accuracy and producing stable, predictable estimates. This makes the approach a candidate for real-time, scalable query optimization.

📝 Abstract

Modern database optimizers rely on cardinality estimators, whose accuracy directly affects the optimizer's ability to choose an optimal execution plan. Recent work on data-driven methods has leveraged probabilistic models to achieve higher estimation accuracy, but these approaches cannot guarantee low inference latency at the same time, and they neglect scalability. As data dimensionality grows, optimization time can even exceed actual query execution time. Furthermore, inference with probabilistic models relies on sampling or integration procedures, which produce unpredictable estimation results and violate stability; this leads to unstable query execution performance and makes database tuning hard for users. In this paper, we propose a novel approach to cardinality estimation based on the cumulative distribution function (CDF), which calculates range-query cardinality without sampling or integration, ensuring accurate and predictable estimation results. By merging calculations to accelerate inference, we achieve fast and nearly constant inference speed while maintaining high accuracy, even as dimensionality increases; inference is over 10x faster than the current state-of-the-art data-driven cardinality estimators. This demonstrates excellent dimensional scalability, making the approach well suited for real-world database applications.
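To illustrate why a CDF allows range-query cardinality to be computed without sampling or integration, here is a minimal sketch of the textbook inclusion-exclusion identity: the probability mass of a box (lo, hi] is a signed sum of CDF values at the box's corners. This is not the paper's implementation. The names `empirical_cdf` and `range_cardinality` are hypothetical, an exact empirical CDF stands in for the learned neural CDF, and the naive loop over all 2^d corners does not include the paper's calculation merging.

```python
from itertools import product

import numpy as np


def empirical_cdf(data):
    """Return F(x) = fraction of rows whose every column is <= x.

    Assumption: this exact empirical CDF is only a stand-in for a learned model.
    """
    def cdf(x):
        return float(np.mean(np.all(data <= np.asarray(x), axis=1)))
    return cdf


def range_cardinality(cdf, lo, hi, n_rows):
    """Estimate the number of rows r with lo < r <= hi in every dimension.

    Uses inclusion-exclusion over the 2^d corners of the query box:
    each corner takes hi[i] or lo[i] per dimension, and its sign is
    negative when an odd number of lower bounds is used.
    """
    d = len(lo)
    prob = 0.0
    for picks in product((0, 1), repeat=d):
        corner = [hi[i] if p else lo[i] for i, p in enumerate(picks)]
        sign = (-1) ** (d - sum(picks))
        prob += sign * cdf(corner)
    return prob * n_rows


# Toy table: every integer pair (x, y) with 0 <= x, y <= 9, i.e. 100 rows.
data = np.array(list(product(range(10), repeat=2)))
cdf = empirical_cdf(data)

# Query: 2 < x <= 5 AND 3 < y <= 7, which matches 3 * 4 = 12 rows.
est = range_cardinality(cdf, lo=(2, 3), hi=(5, 7), n_rows=len(data))
print(round(est))
```

The estimate is a deterministic function of a fixed number of CDF evaluations, which is the source of the predictability the abstract emphasizes. In this naive form the number of evaluations doubles with each added dimension, so it does not by itself give the nearly constant inference speed the abstract reports.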

Problem

Research questions and friction points this paper is trying to address.

Ensures accurate, predictable cardinality estimation without sampling or integration

Achieves fast, nearly constant inference speed as dimensionality increases

Provides excellent dimensional scalability for real-world database applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural CDF for cardinality estimation without sampling

Merging calculations for fast constant inference speed

Ensures high accuracy and scalability in databases
