Information Theory Strikes Back: New Development in the Theory of Cardinality Estimation

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the problem of loose upper bounds on the output cardinality of database queries. It overviews a recently published information-theoretic method that formulates the computation of a guaranteed cardinality upper bound, for any full conjunctive query, as a linear program: maximize the joint entropy of the query variables subject to Shannon's information inequalities and new information inequalities involving ℓₚ-norms of the degree sequences of the join attributes. Using arbitrary ℓₚ-norms goes beyond traditional bounds that rely only on the ℓ₁-norm (relation cardinalities) and the ℓ∞-norm (max-degrees), and the resulting bounds can be asymptotically lower. The bounds are computable in time exponential in the query size, come with a matching query evaluation algorithm, and are provably tight when each degree sequence is on one join attribute.

📝 Abstract
Estimating the cardinality of the output of a query is a fundamental problem in database query processing. In this article, we overview a recently published contribution that casts the cardinality estimation problem as linear optimization and computes guaranteed upper bounds on the cardinality of the output for any full conjunctive query. The objective of the linear program is to maximize the joint entropy of the query variables, and its constraints are the Shannon information inequalities and new information inequalities involving $\ell_p$-norms of the degree sequences of the join attributes. The bounds based on arbitrary norms can be asymptotically lower than those based on the $\ell_1$ and $\ell_\infty$ norms, which capture the cardinalities and the max-degrees of the input relations, respectively. They come with a matching query evaluation algorithm, are computable in exponential time in the query size, and are provably tight when each degree sequence is on one join attribute.
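To make the role of the norms concrete, here is a minimal Python sketch. It is not the paper's linear program. It handles only the simplest case, a single join $R(X,Y) \bowtie S(Y,Z)$, where the $\ell_2$ bound follows directly from the Cauchy-Schwarz inequality. The relations, the helper names `degree_sequence` and `lp_norm`, and the data are all hypothetical and chosen for illustration.

```python
from collections import Counter

def degree_sequence(relation, attr_index):
    """Degrees of the values of one attribute, sorted in decreasing order."""
    counts = Counter(row[attr_index] for row in relation)
    return sorted(counts.values(), reverse=True)

def lp_norm(degrees, p):
    """l_p-norm of a degree sequence; p = float('inf') gives the max-degree."""
    if p == float("inf"):
        return float(max(degrees, default=0))
    return sum(d ** p for d in degrees) ** (1.0 / p)

INF = float("inf")

# Toy relations R(X, Y) and S(Y, Z), joined on Y (hypothetical data).
R = [("a", 0), ("b", 0), ("c", 0), ("a", 1), ("a", 2)]
S = [(0, "u"), (1, "u"), (1, "v"), (1, "w"), (3, "u"), (3, "v")]

deg_R = degree_sequence(R, 1)  # degrees of Y-values in R
deg_S = degree_sequence(S, 0)  # degrees of Y-values in S

# The l_1-norm is the relation's cardinality; the l_inf-norm is its max-degree.
assert lp_norm(deg_R, 1) == len(R)
assert lp_norm(deg_S, INF) == max(deg_S)

# True join size, computed by brute force for comparison.
true_size = sum(1 for (_, y1) in R for (y2, _) in S if y1 == y2)

# Bounds that use only cardinalities and max-degrees: |R| * maxdeg(S), |S| * maxdeg(R).
bound_l1_linf = min(lp_norm(deg_R, 1) * lp_norm(deg_S, INF),
                    lp_norm(deg_S, 1) * lp_norm(deg_R, INF))

# Cauchy-Schwarz: sum_y deg_R(y) * deg_S(y) <= ||deg_R||_2 * ||deg_S||_2.
bound_l2 = lp_norm(deg_R, 2) * lp_norm(deg_S, 2)

print(true_size, bound_l2, bound_l1_linf)
```

On this toy instance the $\ell_2$ bound lies between the true join size and the best bound available from cardinalities and max-degrees alone. This is the kind of gap the abstract attributes to norms other than $\ell_1$ and $\ell_\infty$.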
Problem

Research questions and friction points this paper is trying to address.

Cardinality estimation for database query outputs.
Linear optimization to compute guaranteed upper bounds.
Use of Shannon and new information inequalities.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear optimization for cardinality estimation
Maximizes joint entropy with Shannon inequalities
Uses $\ell_p$-norms for tighter asymptotic bounds
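
As a worked illustration of how entropy and norm constraints can combine, consider the single join $Q(X,Y,Z) = R(X,Y) \wedge S(Y,Z)$, with $h$ the entropy of the uniform distribution over the output, so that $\log |Q| = h(XYZ)$. The sketch below assumes each norm constraint has the form $\frac{1}{p} h(U) + h(V \mid U) \le \log \|\deg_R(V \mid U)\|_p$. This form is an assumption not stated in the text above; it is used here because it reduces to the cardinality constraint $h(UV) \le \log |R|$ at $p = 1$ and to the max-degree constraint at $p = \infty$. Under that assumption, taking $p = 2$ for both relations gives:

$$
\begin{aligned}
\log |Q| = h(XYZ) &\le h(Y) + h(X \mid Y) + h(Z \mid Y) \\
&= \left(\tfrac{1}{2} h(Y) + h(X \mid Y)\right) + \left(\tfrac{1}{2} h(Y) + h(Z \mid Y)\right) \\
&\le \log \|\deg_R(X \mid Y)\|_2 + \log \|\deg_S(Z \mid Y)\|_2 .
\end{aligned}
$$

The first step is a Shannon inequality, since $h(Z \mid XY) \le h(Z \mid Y)$. The result, $|Q| \le \|\deg_R(X \mid Y)\|_2 \cdot \|\deg_S(Z \mid Y)\|_2$, agrees with what the Cauchy-Schwarz inequality gives directly for this join.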