Partition Constraints for Conjunctive Queries: Bounds and Worst-Case Optimal Joins

📅 2025-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Inaccurate result-size estimation for join queries often leads to suboptimal query execution plans and poor performance. To address this, we propose “partition constraints”, a new statistic that captures latent structure in the data by partitioning relations into sub-relations that each satisfy much tighter degree constraints. We formally define partition constraints and show that they significantly tighten classical worst-case output-size bounds, most notably the AGM bound, while incurring only polynomial preprocessing overhead. On the theoretical side, the refined bounds carry over to worst-case optimal join algorithms (e.g., Generic Join), improving their guarantees and their practical competitiveness. Empirically, partition constraints yield faster runtimes across diverse complex join queries, including cyclic, multi-way, and inequality-extended joins.
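
For orientation, the AGM bound mentioned above can be written out for a concrete query. The triangle query below is an illustrative choice, not an example taken from the paper; the symbols $R$, $S$, $T$, and the weights $x_e$ are assumptions of this sketch.

```latex
% Triangle query (illustrative):
%   Q(a,b,c) :- R(a,b), S(b,c), T(a,c)
%
% General AGM bound: for any fractional edge cover (x_e) of the query
% hypergraph, i.e. weights x_e >= 0 such that every variable is covered
% with total weight at least 1,
\[
  |Q| \;\le\; \prod_{e} |R_e|^{x_e}.
\]
% For the triangle, x_R = x_S = x_T = 1/2 is a fractional edge cover, so
\[
  |Q| \;\le\; |R|^{1/2}\,|S|^{1/2}\,|T|^{1/2} \;=\; \sqrt{|R|\,|S|\,|T|}.
\]
% With |R| = |S| = |T| = N this gives |Q| <= N^{3/2}.
```

This bound uses only the relation sizes; statistics such as degree constraints, and the partition constraints proposed here, aim to tighten it when the data has more structure.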

📝 Abstract
In the last decade, various works have used statistics on relations to improve both the theory and practice of conjunctive query execution. Starting with the AGM bound, which took advantage of relation sizes, later works incorporated statistics such as functional dependencies and degree constraints. Each new statistic prompted work along two lines: bounding the size of conjunctive query outputs and designing worst-case optimal join algorithms. In this work, we continue in this vein by introducing a new statistic called a partition constraint. This statistic captures latent structure within relations by partitioning them into sub-relations that each have much tighter degree constraints. We show that this approach can both refine existing cardinality bounds and improve existing worst-case optimal join algorithms.
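
To make the idea of partitioning a relation into sub-relations with tighter degree constraints concrete, here is a minimal Python sketch. It uses the classic heavy/light split of a binary relation, which is a stand-in chosen for illustration and not necessarily the partitioning procedure used in the paper. The function names `split_heavy_light` and `max_degree` and the sample relation are hypothetical.

```python
import math
from collections import Counter


def split_heavy_light(relation):
    """Split a set of (a, b) pairs into a light and a heavy part.

    Illustrative heavy/light split (an assumption, not the paper's method):
    a value of the first column is "light" if it occurs in at most
    floor(sqrt(N)) tuples, where N is the number of distinct tuples.
    """
    tuples = set(relation)
    threshold = math.isqrt(len(tuples))
    deg_a = Counter(a for a, _ in tuples)
    light = {(a, b) for a, b in tuples if deg_a[a] <= threshold}
    heavy = tuples - light
    return light, heavy, threshold


def max_degree(part, column):
    """Largest number of tuples in `part` sharing one value in `column`."""
    counts = Counter(t[column] for t in part)
    return max(counts.values(), default=0)


# Hypothetical sample relation: one high-degree value in each column.
relation = {(0, b) for b in range(9)} | {(a, 0) for a in range(1, 8)}
light, heavy, threshold = split_heavy_light(relation)

print("threshold:", threshold)
print("whole relation, max degree per column:",
      max_degree(relation, 0), max_degree(relation, 1))
print("light part, max degree of first column:", max_degree(light, 0))
print("heavy part, max degree of second column:", max_degree(heavy, 1))
```

In this split, every first-column value in the light part has degree at most the threshold by construction. The heavy part contains fewer than sqrt(N) distinct first-column values, so every second-column value in it has degree at most the threshold as well. Neither degree bound holds for the unpartitioned relation, which is the kind of latent structure the abstract describes.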
Problem

Research questions and friction points this paper is trying to address.

Partition Constraints
Query Optimization
Database Performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Partition Constraints
Query Optimization
Database Performance