A Unified Model for Cardinality Estimation by Learning from Data and Queries via Sum-Product Networks

📅 2025-05-13
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Cardinality estimation in databases must balance accuracy, inference latency, and storage overhead, yet existing approaches struggle to optimize all three jointly. This paper introduces Query-aware Sum-Product Networks (QSPNs), which explicitly model query access patterns through two new node types, QProduct and QSplit, enabling joint optimization of query-aware column partitioning and tree-structure expansion. By integrating query workload analysis, offline learning, and fast online inference, QSPN achieves state-of-the-art performance in both single-table and multi-table settings: it attains the lowest estimation error, sub-millisecond inference latency, and MB-scale model storage. Its core contribution is an end-to-end, lightweight, query-aware cardinality estimation framework that simultaneously delivers high accuracy, ultra-low latency, and a small memory footprint.

📝 Abstract
Cardinality estimation is a fundamental component of database systems and is crucial for generating efficient execution plans. Despite advances in learning-based cardinality estimation, existing methods may struggle to optimize three key criteria simultaneously (estimation accuracy, inference time, and storage overhead), which limits their practical applicability in real-world database environments. This paper introduces QSPN, a unified model that integrates both data distribution and query workload. QSPN achieves high estimation accuracy by modeling the data distribution with the simple yet effective Sum-Product Network (SPN) structure. To keep inference time low and reduce storage overhead, QSPN further partitions columns based on query access patterns. We formalize QSPN as a tree-based structure that extends SPNs with two new node types: QProduct and QSplit. This paper studies the research challenges of developing efficient algorithms for the offline construction and online computation of QSPN. We conduct extensive experiments to evaluate QSPN in both single-table and multi-table cardinality estimation settings. The experimental results demonstrate that QSPN achieves superior and robust performance on all three key criteria compared with state-of-the-art approaches.
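
To make the SPN part of the abstract concrete, here is a minimal sketch of how a plain SPN (not QSPN) could turn a conjunctive range query into a cardinality estimate. It assumes Sum nodes weight disjoint row clusters by their size, Product nodes multiply over column groups treated as independent, and leaves are exact per-column histograms. The class names, the query format, and the toy table are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical query form: a conjunction of inclusive range predicates,
# mapping column name -> (low, high).
Query = Dict[str, Tuple[int, int]]


@dataclass
class Leaf:
    """Univariate distribution of one column as a value -> count histogram."""
    column: str
    hist: Dict[int, int]

    def prob(self, q: Query) -> float:
        if self.column not in q:  # column not constrained by the query
            return 1.0
        lo, hi = q[self.column]
        hits = sum(n for v, n in self.hist.items() if lo <= v <= hi)
        return hits / sum(self.hist.values())


@dataclass
class Product:
    """Children cover disjoint column sets that are treated as independent."""
    children: List["Node"]

    def prob(self, q: Query) -> float:
        p = 1.0
        for child in self.children:
            p *= child.prob(q)
        return p


@dataclass
class Sum:
    """Children model disjoint row clusters; weights are cluster fractions."""
    weights: List[float]
    children: List["Node"]

    def prob(self, q: Query) -> float:
        return sum(w * c.prob(q) for w, c in zip(self.weights, self.children))


Node = Union[Leaf, Product, Sum]


def estimate_cardinality(root: Node, q: Query, n_rows: int) -> float:
    """Cardinality = table size x probability that a row satisfies q."""
    return n_rows * root.prob(q)


# Toy 100-row table split into two row clusters (60 and 40 rows).
root = Sum(
    weights=[0.6, 0.4],
    children=[
        Product([Leaf("age", {20: 30, 30: 30}), Leaf("salary", {1: 45, 2: 15})]),
        Product([Leaf("age", {50: 40}), Leaf("salary", {3: 20, 4: 20})]),
    ],
)

print(estimate_cardinality(root, {"age": (25, 60), "salary": (2, 3)}, n_rows=100))
```

For the example query the tree computes 0.6 × (0.5 × 0.25) + 0.4 × (1.0 × 0.5) = 0.275, i.e. about 27.5 rows (floating-point rounding may show a few trailing digits). Estimation error in such a model comes from the independence assumption inside each Product node; the tree itself is evaluated in a single bottom-up pass.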
Problem

Research questions and friction points this paper is trying to address.

Improving cardinality estimation accuracy in database systems
Reducing inference time and storage overhead simultaneously
Integrating data distribution and query workload for optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Sum-Product Networks for data distribution modeling
Partitions columns based on query access patterns
Introduces QProduct and QSplit node types
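
The Innovation items mention partitioning columns by query access patterns, the role the abstract gives to QProduct. The sketch below shows one simple way such a partition could be derived; it is an illustrative assumption, not the paper's QProduct construction algorithm. Columns referenced by the same workload query are linked, union-find yields the connected components of this co-access graph, and at query time only the groups a query touches need to be evaluated. The function names and the toy workload are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def co_access_partition(columns: Iterable[str],
                        workload: List[Set[str]]) -> List[FrozenSet[str]]:
    """Group columns that some workload query accesses together.

    Every query merges all columns it references, so the resulting groups
    are the connected components of the column co-access graph.
    """
    parent: Dict[str, str] = {c: c for c in columns}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for query_cols in workload:
        cols = [c for c in query_cols if c in parent]
        for other in cols[1:]:
            parent[find(other)] = find(cols[0])

    groups: Dict[str, Set[str]] = {}
    for c in parent:
        groups.setdefault(find(c), set()).add(c)
    # Deterministic order: sort groups by their smallest column name.
    return sorted((frozenset(g) for g in groups.values()), key=min)


def groups_touched(groups: List[FrozenSet[str]],
                   query_cols: Set[str]) -> List[FrozenSet[str]]:
    """Groups a query must evaluate; untouched groups contribute probability 1."""
    return [g for g in groups if g & query_cols]


cols = ["a", "b", "c", "d", "e"]
workload = [{"a", "b"}, {"b", "c"}, {"d"}]
groups = co_access_partition(cols, workload)
print([sorted(g) for g in groups])
print([sorted(g) for g in groups_touched(groups, {"d"})])
```

Here the groups are {a, b, c}, {d}, and {e}, and a query on `d` evaluates only one small group. Unlike a standard SPN Product split, this partition is driven by the workload rather than by tested data independence: every workload query falls inside a single group, so the cross-group independence assumption is only exercised by queries that span groups and did not appear in the training workload.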
Authors
Jiawei Liu (Renmin University of China)
Ju Fan (Renmin University of China)
Tongyu Liu (Renmin University of China)
Kai Zeng (Huawei Technologies)
Jiannan Wang (Huawei Technologies & Simon Fraser University)
Quehuan Liu (Renmin University of China)
Tao Ye (NWPU, China)
Nan Tang (National Institute of Biological Sciences, Beijing)