Data-Agnostic Cardinality Learning from Imperfect Workloads

📅 2025-06-19
🤖 AI Summary
To address cardinality estimation (CardEst) when the underlying data cannot be accessed and the training workload covers join templates incompletely and unevenly, this paper proposes GRASP, a query-driven cardinality learning system that operates without access to the data or to summaries built over it. Its core contributions are: (1) a compositional design that generalizes to unseen join templates and is robust to join template imbalance; (2) a per-table CardEst model that handles value distribution shifts for range predicates; and (3) a learned count sketch model that captures join correlations across base relations. Across three database instances, GRASP outperforms existing query-driven models on imperfect workloads in both estimation accuracy and query latency. On the CEB-IMDb-full benchmark, it performs comparably to, and in some cases better than, traditional approaches built over the underlying data, despite having no data access and training on only 10% of all possible join templates.

📝 Abstract
Cardinality estimation (CardEst) is a critical aspect of query optimization. Traditionally, it leverages statistics built directly over the data. However, organizational policies (e.g., regulatory compliance) may restrict global data access. Fortunately, query-driven cardinality estimation can learn CardEst models using query workloads. However, existing query-driven models often require access to data or summaries for best performance, and they assume perfect training workloads with complete and balanced join templates (or join graphs). Such assumptions rarely hold in real-world scenarios, in which join templates are incomplete and imbalanced. We present GRASP, a data-agnostic cardinality learning system designed to work under these real-world constraints. GRASP's compositional design generalizes to unseen join templates and is robust to join template imbalance. It also introduces a new per-table CardEst model that handles value distribution shifts for range predicates, and a novel learned count sketch model that captures join correlations across base relations. Across three database instances, we demonstrate that GRASP consistently outperforms existing query-driven models on imperfect workloads, both in terms of estimation accuracy and query latency. Remarkably, GRASP achieves performance comparable to, or even surpassing, traditional approaches built over the underlying data on the complex CEB-IMDb-full benchmark -- despite operating without any data access and using only 10% of all possible join templates.

Problem

Research questions and friction points this paper is trying to address.

Data-agnostic cardinality estimation without global data access
Handling incomplete and imbalanced join templates in workloads
Robust performance under real-world constraints and distribution shifts
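
To make "incomplete" and "imbalanced" concrete, the following is a minimal sketch of how one might profile a training workload by join template. It is not taken from the paper: the function names, the toy schema (tables `a`, `b`, `c`), and the two metrics (coverage as the fraction of possible templates seen, imbalance as the ratio between the most and least frequent seen templates) are assumptions chosen for illustration.

```python
from collections import Counter


def join_template(join_edges):
    """Canonical key for a join template: the set of join edges,
    ignoring edge direction, edge order, and filter predicates."""
    return frozenset(frozenset(edge) for edge in join_edges)


def workload_profile(queries, possible_templates):
    """Return (coverage, imbalance) for a training workload.

    coverage  = fraction of possible templates seen at least once
    imbalance = count of the most frequent seen template divided by
                the count of the least frequent seen template
    """
    counts = Counter(join_template(q) for q in queries)
    possible = {join_template(t) for t in possible_templates}
    seen_counts = [counts[t] for t in possible if counts[t] > 0]
    if not seen_counts:
        # No possible template appears in the workload at all.
        return 0.0, float("inf")
    coverage = len(seen_counts) / len(possible)
    imbalance = max(seen_counts) / min(seen_counts)
    return coverage, imbalance


# Hypothetical schema with joins a-b and b-c (single-table queries omitted).
possible_templates = [
    [("a", "b")],
    [("b", "c")],
    [("a", "b"), ("b", "c")],
]

# Toy workload: three queries on a-b, one on b-c, none on the 3-way join.
queries = [
    [("a", "b")],
    [("b", "a")],  # same template, edge written in the other direction
    [("a", "b")],
    [("b", "c")],
]

coverage, imbalance = workload_profile(queries, possible_templates)
print(f"coverage={coverage:.2f}, imbalance={imbalance:.1f}")
```

In this toy workload the three-way join never appears, which is the kind of unseen template a model would have to generalize to at query time.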

Innovation

Methods, ideas, or system contributions that make the work stand out.

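GRASP, a data-agnostic cardinality learning system whose compositional design generalizes to unseen join templates
Per-table CardEst model that handles value distribution shifts for range predicates
Learned count sketch model that captures join correlations across base relations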
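
For background on the last point, here is a minimal sketch of a classical (non-learned) count sketch used to estimate an equi-join size. It is not GRASP's model: the classical sketch is built by scanning the join columns, which GRASP by design cannot do, and the summary above does not describe how the learned variant is constructed. The class name, parameters, and the simplified modular hash functions are assumptions for illustration only.

```python
import random
from collections import Counter
from statistics import median

P = 2_147_483_647  # Mersenne prime for the modular hash functions


class CountSketch:
    """Classical count sketch over integer join keys (illustrative only)."""

    def __init__(self, depth=5, width=64, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One (bucket hash, sign hash) parameter set per row. Both relations
        # must use the same seed so that their sketches are comparable.
        self.params = [tuple(rng.randrange(1, P) for _ in range(4))
                       for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, key, count=1):
        for row, (a, b, c, d) in zip(self.rows, self.params):
            bucket = ((a * key + b) % P) % self.width
            sign = 1 if ((c * key + d) % P) % 2 == 0 else -1
            row[bucket] += sign * count

    def join_size(self, other):
        """Median over rows of the inner product of matching rows."""
        return median(sum(x * y for x, y in zip(r1, r2))
                      for r1, r2 in zip(self.rows, other.rows))


def sketch_of(keys, **kwargs):
    """Build a sketch from an iterable of integer join keys."""
    sketch = CountSketch(**kwargs)
    for key in keys:
        sketch.add(key)
    return sketch


def exact_join_size(left_keys, right_keys):
    """Ground truth: sum over keys of left frequency times right frequency."""
    left, right = Counter(left_keys), Counter(right_keys)
    return sum(n * right[k] for k, n in left.items())


left_keys = [1, 1, 2, 3]
right_keys = [1, 2, 2, 4]
estimate = sketch_of(left_keys).join_size(sketch_of(right_keys))
print("exact:", exact_join_size(left_keys, right_keys), "estimate:", estimate)
```

Each row's inner product equals the true join size plus noise from hash collisions, and taking the median across rows damps that noise. The sketches here summarize each relation independently; the source credits GRASP's learned model with capturing join correlations across base relations.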