🤖 AI Summary
Machine unlearning in multi-table learned cardinality estimation (CE) systems faces three key challenges upon data deletion: attribute-level sensitivity, cross-table error propagation, and join overestimation due to domain value disappearance.
Method: This work proposes Cardinality Estimation Pruning (CEP), described as the first unlearning framework designed specifically for multi-table learned CE systems. CEP updates model parameters by pruning rather than full retraining, using two strategies. Distribution Sensitivity Pruning constructs semi-join deletion results and computes parameter sensitivity scores to guide pruning; Domain Pruning removes support for value domains that deletion eliminates entirely. The approach applies to the NeuroCard and FACE architectures.
Results: Evaluated on IMDB and TPC-H, the method consistently achieves the lowest query error (Q-error) in multi-table scenarios, particularly under high deletion ratios, and often outperforms full retraining. It also significantly reduces convergence iterations, with a computational overhead of only 0.3%–2.5% of fine-tuning time.
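For readers unfamiliar with the metric, Q-error is the standard multiplicative error used in cardinality estimation: the larger of estimate/truth and truth/estimate, so a perfect estimate scores 1. The following is a minimal sketch, not code from the paper; the function name `q_error` and the `floor` parameter are illustrative, and clamping both values to at least 1 is a common convention assumed here to avoid division by zero.

```python
def q_error(estimate: float, truth: float, floor: float = 1.0) -> float:
    """Multiplicative error between an estimated and a true cardinality.

    Both values are clamped to `floor` (an assumed convention) so that
    empty results do not cause division by zero.
    """
    est = max(estimate, floor)
    tru = max(truth, floor)
    # Symmetric ratio: over- and underestimation by the same factor score equally.
    return max(est / tru, tru / est)


# Overestimating and underestimating by a factor of two give the same score.
over = q_error(200, 100)
under = q_error(50, 100)
exact = q_error(100, 100)
```

Because the ratio is symmetric, the overestimation that follows from stale join domains is penalized just as heavily as an equally large underestimate.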
📝 Abstract
Machine unlearning in learned cardinality estimation (CE) systems presents unique challenges due to the complex distributional dependencies in multi-table relational data. Specifically, data deletion, a core component of machine unlearning, faces three critical challenges in learned CE models: attribute-level sensitivity, inter-table propagation, and domain disappearance, the last of which leads to severe overestimation in multi-way joins. We propose Cardinality Estimation Pruning (CEP), the first unlearning framework specifically designed for multi-table learned CE systems. CEP introduces Distribution Sensitivity Pruning, which constructs semi-join deletion results and computes sensitivity scores to guide parameter pruning, and Domain Pruning, which removes support for value domains entirely eliminated by deletion. We evaluate CEP on the state-of-the-art architectures NeuroCard and FACE using the IMDB and TPC-H datasets. Results demonstrate that CEP consistently achieves the lowest Q-error in multi-table scenarios, particularly under high deletion ratios, often outperforming full retraining. Furthermore, CEP significantly reduces convergence iterations while incurring a negligible computational overhead of 0.3%–2.5% of fine-tuning time.
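The two data-side ideas in the abstract, semi-join deletion results and vanished value domains, can be illustrated on toy data. This is a hedged sketch only: it does not reproduce CEP's sensitivity scoring or its parameter pruning inside NeuroCard or FACE. All function names, the toy rows, and the categorical distribution standing in for a learned model are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set

Row = Dict[str, Hashable]


def semi_join_deletions(child_rows: List[Row], fk: str,
                        deleted_parent_keys: Iterable[Hashable]) -> List[Row]:
    """Child rows that join with at least one deleted parent key (a semi-join)."""
    keys = set(deleted_parent_keys)
    return [row for row in child_rows if row[fk] in keys]


def vanished_values(before: List[Row], after: List[Row], column: str) -> Set[Hashable]:
    """Values of `column` that existed before deletion but not after."""
    return {r[column] for r in before} - {r[column] for r in after}


def prune_domain(probs: Dict[Hashable, float],
                 vanished: Set[Hashable]) -> Dict[Hashable, float]:
    """Drop vanished values from a categorical distribution and renormalize.

    A stand-in for removing a model's support for eliminated domain values.
    """
    kept = {v: p for v, p in probs.items() if v not in vanished}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {v: p / total for v, p in kept.items()}


# Hypothetical toy tables loosely shaped like an IMDB-style parent/child pair.
title = [{"id": 1, "kind": "movie"}, {"id": 2, "kind": "tv"}, {"id": 3, "kind": "movie"}]
cast_info = [
    {"movie_id": 1, "role": "actor"},
    {"movie_id": 2, "role": "director"},
    {"movie_id": 2, "role": "actor"},
    {"movie_id": 3, "role": "actor"},
]

deleted_ids = {2}
title_after = [r for r in title if r["id"] not in deleted_ids]

affected_cast = semi_join_deletions(cast_info, "movie_id", deleted_ids)
gone = vanished_values(title, title_after, "kind")
pruned = prune_domain({"movie": 0.6, "tv": 0.4}, gone)
```

In this toy case, deleting title 2 affects two `cast_info` rows through the join and removes the value `tv` from the `kind` domain entirely. If a model kept assigning probability mass to `tv`, join estimates involving that value would stay too high, which is the overestimation the abstract attributes to domain disappearance.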