🤖 AI Summary
Existing machine unlearning verification methods inadequately assess both direct and indirect residual effects after “deletion” of target data points or features, particularly in black-box settings and where influence flows through indirect pathways. This paper proposes CAFÉ, the first causal-aware framework to unify verification at both the data-point and feature levels. CAFÉ models variable dependencies via causal fuzz testing and integrates counterfactual analysis with sensitivity detection to pinpoint fine-grained bias propagation arising from incomplete unlearning. Experiments across five datasets and three model families show that CAFÉ substantially improves detection of residual effects, identifying over 37% more indirect effects missed by baseline methods, while remaining computationally efficient and scalable. This work establishes the first evaluation paradigm for machine unlearning that jointly provides causal interpretability, hierarchical consistency across the data and feature levels, and practical deployability.
📝 Abstract
As machine learning models become increasingly embedded in decision-making systems, the ability to "unlearn" targeted data or features is crucial for model adaptability, fairness, and privacy, especially when retraining from scratch is expensive. Effective machine unlearning requires thorough testing to guide it, yet existing verification methods provide limited insight and often fail when the influence of the unlearning target is indirect. In this work, we propose CAFÉ, a new causality-based framework that unifies data-point- and feature-level unlearning verification for black-box ML models. CAFÉ evaluates both the direct and indirect effects of unlearning targets through causal dependencies, providing actionable insights with fine-grained analysis. Our evaluation across five datasets and three model architectures demonstrates that CAFÉ successfully detects residual influence missed by baselines while maintaining computational efficiency.
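To make the core idea concrete, the following is a minimal, hypothetical sketch of why a causal (counterfactual) probe can detect residual influence that a naive feature-perturbation check misses. All function names, the toy models, and the linear causal mechanism are illustrative assumptions for this sketch, not CAFÉ's actual algorithm or API: an "unlearned" model drops the direct weight on the target feature `x0`, but a downstream feature `x1` still depends on `x0`, so indirect influence remains.

```python
# Hedged sketch: counterfactual probing for residual influence after unlearning.
# The models and causal mechanism below are toy assumptions for illustration.

def predict_unlearned(x):
    # Toy "unlearned" model: the direct weight on x[0] has been removed,
    # but the model still uses x[1], which is causally downstream of x[0].
    return 1.0 * x[1]

def generate(x0):
    # Assumed causal mechanism: x1 = 0.8 * x0 (an x0 -> x1 dependency).
    return [x0, 0.8 * x0]

def naive_probe(model, x0_a, x0_b):
    # Baseline-style check: perturb x0 alone, holding x1 fixed.
    # This ignores the causal path x0 -> x1.
    base = generate(x0_a)
    perturbed = base.copy()
    perturbed[0] = x0_b
    return abs(model(base) - model(perturbed))

def causal_probe(model, x0_a, x0_b):
    # Counterfactual check: intervene on x0 and propagate the change
    # through the causal mechanism before querying the model.
    return abs(model(generate(x0_a)) - model(generate(x0_b)))

print(naive_probe(predict_unlearned, 1.0, 2.0))   # 0.0 -> "influence removed"
print(causal_probe(predict_unlearned, 1.0, 2.0))  # 0.8 -> residual indirect influence
```

The naive probe reports zero influence and would certify the unlearning as complete, while the causal probe, by propagating the intervention through the dependency `x0 -> x1`, exposes the indirect residual effect, which is precisely the failure mode the abstract describes.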