🤖 AI Summary
Existing machine unlearning verification methods inadequately assess both direct and indirect residual effects after “deletion” of target data points or features, particularly in black-box settings and where influence flows through indirect pathways. This paper proposes CAFÉ, the first causal-aware framework to unify verification at both the data-point and feature levels. CAFÉ models variable dependencies via causal fuzz testing and integrates counterfactual analysis with sensitivity detection to pinpoint fine-grained bias propagation arising from incomplete unlearning. Experiments across five datasets and three model families show that CAFÉ substantially improves detection of residual effects, identifying over 37% more indirect effects missed by baseline methods, while remaining computationally efficient and scalable. This work establishes the first evaluation paradigm for machine unlearning that jointly provides causal interpretability, hierarchical consistency across the data and feature levels, and practical deployability.
📝 Abstract
As machine learning models become increasingly embedded in decision-making systems, the ability to "unlearn" targeted data or features is crucial for model adaptability, fairness, and privacy, especially when retraining from scratch is expensive. Effective machine unlearning requires thorough testing to guide it, yet existing verification methods provide limited insight and often fail when the influence of the unlearning target is indirect. In this work, we propose CAFÉ, a new causality-based framework that unifies data-point- and feature-level unlearning verification for black-box ML models. CAFÉ evaluates both the direct and indirect effects of unlearning targets through causal dependencies, providing actionable insights with fine-grained analysis. Our evaluation across five datasets and three model architectures demonstrates that CAFÉ successfully detects residual influence missed by baselines while maintaining computational efficiency.
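To make the core idea concrete, the following is a minimal, hypothetical sketch of why a causal (counterfactual) probe can detect residual influence that a naive feature-perturbation check misses. All function names, the toy models, and the linear causal mechanism are illustrative assumptions for this sketch, not CAFÉ's actual algorithm or API: an "unlearned" model drops the direct weight on the target feature `x0`, but a downstream feature `x1` still depends on `x0`, so indirect influence remains.

```python
# Hedged sketch: counterfactual probing for residual influence after unlearning.
# The models and causal mechanism below are toy assumptions for illustration.

def predict_unlearned(x):
    # Toy "unlearned" model: the direct weight on x[0] has been removed,
    # but the model still uses x[1], which is causally downstream of x[0].
    return 1.0 * x[1]

def generate(x0):
    # Assumed causal mechanism: x1 = 0.8 * x0 (an x0 -> x1 dependency).
    return [x0, 0.8 * x0]

def naive_probe(model, x0_a, x0_b):
    # Baseline-style check: perturb x0 alone, holding x1 fixed.
    # This ignores the causal path x0 -> x1.
    base = generate(x0_a)
    perturbed = base.copy()
    perturbed[0] = x0_b
    return abs(model(base) - model(perturbed))

def causal_probe(model, x0_a, x0_b):
    # Counterfactual check: intervene on x0 and propagate the change
    # through the causal mechanism before querying the model.
    return abs(model(generate(x0_a)) - model(generate(x0_b)))

print(naive_probe(predict_unlearned, 1.0, 2.0))   # 0.0 -> "influence removed"
print(causal_probe(predict_unlearned, 1.0, 2.0))  # 0.8 -> residual indirect influence
```

The naive probe reports zero influence and would certify the unlearning as complete, while the causal probe, by propagating the intervention through the dependency `x0 -> x1`, exposes the indirect residual effect, which is precisely the failure mode the abstract describes.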