Towards Unveiling Predictive Uncertainty Vulnerabilities in the Context of the Right to Be Forgotten

📅 2025-08-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work exposes a novel security threat to prediction uncertainty in machine unlearning scenarios. Existing unlearning methods neglect adversarial manipulation of uncertainty estimates, leaving critical security vulnerabilities. To address this, we propose the first malicious unlearning attack paradigm targeting prediction uncertainty, designing a black-box optimization framework that combines gradient estimation and targeted perturbation to precisely manipulate uncertainty metrics (including confidence scores and entropy) without accessing model parameters. Experiments demonstrate that our attack achieves significantly higher success rates in uncertainty manipulation than conventional label-misclassification attacks and effectively evades state-of-the-art unlearning defenses. This study uncovers a critical security blind spot in current unlearning methodologies and provides both a foundational warning and technical groundwork for developing robust, uncertainty-aware unlearning systems.
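To make the summary concrete, the sketch below illustrates the general flavor of such an attack, not the paper's actual method: it computes the two uncertainty metrics named above (confidence and predictive entropy) and uses a zeroth-order, query-only gradient estimate (NES-style finite differences) to perturb an input so that its predicted entropy rises, without ever reading model parameters. The toy "model" (`query_model`), dimensions, step sizes, and sample counts are all illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def confidence(probs):
    """Confidence score: probability of the top class."""
    return probs.max()

def predictive_entropy(probs):
    """Entropy of the predictive distribution; high = uncertain."""
    return -np.sum(probs * np.log(probs + 1e-12))

# Hypothetical black-box model: the attacker only sees output probabilities.
W = np.eye(3)                          # toy 3-class linear model (illustrative)

def query_model(x):
    return softmax(W @ x)

def estimate_entropy_gradient(x, n_samples=50, sigma=0.01, rng=None):
    """Zeroth-order estimate of d(entropy)/dx using only model queries,
    via symmetric finite differences along random Gaussian directions."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        f_plus = predictive_entropy(query_model(x + sigma * u))
        f_minus = predictive_entropy(query_model(x - sigma * u))
        grad += (f_plus - f_minus) / (2 * sigma) * u
    return grad / n_samples

# Targeted perturbation: gradient ascent on entropy to inflate uncertainty.
rng = np.random.default_rng(0)
x = np.array([2.0, 0.0, -1.0])         # confidently classified input
h0 = predictive_entropy(query_model(x))
for _ in range(20):
    x = x + 0.1 * estimate_entropy_gradient(x, rng=rng)
h1 = predictive_entropy(query_model(x))
```

The same machinery could just as well descend on entropy (to make an uncertain prediction look confident); the point is only that query access to probabilities suffices to steer the uncertainty metric, which is the attack surface the paper studies.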

📝 Abstract
Currently, various uncertainty quantification methods have been proposed to provide certainty and probability estimates for deep learning models' label predictions. Meanwhile, with the growing demand for the right to be forgotten, machine unlearning has been extensively studied as a means to remove the impact of requested sensitive data from a pre-trained model without retraining the model from scratch. However, the vulnerabilities of such generated predictive uncertainties with regard to dedicated malicious unlearning attacks remain unexplored. To bridge this gap, for the first time, we propose a new class of malicious unlearning attacks against predictive uncertainties, where the adversary aims to cause the desired manipulations of specific predictive uncertainty results. We also design novel optimization frameworks for our attacks and conduct extensive experiments, including black-box scenarios. Notably, our extensive experiments show that our attacks are more effective in manipulating predictive uncertainties than traditional attacks that focus on label misclassifications, and existing defenses against conventional attacks are ineffective against our attacks.
Problem

Research questions and friction points this paper is trying to address.

Explores vulnerabilities in predictive uncertainty during unlearning attacks
Proposes malicious attacks targeting uncertainty manipulation in models
Tests attack effectiveness against traditional defenses in black-box scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Malicious unlearning attacks target predictive uncertainties
Novel optimization frameworks for uncertainty manipulation
Effective in black-box scenarios, bypassing defenses