Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Approximate machine unlearning algorithms inherently retain implicit residual information, leading to membership privacy leakage of forgotten data. This paper demonstrates that such residuals are pervasive across existing methods and introduces the Reminiscence Attack (ReA), which achieves up to 1.90× and 1.12× higher accuracy than baseline attacks in class-level and sample-level membership inference, respectively. To address this vulnerability, the authors propose the first residual-elimination-oriented two-stage unlearning framework, integrating fine-grained residual analysis, targeted fine-tuning, and convergence-stability constraints. The framework applies to both classification and generative tasks. At only 2%–12% of the cost of full retraining, it reduces adaptive privacy attack accuracy to near-random levels, effectively balancing efficient unlearning with strong membership privacy guarantees.

📝 Abstract
Machine unlearning enables the removal of specific data from ML models to uphold the right to be forgotten. While approximate unlearning algorithms offer efficient alternatives to full retraining, this work reveals that they fail to adequately protect the privacy of unlearned data. In particular, these algorithms introduce implicit residuals that facilitate privacy attacks targeting unlearned data. We observe that these residuals persist regardless of model architecture, parameters, and unlearning algorithm, exposing a new attack surface beyond conventional output-based leakage. Based on this insight, we propose the Reminiscence Attack (ReA), which amplifies the correlation between residuals and membership privacy through targeted fine-tuning. ReA achieves up to 1.90× and 1.12× higher accuracy than prior attacks when inferring class-wise and sample-wise membership, respectively. To mitigate such residual-induced privacy risk, we develop a dual-phase approximate unlearning framework that first eliminates deep-layer traces of unlearned data and then enforces convergence stability to prevent "pseudo-convergence", where a model's outputs resemble those of a retrained model while still preserving unlearned residuals. Our framework works for both classification and generation tasks. Experimental evaluations confirm that our approach maintains high unlearning efficacy while reducing adaptive privacy attack accuracy to near-random guessing, at 2–12% of the computational cost of full retraining from scratch.
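
The relearning intuition behind ReA lends itself to a simple probe: briefly fine-tune the unlearned model on a candidate and see how quickly its loss collapses, since residuals should let forgotten members be "relearned" faster than genuine non-members. The PyTorch sketch below is a hypothetical illustration of that general idea, not the paper's ReA implementation; the function name `relearn_speed_score`, the step count, and the threshold are made up for the example.

```python
import copy

import torch
import torch.nn.functional as F


def relearn_speed_score(unlearned_model, x, y, steps=5, lr=1e-3):
    """Hypothetical membership score for one candidate batch (x, y).

    Briefly fine-tunes a copy of the unlearned model on (x, y) and measures
    how fast the loss collapses. Intuition: residuals left by approximate
    unlearning let forgotten members be "relearned" faster than genuine
    non-members, so a larger loss drop is more member-like.
    """
    model = copy.deepcopy(unlearned_model)  # never mutate the target model
    model.train()
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    with torch.no_grad():
        loss_before = F.cross_entropy(model(x), y).item()

    for _ in range(steps):  # the targeted fine-tuning probe
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()

    with torch.no_grad():
        loss_after = F.cross_entropy(model(x), y).item()

    # Fractional loss reduction after a few relearning steps.
    return (loss_before - loss_after) / max(loss_before, 1e-8)


def infer_membership(unlearned_model, x, y, threshold=0.5):
    """Thresholded decision on the relearning-speed score."""
    return relearn_speed_score(unlearned_model, x, y) > threshold
```

In a real attack pipeline the decision threshold would be calibrated, for example against shadow models trained with and without the candidate data, rather than fixed a priori.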
Problem

Research questions and friction points this paper is trying to address.

Exposes privacy risks in approximate machine unlearning algorithms
Proposes Reminiscence Attack to exploit unlearned data residuals
Develops dual-phase framework to mitigate residual-induced privacy risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exploiting residuals in approximate unlearning algorithms
Proposing Reminiscence Attack to amplify privacy risks
Developing a dual-phase framework to mitigate residual risks (see the sketch after this list)
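
On the defense side, the abstract's two phases can be pictured with a minimal PyTorch-style sketch. This is an assumption-laden illustration rather than the paper's procedure: the choice of deep layers, the uniform-output scrubbing loss, and the drift-based stopping rule are stand-ins for the paper's fine-grained residual analysis and convergence-stability constraints.

```python
import torch
import torch.nn.functional as F


def dual_phase_unlearn(model, forget_loader, retain_loader, deep_layers,
                       scrub_epochs=1, stabilize_epochs=5, lr=1e-4,
                       stability_tol=1e-3):
    """Hypothetical two-phase residual-elimination loop.

    Phase 1 updates only the deep layers, pushing predictions on the forget
    set toward uniform to erase class-specific residual traces. Phase 2
    fine-tunes on retained data until the model's outputs stop drifting,
    a guard against "pseudo-convergence".
    """
    model.train()

    # ----- Phase 1: scrub deep-layer traces of the forget set -----
    for p in model.parameters():
        p.requires_grad_(False)
    for layer in deep_layers:  # e.g. the last block and the classifier head
        for p in layer.parameters():
            p.requires_grad_(True)
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad],
                           lr=lr)
    for _ in range(scrub_epochs):
        for x, _ in forget_loader:
            logits = model(x)
            uniform = torch.full_like(logits, 1.0 / logits.size(1))
            loss = F.kl_div(F.log_softmax(logits, dim=1), uniform,
                            reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()

    # ----- Phase 2: enforce convergence stability on retained data -----
    for p in model.parameters():
        p.requires_grad_(True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    probe_x, _ = next(iter(retain_loader))  # fixed batch to track output drift
    prev = None
    for _ in range(stabilize_epochs):
        for x, y in retain_loader:
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            out = F.softmax(model(probe_x), dim=1)
        if prev is not None and (out - prev).abs().mean().item() < stability_tol:
            break  # outputs have stopped moving: treat as stably converged
        prev = out
    return model
```

The drift check is what targets pseudo-convergence: rather than stopping as soon as accuracy matches a retrained model, training continues until the outputs themselves stop moving.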
Yaxin Xiao
Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University
Qingqing Ye
Assistant Professor, The Hong Kong Polytechnic University
data privacy and security · adversarial machine learning
Li Hu
Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University
Huadi Zheng
Unknown affiliation
Voice Technology · Information Security
Haibo Hu
Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University
Zi Liang
Hong Kong Polytechnic University
Natural Language Processing · AI Security
Haoyang Li
Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University
Yijie Jiao
Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University