🤖 AI Summary
Existing machine unlearning methods suffer from two critical blind spots: "over-unlearning" (OU), which damages retained data near the forget set, and post-hoc prototype-based relearning attacks. This work formally defines OU@ε, a metric that quantifies over-unlearning in the neighborhood of the forget set, and shows that class prototypes can be exploited to efficiently reconstruct forgotten knowledge, a previously unrecognized threat. To address both issues, the authors propose Spotter: a lightweight, plug-and-play unlearning framework that (i) mitigates over-unlearning via a masked knowledge-distillation penalty on the forget set's neighborhood, and (ii) thwarts prototype attacks via an intra-class dispersion loss that scatters forget-class embeddings. Spotter requires no access to retain-set data and avoids full retraining. On CIFAR-10, it reduces OU@ε to below 5% of baseline values, drives forget-set accuracy to 0% (i.e., successful unlearning), and keeps retain-set accuracy within 1% of the original, significantly improving both the robustness and practicality of machine unlearning.
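The prototype-based relearning threat summarized above can be sketched in a few lines: with only a handful of forget-class samples, an attacker estimates the class's mean embedding (its prototype) and then classifies by nearest prototype. This is an illustrative reconstruction, not the paper's code; the function names and the Euclidean-distance choice are assumptions.

```python
import numpy as np

def class_prototype(embeddings):
    # Mean embedding of the few forget-class samples the attacker holds.
    return np.mean(embeddings, axis=0)

def prototype_predict(queries, prototypes):
    # Assign each query embedding to the class of its nearest prototype
    # (Euclidean distance here; the paper's metric may differ).
    dists = np.linalg.norm(queries[:, None, :] - prototypes[None, :, :], axis=2)
    return np.argmin(dists, axis=1)
```

If unlearning merely suppresses the classifier head while leaving forget-class embeddings tightly clustered, this nearest-prototype rule can recover the forgotten class cheaply, which is exactly what a dispersion-style defense aims to prevent.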
📝 Abstract
Machine unlearning (MU) aims to expunge a designated forget set from a trained model without costly retraining, yet existing techniques overlook two critical blind spots: "over-unlearning" that deteriorates retained data near the forget set, and post-hoc "relearning" attacks that aim to resurrect the forgotten knowledge. We first derive the over-unlearning metric OU@ε, which quantifies the collateral damage to the region near the forget set, where over-unlearning mainly appears. Next, we expose an unforeseen relearning threat to MU, the Prototypical Relearning Attack, which exploits the per-class prototype of the forget class and, with just a few samples, easily restores pre-unlearning performance. To counter both blind spots, we introduce Spotter, a plug-and-play objective that combines (i) a masked knowledge-distillation penalty on the region near the forget set to suppress OU@ε, and (ii) an intra-class dispersion loss that scatters forget-class embeddings, neutralizing prototypical relearning attacks. On CIFAR-10, as one validation, Spotter reduces OU@ε to below 0.05× the baseline, drives forget accuracy to 0%, preserves retain-set accuracy within 1% of the original, and defeats the prototype attack by keeping post-attack forget-set accuracy below 1%, all without accessing the retained data. This confirms that Spotter is a practical remedy for unlearning's blind spots.
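The two loss terms in Spotter's objective can be sketched as follows. This is a hedged numpy illustration of the stated idea, a masked knowledge-distillation penalty applied near the forget set plus an intra-class dispersion loss on forget-class embeddings; the function names, temperature, and exact dispersion form are assumptions, not the authors' implementation.

```python
import numpy as np

def masked_kd_penalty(student_logits, teacher_logits, mask, tau=2.0):
    # KL(teacher || student) on temperature-softened logits, counted only
    # where mask == 1 (samples in the neighborhood of the forget set).
    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z / tau)
        return e / e.sum(axis=1, keepdims=True)
    p_t, p_s = softmax(teacher_logits), softmax(student_logits)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1)
    return float(np.sum(kl * mask) / max(mask.sum(), 1))

def intra_class_dispersion_loss(forget_embeddings):
    # Negative mean pairwise distance: minimizing this scatters forget-class
    # embeddings so no single prototype summarizes the class.
    diffs = forget_embeddings[:, None, :] - forget_embeddings[None, :, :]
    return -float(np.mean(np.linalg.norm(diffs, axis=2)))
```

The first term pins the unlearned model's predictions to the original (teacher) model on the forget set's neighborhood, suppressing OU@ε; the second makes the forget-class embedding cloud diffuse, so a prototype estimated from a few samples is no longer representative.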