Approximate Machine Unlearning through Manifold Representation Forgetting Guided by Self Mode Connectivity

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing machine unlearning methods often rely on label manipulation or task-gradient reversal, which struggle to approximate the behavior of full retraining and may degrade the model's original utility. This work proposes an unlearning paradigm grounded in manifold representations: using a triplet margin loss in representation space, it pushes each to-be-forgotten sample away from its original manifold centroid while pulling it toward its nearest semantic neighbors in the retained data, thereby aligning with the behavior of full retraining. The approach reduces reliance on labels and task-specific gradients and introduces a self-mode-connectivity module that rapidly reconstructs the local manifold to guide the generation of an adaptive margin for each unlearning case. Experiments on four benchmark datasets show that, operating solely in the model's representation space, the proposed method achieves unlearning effectiveness comparable to state-of-the-art approximate approaches.

📝 Abstract

Machine unlearning is a fundamental mechanism for enforcing the right to be forgotten. Existing unlearning studies that rely on label manipulation or task-gradient reversal often deliver limited unlearning effectiveness. Moreover, they can undermine the original learning objective and typically do not guarantee equivalence to standard unlearning by retraining. In this paper, we propose ManiF-SMC (Manifold Forgetting with Self Mode Connectivity), motivated by the observation that a model retrained on the remaining data tends to classify erased samples by their semantic similarity to the retained data. We begin by systematically recasting approximate unlearning as pushing each erased sample away from the centroid of its originally learned manifold representation and toward its nearest semantic neighbors in the retained data. This reformulation aligns unlearning with retraining behavior and operates purely in representation space, reducing reliance on labels and task-specific gradients. To tackle the manifold representation-based unlearning problem, ManiF-SMC encapsulates the unlearning and representation preservation goals in a margin-based triplet loss. Because finding a suitable margin for unlearning is challenging, we propose a self-mode-connectivity module that rapidly reconstructs the local manifold to guide the generation of adaptive margins for each unlearning case. Extensive experiments on four representative datasets show that ManiF-SMC achieves unlearning effectiveness comparable to state-of-the-art approximate methods while operating solely within the model's representation space.
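
To make the push-and-pull formulation concrete, the following is a minimal sketch of a generic margin-based triplet loss in representation space, not the authors' implementation. It assumes Euclidean distance, a single nearest retained neighbor as the positive, the original centroid as the negative, and a fixed margin supplied by the caller. The abstract does not specify any of these choices, and the adaptive margin produced by the self-mode-connectivity module is not modeled here. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a list of embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_neighbor(query, candidates):
    """Return the candidate embedding closest to the query."""
    return min(candidates, key=lambda c: euclidean(query, c))


def forgetting_triplet_loss(z_forget, original_centroid, retained_embeddings, margin):
    """Generic triplet margin loss for one erased sample.

    anchor   = embedding of the erased sample
    positive = its nearest neighbor among retained embeddings (pull toward)
    negative = centroid of its original manifold (push away from)
    margin   = fixed hypothetical value; the paper generates it adaptively
    """
    positive = nearest_neighbor(z_forget, retained_embeddings)
    d_pos = euclidean(z_forget, positive)
    d_neg = euclidean(z_forget, original_centroid)
    # Zero once the sample is closer to the retained neighbor than to the
    # original centroid by at least `margin`.
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    # Toy 2-D embeddings, invented purely for illustration.
    original_class = [[0.0, 0.0], [0.0, 2.0]]
    retained = [[3.0, 0.0], [5.0, 5.0]]
    loss = forgetting_triplet_loss([0.0, 0.0], centroid(original_class), retained, 1.0)
    print(loss)
```

In this toy setup the erased sample sits at distance 1 from its original centroid and distance 3 from its nearest retained neighbor, so the loss is positive and minimizing it would move the embedding toward the retained data. Once the sample lies closer to the retained neighbor by at least the margin, the loss is zero and exerts no further pull.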

Problem

Research questions and friction points this paper is trying to address.

machine unlearning

right to be forgotten

approximate unlearning

representation space

retraining equivalence

Innovation

Methods, ideas, or system contributions that make the work stand out.

machine unlearning

manifold representation

self mode connectivity