SoK: Unlearnability and Unlearning for Model Dememorization

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the privacy risk posed by machine learning models that memorize sensitive data. Existing defenses often achieve only superficial forgetting, are not robust, and lack theoretical guarantees. The paper brings together two paradigms, unlearnability and machine unlearning, and proposes the first unified taxonomy of dememorization techniques. Through a Systematization of Knowledge (SoK), empirical evaluation, and theoretical analysis, the study exposes interactions and vulnerabilities between input perturbations and unlearning algorithms, shows that leading approaches typically achieve only shallow dememorization, and establishes the first theoretical guarantee on dememorization depth for models processed through certified unlearning. These results lay a foundation for unifying dememorization mechanisms across the ML lifecycle.

📝 Abstract

Advanced model dememorization methods, including availability poisoning (unlearnability) and machine unlearning, are emerging as key safeguards against data misuse in machine learning (ML). At the training stage, unlearnability embeds imperceptible perturbations into data before release to reduce learnability. At the post-training stage, unlearning removes previously acquired information from models to prevent unauthorized disclosure or use. While both defenses aim to preserve the right to withhold knowledge, their vulnerabilities and shared foundations remain unclear. Specifically, both unlearnability and unlearning suffer from issues such as shallow dememorization, leading to falsely claimed data learnability reduction or forgetting in the presence of weight perturbations. Moreover, input perturbations may affect the effectiveness of downstream unlearning, while unlearning may inadvertently recover domain knowledge hidden by unlearnability. This interplay calls for deeper investigation. Finally, there is a lack of formal guarantees to provide theoretical insights into current defenses against shallow dememorization. In this Systematization of Knowledge, we present the first integrated analysis of model dememorization approaches leveraging unlearnability and unlearning. Our contributions are threefold: (i) a unified taxonomy of unlearnability and scalable unlearning methods; (ii) an empirical evaluation revealing the robustness, interplay, and shallow dememorization of leading methods; and (iii) the first theoretical guarantee on dememorization depth for models processed through certified unlearning. These results lay the foundation for unifying dememorization mechanisms across the ML lifecycle to achieve a deeper immemor state for sensitive knowledge.
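To make the training-stage idea concrete, the following is a minimal sketch of adding a small, bounded perturbation to data before release. It is an illustration only, not the paper's method: it uses a fixed random pattern per class rather than optimized noise, and the function name `add_classwise_perturbation`, the budget `epsilon = 8/255`, and the flat-list data layout are all assumptions made for this example.

```python
import random


def add_classwise_perturbation(samples, labels, num_classes, epsilon=8 / 255, seed=0):
    """Add one fixed, bounded noise pattern per class to each sample.

    samples: list of equal-length lists of floats in [0, 1] (hypothetical layout).
    labels:  list of integer class ids, one per sample.
    Returns new samples; each value moves by at most `epsilon` and stays in [0, 1].
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    # Assumption: random class-wise patterns stand in for optimized perturbations.
    patterns = [
        [rng.uniform(-epsilon, epsilon) for _ in range(dim)]
        for _ in range(num_classes)
    ]
    return [
        # Clip so the perturbed sample remains a valid input.
        [min(1.0, max(0.0, x + d)) for x, d in zip(sample, patterns[label])]
        for sample, label in zip(samples, labels)
    ]


samples = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]]
labels = [0, 1]
released = add_classwise_perturbation(samples, labels, num_classes=2)
```

The bound on each value is what keeps the change imperceptible in this sketch. Whether such a perturbation actually reduces learnability depends on how the noise is constructed, which this example does not attempt to show.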

Problem

Research questions and friction points this paper is trying to address.

unlearnability

machine unlearning

model dememorization

shallow dememorization

data misuse

Innovation

Methods, ideas, or system contributions that make the work stand out.

unlearnability

machine unlearning

dememorization