🤖 AI Summary
Machine unlearning in text-to-image models faces a significant challenge: existing methods fail to fully erase harmful concepts, and adversarial prompts can still trigger their regeneration. This paper introduces the Memory Self-Regeneration task to study how supposedly erased knowledge is recalled after unlearning, and proposes MemoRa, a recovery strategy that distinguishes short-term from long-term forgetting patterns and puts forward robustness of knowledge retrieval as an evaluation measure for unlearning efficacy. By combining adversarial prompt attacks, knowledge activation probing, and memory trajectory analysis, the framework systematically identifies and reconstructs latent, ostensibly forgotten concepts. Experiments demonstrate widespread fragility across current unlearning methods: erased concepts are consistently and efficiently recoverable. This work advances the theoretical understanding of memory mechanisms in generative models and provides a principled foundation for developing safer, more controllable AI systems.
📝 Abstract
The impressive capability of modern text-to-image models to generate realistic visuals comes with a serious drawback: they can be misused to create harmful, deceptive, or unlawful content. This has accelerated the push for machine unlearning, a new field that seeks to selectively remove specific knowledge from a trained model without degrading its overall performance. However, truly forgetting a given concept turns out to be extremely difficult: when exposed to adversarial prompts, models can still generate so-called unlearned concepts, which may be not only harmful but also illegal. In this paper, we examine how models forget and recall knowledge, introducing the Memory Self-Regeneration task. Furthermore, we present the MemoRa strategy, a regenerative approach that supports the effective recovery of previously lost knowledge. Moreover, we propose robustness of knowledge retrieval as a crucial yet underexplored evaluation measure for developing more robust and effective unlearning techniques. Finally, we demonstrate that forgetting occurs in two distinct ways: short-term, where concepts can be quickly recalled, and long-term, where recovery is more challenging.