Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Machine unlearning (MU) techniques for image generation models (IGMs) suffer from insufficient robustness against multimodal adversarial attacks, particularly failing to prevent cross-modal conceptual recovery. Method: We propose Recall, the first image-guided multimodal adversarial attack framework: leveraging a single semantically relevant reference image, it exploits diffusion models’ multimodal alignment properties and gradient-based optimization to efficiently reconstruct forgotten concepts while preserving textual semantic consistency—overcoming the limitations of conventional text-prompt-only attacks. Contribution/Results: Our method exposes a fundamental vulnerability of current MU mechanisms under cross-modal conditions. Extensive experiments across ten state-of-the-art unlearning methods and diverse downstream tasks demonstrate its superior efficacy over existing baselines. It establishes a novel evaluation paradigm and benchmarking tool for assessing and enhancing the robustness of IGMs against multimodal forgetting failures.
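As a rough formalization (the notation here is ours, not taken from the paper): with a frozen unlearned denoiser $\epsilon_\theta$, a benign text condition $c_{\mathrm{txt}}$, and the single reference image $x^{\mathrm{ref}}$, the attack can be read as searching for an image condition $c_{\mathrm{img}}$ under which the model still reconstructs the forgotten concept:

$$
c_{\mathrm{img}}^{*} = \arg\min_{c_{\mathrm{img}}} \; \mathbb{E}_{t,\;\epsilon\sim\mathcal{N}(0,I)} \Big\| \epsilon_\theta\big(x_t,\, t,\, c_{\mathrm{txt}},\, c_{\mathrm{img}}\big) - \epsilon \Big\|_2^2, \qquad x_t = \sqrt{\bar{\alpha}_t}\, x^{\mathrm{ref}} + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
$$

i.e., the standard diffusion denoising loss evaluated on the reference image, minimized over the image prompt alone while the text prompt stays fixed.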

📝 Abstract
Recent advances in image generation models (IGMs), particularly diffusion-based architectures such as Stable Diffusion (SD), have markedly enhanced the quality and diversity of AI-generated visual content. However, their generative capability has also raised significant ethical, legal, and societal concerns, including the potential to produce harmful, misleading, or copyright-infringing content. To mitigate these concerns, machine unlearning (MU) emerges as a promising solution by selectively removing undesirable concepts from pretrained models. Nevertheless, the robustness and effectiveness of existing unlearning techniques remain largely unexplored, particularly in the presence of multi-modal adversarial inputs. To bridge this gap, we propose Recall, a novel adversarial framework explicitly designed to compromise the robustness of unlearned IGMs. Unlike existing approaches that predominantly rely on adversarial text prompts, Recall exploits the intrinsic multi-modal conditioning capabilities of diffusion models by efficiently optimizing adversarial image prompts with guidance from a single semantically relevant reference image. Extensive experiments across ten state-of-the-art unlearning methods and diverse tasks show that Recall consistently outperforms existing baselines in terms of adversarial effectiveness, computational efficiency, and semantic fidelity with the original textual prompt. These findings reveal critical vulnerabilities in current unlearning mechanisms and underscore the need for more robust solutions to ensure the safety and reliability of generative models. Code and data are publicly available at https://github.com/ryliu68/RECALL.
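For intuition, here is a minimal, self-contained sketch of the attack pattern the abstract describes: gradient-based optimization of an adversarial image prompt, guided by a single reference image, against a frozen model. `ToyImageEncoder` and all hyperparameters are illustrative stand-ins, not the paper's implementation; see the repository above for the real code.

```python
# Minimal sketch (not the paper's code): optimize an adversarial image prompt
# so its embedding matches a single reference image's embedding in the frozen
# model's multimodal conditioning space. "ToyImageEncoder" stands in for the
# diffusion model's image-conditioning branch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyImageEncoder(nn.Module):
    """Stand-in for the frozen image-conditioning encoder of an IGM."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def optimize_image_prompt(encoder: nn.Module, ref_image: torch.Tensor,
                          steps: int = 200, lr: float = 1e-2) -> torch.Tensor:
    """Optimize an image prompt whose embedding encodes the reference concept.

    Only the image prompt is optimized; the benign text prompt is untouched,
    which is what preserves textual semantic consistency.
    """
    encoder.requires_grad_(False)                           # model stays frozen
    target = encoder(ref_image).detach()                    # reference concept embedding
    adv = torch.rand_like(ref_image).requires_grad_(True)   # benign random init
    opt = torch.optim.Adam([adv], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(encoder(adv), target)             # pull toward the concept
        loss.backward()
        opt.step()
        with torch.no_grad():
            adv.clamp_(0.0, 1.0)                            # keep a valid image
    return adv.detach()


if __name__ == "__main__":
    enc = ToyImageEncoder()
    reference = torch.rand(1, 3, 32, 32)                    # one reference image
    adv_prompt = optimize_image_prompt(enc, reference)
    print(adv_prompt.shape)                                 # torch.Size([1, 3, 32, 32])
```

In the real setting the loss would be computed through the unlearned diffusion model's denoiser rather than a toy encoder, but the optimization pattern, a single reference image steering a gradient-based search over the image prompt, is the same.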
Problem

Research questions and friction points this paper is trying to address.

Assessing the robustness of unlearned image generation models
Probing the vulnerability of unlearned IGMs to multi-modal adversarial inputs
Evaluating the effectiveness of machine unlearning techniques under adversarial conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal adversarial framework (Recall) for testing unlearning robustness
Optimizes adversarial image prompts guided by a single reference image
Outperforms baselines in adversarial effectiveness, efficiency, and semantic fidelity