Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation

📅 2025-11-23
🤖 AI Summary
Existing concept-erasure methods for text-to-image (T2I) and text-to-video (T2V) models rely on retraining, add inference overhead, are vulnerable to adversarial attacks, or generalize poorly across the two modalities. Critically, they rarely model the semantic overlap between the target concept and surrounding content, which leads to collateral damage after erasure. This work proposes Instant Concept Erasure (ICE), a training-free, one-shot weight-modification approach. ICE defines erase and preserve subspaces using anisotropic energy-weighted scaling and regularizes against their intersection with a closed-form overlap projector. It poses a convex, Lipschitz-bounded Spectral Unlearning Objective whose unique analytical solution yields a dissociation operator, which is then applied to the model's text-conditioning layers so that the edit is permanent and adds no runtime cost. Across removals of artistic styles, objects, identities, and explicit content, the authors report strong erasure, improved robustness to red-teaming, and only minimal degradation of the original generative abilities in both T2I and T2V models.

📝 Abstract
Robust concept removal for text-to-image (T2I) and text-to-video (T2V) models is essential for their safe deployment. Existing methods, however, suffer from costly retraining, inference overhead, or vulnerability to adversarial attacks. Crucially, they rarely model the latent semantic overlap between the target erase concept and surrounding content -- causing collateral damage post-erasure -- and even fewer methods work reliably across both T2I and T2V domains. We introduce Instant Concept Erasure (ICE), a training-free, modality-agnostic, one-shot weight modification approach that achieves precise, persistent unlearning with zero overhead. ICE defines erase and preserve subspaces using anisotropic energy-weighted scaling, then explicitly regularises against their intersection using a unique, closed-form overlap projector. We pose a convex and Lipschitz-bounded Spectral Unlearning Objective, balancing erasure fidelity and intersection preservation, that admits a stable and unique analytical solution. This solution defines a dissociation operator that is translated to the model's text-conditioning layers, making the edit permanent and runtime-free. Across targeted removals of artistic styles, objects, identities, and explicit content, ICE efficiently achieves strong erasure with improved robustness to red-teaming, all while causing only minimal degradation of original generative abilities in both T2I and T2V models.
Problem

Research questions and friction points this paper is trying to address.

Achieving robust concept removal in text-to-image and text-to-video models
Modeling the semantic overlap between the erased concept and surrounding content to prevent collateral damage
Developing a training-free erasure method that works across both T2I and T2V
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free, one-shot weight modification for concept erasure
Anisotropic energy-weighted scaling defines the erase and preserve subspaces
A closed-form analytical solution makes the edit permanent and runtime-free
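
The general idea behind the points above (build erase and preserve subspaces, protect their overlap, then fold the result into a weight matrix once) can be illustrated with a small NumPy sketch. This is not the paper's formulation: the function names, the energy normalization, and the way the overlap is protected are all assumptions made here for illustration, and the toy embeddings stand in for real text-encoder outputs.

```python
import numpy as np

def weighted_subspace(embeddings, rank):
    """Top-`rank` directions of row-vector embeddings, with energy weights.

    Hypothetical helper: the weighting (squared singular values, scaled so
    the strongest direction gets weight 1) is an illustrative assumption.
    """
    _, s, vt = np.linalg.svd(embeddings, full_matrices=False)
    basis = vt[:rank].T                    # (d, rank), orthonormal columns
    energy = s[:rank] ** 2
    return basis, energy / energy.max()    # weights in (0, 1]

def dissociation_operator(erase_emb, preserve_emb, rank_e, rank_p):
    """Build a (d, d) operator that damps erase directions but leaves the
    preserve subspace untouched (illustrative, not the authors' operator)."""
    d = erase_emb.shape[1]
    U_e, w_e = weighted_subspace(erase_emb, rank_e)
    U_p, _ = weighted_subspace(preserve_emb, rank_p)
    P_e = (U_e * w_e) @ U_e.T              # anisotropic "soft" erase projector
    Q = np.eye(d) - U_p @ U_p.T            # removes any preserve-subspace component
    return np.eye(d) - Q @ P_e @ Q         # acts as identity on the preserve subspace

# Toy setup: the erase concept lives on axes 0 and 1, preserved content on
# axes 1 and 2, so axis 1 is the overlap that should survive the edit.
d = 8
e = np.eye(d)
erase_emb = np.stack([3.0 * e[0], 1.0 * e[1]])
preserve_emb = np.stack([e[1], e[2]])

T = dissociation_operator(erase_emb, preserve_emb, rank_e=2, rank_p=2)

# One-shot edit of a stand-in text-conditioning weight (out_dim x d).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, d))
W_edited = W @ T
```

In this toy case the edited weight maps the erase-only direction (axis 0) to zero while responses to the overlapping and preserved directions (axes 1 and 2) are unchanged. Because the operator is multiplied into `W` once, nothing extra runs at inference time, which mirrors the "permanent and runtime-free" property described above.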