AI Summary
This work addresses a critical security vulnerability in existing concept erasure methods for diffusion models: supposedly removed sensitive concepts can still be reawakened. By modeling the generative process as an implicit function, the study systematically analyzes how text conditions, model parameters, and latent states jointly influence concept reawakening. The authors propose a multi-concept synchronous reawakening method that reconstructs sampling trajectories in latent space, reestablishes text-visual associations through a semantic re-binding mechanism, and mitigates gradient conflicts and feature entanglement among multiple concepts via gradient field orthogonalization. They further introduce Latent Semantic Identification-Guided Sampling (LSIS) to keep the reawakening process stable. Experiments demonstrate that the method faithfully recovers multiple erased concepts simultaneously across diverse erasure tasks and model architectures, confirming its effectiveness and robustness.
Abstract
Concept erasure aims to suppress sensitive content in diffusion models, but recent studies show that erased concepts can still be reawakened, revealing vulnerabilities in erasure methods. Existing reawakening methods mainly rely on prompt-level optimization to manipulate sampling trajectories, neglecting other generative factors, which limits a comprehensive understanding of the underlying dynamics. In this paper, we model the generation process as an implicit function to enable a comprehensive theoretical analysis of multiple factors, including text conditions, model parameters, and latent states. We theoretically show that perturbing each factor can reawaken erased concepts. Building on this insight, we propose a novel concept reawakening method: Latent space Unblocking for concept REawakening (LURE), which reawakens erased concepts by reconstructing the latent space and guiding the sampling trajectory. Specifically, our semantic re-binding mechanism reconstructs the latent space by aligning denoising predictions with target distributions to reestablish severed text-visual associations. However, in multi-concept scenarios, naive reconstruction can cause gradient conflicts and feature entanglement. To address this, we introduce Gradient Field Orthogonalization, which enforces feature orthogonality to prevent mutual interference. Additionally, our Latent Semantic Identification-Guided Sampling (LSIS) ensures stability of the reawakening process via posterior density verification. Extensive experiments demonstrate that LURE enables simultaneous, high-fidelity reawakening of multiple erased concepts across diverse erasure tasks and methods.
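The gradient-conflict mitigation described above can be illustrated with a small, hedged sketch: each concept's gradient is projected to remove its components along the other concepts' gradients, so multi-concept updates do not interfere. The function names, the MSE surrogate for the re-binding objective, and the projection operator below are illustrative assumptions, not LURE's actual implementation.

```python
import numpy as np

def rebind_loss(eps_pred, eps_target):
    """Surrogate for semantic re-binding: an MSE that pulls the erased
    model's denoising prediction toward a target prediction that still
    carries the concept (standing in for the paper's "target
    distribution"; the exact objective is an assumption here)."""
    return float(np.mean((eps_pred - eps_target) ** 2))

def orthogonalize(grads):
    """Gradient-field-orthogonalization sketch: strip from each
    concept's gradient its projection onto every other concept's
    original gradient.  With two concepts the results are exactly
    orthogonal to the other gradient; LURE's operator may differ."""
    out = []
    for i, g in enumerate(grads):
        g = np.asarray(g, dtype=float).copy()
        for j, h in enumerate(grads):
            if i == j:
                continue
            h = np.asarray(h, dtype=float)
            denom = float(np.dot(h, h))
            if denom > 0.0:
                # Remove the component of g along h.
                g -= (np.dot(g, h) / denom) * h
        out.append(g)
    return out
```

For example, with two conflicting concept gradients g0 = [1, 1] and g1 = [1, 0], the projected g0 becomes [0, 1], which no longer has any component along g1.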