Consistency-Preserving Concept Erasure via Unsafe-Safe Pairing and Directional Fisher-weighted Adaptation

📅 2026-02-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of erasing harmful concepts from text-to-image diffusion models while preserving both semantic consistency and structural integrity. To this end, the authors propose PAIR, a framework that reformulates concept erasure as a safe replacement task by constructing paired unsafe–safe inputs and enforcing semantic alignment between them. The method introduces a paired semantic realignment loss and a Fisher-weighted initialization for DoRA (a parameter-efficient, weight-decomposed low-rank adaptation scheme), enabling fine-grained yet coherent concept editing. Experimental results indicate that PAIR removes undesirable content while significantly outperforming existing approaches in image structural preservation, semantic coherence, and overall generation quality.

📝 Abstract

With the increasing versatility of text-to-image diffusion models, the ability to selectively erase undesirable concepts (e.g., harmful content) has become indispensable. However, existing concept erasure approaches primarily focus on removing unsafe concepts without providing guidance toward corresponding safe alternatives, which often leads to failure in preserving the structural and semantic consistency between the original and erased generations. In this paper, we propose a novel framework, PAIRed Erasing (PAIR), which reframes concept erasure from simple removal to consistency-preserving semantic realignment using unsafe-safe pairs. We first generate safe counterparts from unsafe inputs while preserving structural and semantic fidelity, forming paired unsafe-safe multimodal data. Leveraging these pairs, we introduce two key components: (1) Paired Semantic Realignment, a guided objective that uses unsafe-safe pairs to explicitly map target concepts to semantically aligned safe anchors; and (2) Fisher-weighted Initialization for DoRA, which initializes parameter-efficient low-rank adaptation matrices using unsafe-safe pairs, encouraging the generation of safe alternatives while selectively suppressing unsafe concepts. Together, these components enable fine-grained erasure that removes only the targeted concepts while maintaining overall semantic consistency. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art baselines, achieving effective concept erasure while preserving structural integrity, semantic coherence, and generation quality.
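The abstract does not spell out how the Fisher-weighted initialization is computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes the standard DoRA reparameterization W' = m · (W0 + BA) / ‖W0 + BA‖ with column-wise norms, an empirical diagonal Fisher estimate (mean of squared per-sample gradients), and a hypothetical selection rule that points the low-rank factor A at the input columns carrying the most Fisher mass. The function names, the toy gradients, and the selection rule are all illustrative assumptions; in practice the gradients would come from a loss evaluated on unsafe-safe pairs.

```python
import math

def matmul(X, Y):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def column_norms(W):
    """L2 norm of every column of W (assumes no column is all zeros)."""
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(len(W[0]))]

def dora_weight(W0, B, A, m):
    """DoRA reparameterization: magnitude m times the column-normalized direction."""
    delta = matmul(B, A)
    V = [[W0[i][j] + delta[i][j] for j in range(len(W0[0]))]
         for i in range(len(W0))]
    norms = column_norms(V)
    return [[m[j] * V[i][j] / norms[j] for j in range(len(V[0]))]
            for i in range(len(V))]

def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    rows, cols = len(per_sample_grads[0]), len(per_sample_grads[0][0])
    return [[sum(g[i][j] ** 2 for g in per_sample_grads) / n
             for j in range(cols)] for i in range(rows)]

def fisher_weighted_init(W0, fisher, rank):
    """Hypothetical init: A selects the `rank` input columns with the largest
    total Fisher mass; B starts at zero so the adapted weight equals W0."""
    d_out, d_in = len(W0), len(W0[0])
    scores = [sum(fisher[i][j] for i in range(d_out)) for j in range(d_in)]
    top = sorted(range(d_in), key=lambda j: scores[j], reverse=True)[:rank]
    A = [[1.0 if j == c else 0.0 for j in range(d_in)] for c in top]
    B = [[0.0] * rank for _ in range(d_out)]
    m = column_norms(W0)  # DoRA initializes the magnitude from W0
    return B, A, m, top

# Toy data (made up for illustration): a 2x3 weight and two per-sample gradients.
W0 = [[1.0, 2.0, 0.5],
      [0.0, -1.0, 2.0]]
grads = [
    [[0.1, 0.9, 0.0], [0.0, 0.8, 0.1]],
    [[0.0, 1.1, 0.2], [0.1, 0.7, 0.0]],
]
fisher = diagonal_fisher(grads)
B, A, m, top = fisher_weighted_init(W0, fisher, rank=1)
W_adapted = dora_weight(W0, B, A, m)
```

Because B starts at zero, the adapted weight reproduces W0 at initialization, so training begins from the pretrained model's behavior while the adapter's trainable directions are concentrated where the Fisher estimate says the paired data is most sensitive.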

Problem

Research questions and friction points this paper is trying to address.

concept erasure

semantic consistency

text-to-image diffusion models

unsafe-safe pairing

harmful content

Innovation

Methods, ideas, or system contributions that make the work stand out.

concept erasure

unsafe-safe pairing

semantic realignment