Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts

📅 2025-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing concept erasure methods for mitigating harmful content in text-to-image generation either introduce visual artifacts or rely on manually selected anchor concepts. This paper proposes ANT, a finetuning framework that reverses the condition direction of classifier-free guidance during mid-to-late denoising stages. The result is a trajectory-aware objective that automatically steers generation away from unwanted concepts while leaving early-stage structure intact. For single-concept erasure, the authors design an augmentation-enhanced weight saliency map that locates the parameters contributing most to the unwanted concept. For multi-concept erasure, the same objective acts as a plug-and-play mechanism. The reported experiments show state-of-the-art performance on both single- and multi-concept erasure, suppressing harmful content while preserving image fidelity and structural integrity without noticeable artifacts.

📝 Abstract

Ensuring the ethical deployment of text-to-image models requires effective techniques to prevent the generation of harmful or inappropriate content. While concept erasure methods offer a promising solution, existing finetuning-based approaches suffer from notable limitations. Anchor-free methods risk disrupting sampling trajectories, leading to visual artifacts, while anchor-based methods rely on the heuristic selection of anchor concepts. To overcome these shortcomings, we introduce a finetuning framework, dubbed ANT, which Automatically guides deNoising Trajectories to avoid unwanted concepts. ANT is built on a key insight: reversing the condition direction of classifier-free guidance during mid-to-late denoising stages enables precise content modification without sacrificing early-stage structural integrity. This inspires a trajectory-aware objective that preserves the integrity of the early-stage score function field, which steers samples toward the natural image manifold, without relying on heuristic anchor concept selection. For single-concept erasure, we propose an augmentation-enhanced weight saliency map to precisely identify the critical parameters that contribute most to the unwanted concept, enabling more thorough and efficient erasure. For multi-concept erasure, our objective function offers a versatile plug-and-play solution that significantly boosts performance. Extensive experiments demonstrate that ANT achieves state-of-the-art results in both single- and multi-concept erasure, delivering high-quality, safe outputs without compromising generative fidelity. Code is available at https://github.com/lileyang1210/ANT
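The key insight in the abstract, reversing the condition direction of classifier-free guidance in mid-to-late denoising stages, can be illustrated with a toy sketch. This is not the authors' training objective (ANT is a finetuning method, and its loss is not given on this page); it only shows the sampling-time intuition. The function name `guided_noise`, the `t_switch` threshold, and the convention that timesteps count down from large (early) to small (late) are assumptions made for illustration.

```python
def guided_noise(eps_uncond, eps_cond, scale, t, t_switch):
    """Combine unconditional and conditional noise predictions (toy sketch).

    Standard classifier-free guidance uses
        eps_uncond + scale * (eps_cond - eps_uncond).
    Assumption: timesteps count down, so t < t_switch marks the
    mid-to-late stage, where the condition direction is reversed
    to push the sample away from the concept.
    """
    sign = 1.0 if t >= t_switch else -1.0
    return [u + sign * scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    eps_u = [0.0, 1.0]   # hypothetical unconditional prediction
    eps_c = [1.0, 3.0]   # hypothetical prediction conditioned on the unwanted concept
    print(guided_noise(eps_u, eps_c, scale=2.0, t=900, t_switch=500))  # early stage
    print(guided_noise(eps_u, eps_c, scale=2.0, t=100, t_switch=500))  # mid-to-late stage
```

Early steps keep the usual guidance direction, so the coarse layout of the image is formed normally. Only the later steps move away from the concept, which matches the abstract's claim that early-stage structural integrity is preserved.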

Problem

Research questions and friction points this paper is trying to address.

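Prevent text-to-image models from generating harmful or inappropriate content

Overcome the visual artifacts of anchor-free erasure and the heuristic anchor selection of anchor-based erasure

Guide denoising trajectories away from unwanted concepts automatically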

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically guides denoising trajectories to avoid unwanted concepts

Uses augmentation-enhanced weight saliency for precise parameter identification

Offers a versatile plug-and-play objective for multi-concept erasure
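As a rough illustration of the weight saliency idea listed above, the sketch below averages absolute gradients over several augmented inputs and keeps the parameters with the largest scores. The page does not give ANT's actual saliency formula, so everything here is an assumption: the name `saliency_mask`, the use of mean absolute gradient as the score, and the top-fraction selection rule are all hypothetical.

```python
def saliency_mask(grads_per_augmentation, keep_fraction):
    """Return a 0/1 mask over parameters (hypothetical scoring rule).

    grads_per_augmentation: one gradient vector per augmented input,
    all of equal length. A parameter's score is its mean absolute
    gradient across augmentations; the top `keep_fraction` of
    parameters (at least one) are marked 1.
    """
    n_aug = len(grads_per_augmentation)
    n_params = len(grads_per_augmentation[0])
    scores = [
        sum(abs(g[i]) for g in grads_per_augmentation) / n_aug
        for i in range(n_params)
    ]
    k = max(1, round(keep_fraction * n_params))
    top = sorted(range(n_params), key=lambda i: scores[i], reverse=True)[:k]
    mask = [0] * n_params
    for i in top:
        mask[i] = 1
    return mask


if __name__ == "__main__":
    # Two hypothetical gradient vectors from two augmented prompts.
    grads = [[0.1, -2.0, 0.3, 0.0], [0.2, 1.0, -0.5, 0.0]]
    print(saliency_mask(grads, keep_fraction=0.5))
```

In a finetuning setting, a mask like this would restrict updates to the selected parameters. Averaging over augmentations is meant to make the selection less sensitive to any single prompt.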
