🤖 AI Summary
To achieve fine-grained controllability in diffusion-based image generation without task-specific training, this paper proposes a training-free cross-attention guidance framework that uniformly supports dynamic intervention for both concrete (e.g., objects) and abstract (e.g., styles) concepts. The method constructs concept-guidance vectors from offline-computed average implicit representations and injects them into cross-attention layers during inference via a dynamic heuristic mechanism, enabling precise concept addition, removal, or replacement. Extensive experiments across multiple mainstream diffusion models show that the approach significantly outperforms existing state-of-the-art methods: it achieves high fidelity and spatial consistency on diverse tasks, including harmful-content removal, attribute addition, object replacement, and style transfer, while introducing minimal side effects. This work establishes a new paradigm for efficient, general-purpose, zero-shot controllable generation, with direct applicability to content moderation and creative customization.
📝 Abstract
Diffusion models have transformed image generation, yet controlling their outputs for diverse applications, including content moderation and creative customization, remains challenging. Existing approaches usually require task-specific training and struggle to generalize across both concrete (e.g., objects) and abstract (e.g., styles) concepts. We propose CASteer (Cross-Attention Steering), a training-free framework for controllable image generation that uses steering vectors to dynamically influence a diffusion model's hidden representations. CASteer computes these vectors offline by averaging activations from concept-specific generated images, then applies them during inference via a dynamic heuristic that activates modifications only when necessary, removing concepts from affected images or adding them to unaffected ones. This approach enables precise control over a wide range of tasks, including removing harmful content, adding desired attributes, replacing objects, or altering styles, all without model retraining. CASteer handles both concrete and abstract concepts, outperforming state-of-the-art techniques across multiple diffusion models while preserving unrelated content and minimizing unintended effects.
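The core mechanism described above, averaging activations offline to obtain a steering vector and intervening at inference only when a per-token heuristic fires, can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual implementation: the function names, the projection-threshold heuristic, and the `alpha` scale are illustrative assumptions, and real cross-attention activations would come from hooks inside a diffusion model rather than NumPy arrays.

```python
import numpy as np

def compute_steering_vector(concept_acts, neutral_acts):
    """Offline step (sketch): average cross-attention activations over
    images generated with a concept prompt vs. a neutral prompt, and take
    the normalized difference as the concept's steering direction.

    concept_acts, neutral_acts: arrays of shape (n_images, dim).
    """
    v = concept_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def apply_steering(hidden, v, alpha=1.0, threshold=0.5, mode="remove"):
    """Inference step (sketch): a toy dynamic heuristic. Each token's
    hidden state is projected onto the unit steering vector; we intervene
    only where the projection indicates the concept is present (removal)
    or absent (addition), leaving other tokens untouched.

    hidden: (tokens, dim) hidden states at one cross-attention layer.
    """
    proj = hidden @ v                       # per-token alignment with the concept
    if mode == "remove":
        mask = proj > threshold             # concept present -> subtract its component
        return hidden - alpha * (mask * proj)[:, None] * v
    else:
        mask = proj < threshold             # concept absent -> add the direction
        return hidden + alpha * mask[:, None] * v
```

In this toy form, removal subtracts the concept component only from tokens whose projection exceeds the threshold, which is one way to realize the "activates modifications only when necessary" behavior without touching unrelated content.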