🤖 AI Summary
To achieve fine-grained controllability in diffusion-based image generation without task-specific training, this paper proposes a training-free cross-attention guidance framework that uniformly supports dynamic intervention for both concrete (e.g., objects) and abstract (e.g., styles) concepts. The method constructs concept-guidance vectors from offline-computed average implicit representations and injects them into cross-attention layers during inference via a dynamic heuristic mechanism, enabling precise concept addition, removal, or replacement. Extensive experiments across multiple mainstream diffusion models show that the approach significantly outperforms existing state-of-the-art methods: it achieves high fidelity and spatial consistency on diverse tasks, including harmful-content removal, attribute addition, object replacement, and style transfer, while introducing minimal side effects. This work establishes a new paradigm for efficient, general-purpose, zero-shot controllable generation, with direct applicability to content moderation and creative customization.
📝 Abstract
Diffusion models have transformed image generation, yet controlling their outputs for diverse applications, including content moderation and creative customization, remains challenging. Existing approaches usually require task-specific training and struggle to generalize across both concrete (e.g., objects) and abstract (e.g., styles) concepts. We propose CASteer (Cross-Attention Steering), a training-free framework for controllable image generation that uses steering vectors to dynamically influence a diffusion model's hidden representations. CASteer computes these vectors offline by averaging activations from concept-specific generated images, then applies them during inference via a dynamic heuristic that activates modifications only when necessary, removing concepts from affected images or adding them to unaffected ones. This approach enables precise control over a wide range of tasks, including removing harmful content, adding desired attributes, replacing objects, or altering styles, all without model retraining. CASteer handles both concrete and abstract concepts, outperforming state-of-the-art techniques across multiple diffusion models while preserving unrelated content and minimizing unintended effects.
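The core mechanism described above, averaging activations offline to obtain a steering vector and intervening at inference only when a per-token heuristic fires, can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual implementation: the function names, the projection-threshold heuristic, and the `alpha` scale are illustrative assumptions, and real cross-attention activations would come from hooks inside a diffusion model rather than NumPy arrays.

```python
import numpy as np

def compute_steering_vector(concept_acts, neutral_acts):
    """Offline step (sketch): average cross-attention activations over
    images generated with a concept prompt vs. a neutral prompt, and take
    the normalized difference as the concept's steering direction.

    concept_acts, neutral_acts: arrays of shape (n_images, dim).
    """
    v = concept_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def apply_steering(hidden, v, alpha=1.0, threshold=0.5, mode="remove"):
    """Inference step (sketch): a toy dynamic heuristic. Each token's
    hidden state is projected onto the unit steering vector; we intervene
    only where the projection indicates the concept is present (removal)
    or absent (addition), leaving other tokens untouched.

    hidden: (tokens, dim) hidden states at one cross-attention layer.
    """
    proj = hidden @ v                       # per-token alignment with the concept
    if mode == "remove":
        mask = proj > threshold             # concept present -> subtract its component
        return hidden - alpha * (mask * proj)[:, None] * v
    else:
        mask = proj < threshold             # concept absent -> add the direction
        return hidden + alpha * mask[:, None] * v
```

In this toy form, removal subtracts the concept component only from tokens whose projection exceeds the threshold, which is one way to realize the "activates modifications only when necessary" behavior without touching unrelated content.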