Stable Flow: Vital Layers for Training-Free Image Editing

📅 2024-11-21

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 1


🤖 AI Summary

This work addresses inconsistent editing in Diffusion Transformers (DiTs), where the absence of the coarse-to-fine synthesis structure found in UNet-based models makes it unclear which layers to target. We propose a training-free, general-purpose image editing framework. Our method systematically identifies, for the first time, the “vital layers” in DiTs that critically govern generation, and enables selective attention feature injection guided by automated layer importance assessment. Additionally, we introduce an improved image inversion technique tailored for flow-matching models, so that the same mechanism supports both non-rigid deformation and object insertion on real images. Extensive experiments demonstrate significant improvements over state-of-the-art baselines across qualitative evaluation, quantitative metrics (e.g., LPIPS, CLIP-Score), and user studies, while supporting high-fidelity editing of real-world images. The code and project page are publicly available.

📝 Abstract

Diffusion models have revolutionized the field of content synthesis and editing. Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT), and employed flow-matching for improved training and sampling. However, they exhibit limited generation diversity. In this work, we leverage this limitation to perform consistent image edits via selective injection of attention features. The main challenge is that, unlike the UNet-based models, DiT lacks a coarse-to-fine synthesis structure, making it unclear in which layers to perform the injection. Therefore, we propose an automatic method to identify "vital layers" within DiT, crucial for image formation, and demonstrate how these layers facilitate a range of controlled stable edits, from non-rigid modifications to object addition, using the same mechanism. Next, to enable real-image editing, we introduce an improved image inversion method for flow models. Finally, we evaluate our approach through qualitative and quantitative comparisons, along with a user study, and demonstrate its effectiveness across multiple applications. The project page is available at https://omriavrahami.com/stable-flow
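
The abstract does not spell out how the vital layers are found. One plausible reading is a layer-ablation test: bypass one layer at a time, regenerate, and rank layers by how much the output changes. The sketch below illustrates that idea on a toy residual network in plain Python. The names `run_model`, `rank_layers`, and `scale`, the toy layers, and the L2 distance are all assumptions made for illustration; a real DiT would need a perceptual image metric and averaging over many prompts and seeds.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_model(layers: Sequence[Layer], x: Vector, skip: int = -1) -> Vector:
    """Apply residual layers in order; the layer at index `skip` is bypassed."""
    h = list(x)
    for i, layer in enumerate(layers):
        if i == skip:
            continue  # bypass: the residual stream passes through unchanged
        update = layer(h)
        h = [a + b for a, b in zip(h, update)]
    return h


def l2_distance(a: Vector, b: Vector) -> float:
    """Stand-in for a perceptual metric (assumption: a real setup compares images)."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def rank_layers(layers: Sequence[Layer], x: Vector) -> Tuple[List[int], List[float]]:
    """Score each layer by the output change its removal causes, highest first."""
    reference = run_model(layers, x)
    scores = [
        l2_distance(reference, run_model(layers, x, skip=i))
        for i in range(len(layers))
    ]
    order = sorted(range(len(layers)), key=lambda i: scores[i], reverse=True)
    return order, scores


def scale(factor: float) -> Layer:
    """Hypothetical toy layer: its residual update is a scaled copy of the input."""
    return lambda h: [factor * v for v in h]


# Layer 1 contributes far more than layers 0 and 2, so it should rank first.
toy_layers = [scale(0.01), scale(0.5), scale(0.02)]
order, scores = rank_layers(toy_layers, [1.0, 2.0])
print(order)
```

Layers at the head of `order` are the candidates for "vital" in this toy setting; where to place the cutoff is a separate choice that the sketch does not address.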

Problem

Research questions and friction points this paper is trying to address.

Identify the vital layers in Diffusion Transformers (DiT) for image editing

Enable consistent image edits via selective attention feature injection

Improve image inversion method for real-image editing in flow models
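
For context on the inversion problem named in the last item, the sketch below shows plain Euler inversion of a flow ODE followed by regeneration. This is the naive baseline, not the paper's improved method, whose details are not given in this summary. The velocity field `toy_velocity`, the convention that t=0 is the image and t=1 is noise, and the step counts are assumptions chosen so the example runs without a trained model.

```python
from typing import Callable, List

Vector = List[float]
Velocity = Callable[[Vector, float], Vector]


def toy_velocity(x: Vector, t: float) -> Vector:
    """Hypothetical velocity field standing in for a trained flow model."""
    return [xi * (1.0 + t) for xi in x]


def euler_invert(x0: Vector, velocity: Velocity, steps: int) -> Vector:
    """Integrate dx/dt = v(x, t) forward from t=0 (image) to t=1 (noise)."""
    dt = 1.0 / steps
    x = list(x0)
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def euler_generate(x1: Vector, velocity: Velocity, steps: int) -> Vector:
    """Integrate the same ODE backward from t=1 (noise) to t=0 (image)."""
    dt = 1.0 / steps
    x = list(x1)
    for k in range(steps, 0, -1):
        t = k * dt
        v = velocity(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


def roundtrip_error(x0: Vector, steps: int) -> float:
    """L2 distance between the input and its inverted-then-regenerated copy."""
    noise = euler_invert(x0, toy_velocity, steps)
    recon = euler_generate(noise, toy_velocity, steps)
    return sum((a - b) ** 2 for a, b in zip(x0, recon)) ** 0.5


image = [1.0, -2.0]
err_coarse = roundtrip_error(image, steps=10)
err_fine = roundtrip_error(image, steps=1000)
print(err_coarse > err_fine)
```

The round trip is not exact because the backward pass evaluates the velocity at different points than the forward pass did. With few steps the reconstruction drifts from the input, which is the kind of gap that makes real-image editing with plain inversion unreliable.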

Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective attention feature injection in DiT

Automatic identification of vital layers

Improved image inversion for flow models
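
To make the first item concrete, the sketch below shows one common form of attention feature injection: during the edit pass, layers flagged as vital reuse the keys and values cached from the source pass, while all other layers attend normally. Whether the paper replaces, concatenates, or otherwise merges the features is not stated in this summary, so the replacement rule here is an assumption, as are the names `attention`, `attend_with_injection`, `source_cache`, and `vital_layers`.

```python
import math
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention on nested lists."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        logits = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v)) / total
            for j in range(len(v[0]))
        ])
    return out


def attend_with_injection(
    layer_idx: int,
    q: Matrix,
    k: Matrix,
    v: Matrix,
    source_cache: Dict[int, Tuple[Matrix, Matrix]],
    vital_layers: Set[int],
) -> Matrix:
    """In vital layers, attend to the cached source keys/values instead."""
    if layer_idx in vital_layers:
        k, v = source_cache[layer_idx]  # assumption: full replacement of K and V
    return attention(q, k, v)


# Keys and values recorded while generating the source image (toy numbers).
src_k = [[1.0, 0.0], [0.0, 1.0]]
src_v = [[1.0, 0.0], [0.0, 1.0]]
source_cache = {0: (src_k, src_v), 1: (src_k, src_v)}
vital_layers = {1}

# Features from the edit pass, driven by the edited prompt.
edit_q = [[1.0, 0.0]]
edit_k = [[0.0, 1.0], [1.0, 0.0]]
edit_v = [[5.0, 5.0], [7.0, 7.0]]

plain = attend_with_injection(0, edit_q, edit_k, edit_v, source_cache, vital_layers)
injected = attend_with_injection(1, edit_q, edit_k, edit_v, source_cache, vital_layers)
print(plain, injected)
```

Restricting injection to a small set of layers is the design choice of interest: the remaining layers stay free to follow the edited prompt, while the injected layers tie the result to the source image.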

🔎 Similar Papers

RealCraft: Attention Control as A Tool for Zero-Shot Consistent Video Editing