LatentEdit: Adaptive Latent Control for Consistent Semantic Editing

📅 2025-08-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of simultaneously achieving high editing quality, background fidelity, and inference efficiency in diffusion-based image editing, this paper proposes LatentEdit, an adaptive latent fusion framework. Without modifying the model architecture or introducing complex attention mechanisms, LatentEdit enables fine-grained semantic editing by dynamically weighting and fusing the current denoised latent with a reference latent obtained by inverting the source image. The authors further design a lightweight inversion-free variant that significantly reduces computational overhead. LatentEdit is architecture-agnostic, supporting both UNet and DiT backbones, and can be deployed plug-and-play. On PIE-Bench, LatentEdit surpasses state-of-the-art methods using only 8–15 sampling steps, achieving a favorable trade-off between fidelity and editability while accelerating inference by 2×, demonstrating strong potential for real-time applications.

📝 Abstract
Diffusion-based image editing has achieved significant success in recent years. However, it remains challenging to achieve high-quality editing while maintaining background similarity without sacrificing speed or memory efficiency. In this work, we introduce LatentEdit, an adaptive latent fusion framework that dynamically combines the current latent code with a reference latent code inverted from the source image. By selectively preserving source features in high-similarity, semantically important regions while generating target content in other regions guided by the target prompt, LatentEdit enables fine-grained, controllable editing. Critically, the method requires no internal model modifications or complex attention mechanisms, offering a lightweight, plug-and-play solution compatible with both UNet-based and DiT-based architectures. Extensive experiments on the PIE-Bench dataset demonstrate that LatentEdit achieves an optimal balance between fidelity and editability, outperforming state-of-the-art methods even with only 8–15 steps. Additionally, its inversion-free variant further halves the number of neural function evaluations and eliminates the need to store any intermediate variables, substantially enhancing real-time deployment efficiency.
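The core idea described in the abstract, blending the current denoised latent with a reference latent according to per-region similarity, can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the cosine-similarity gate and the `temperature` and `threshold` parameters are assumed details chosen for the sketch.

```python
import numpy as np

def cosine_similarity_map(z_cur, z_ref, eps=1e-8):
    """Per-pixel cosine similarity between two latents of shape (C, H, W)."""
    num = (z_cur * z_ref).sum(axis=0)
    denom = np.linalg.norm(z_cur, axis=0) * np.linalg.norm(z_ref, axis=0) + eps
    return num / denom  # shape (H, W), values in [-1, 1]

def adaptive_latent_fusion(z_cur, z_ref, temperature=10.0, threshold=0.8):
    """Fuse the current denoised latent with the inverted reference latent.

    High-similarity regions lean toward the source latent (preserving
    background), while low-similarity regions keep the target-guided latent.
    """
    sim = cosine_similarity_map(z_cur, z_ref)
    # Soft gate: a sigmoid around the threshold yields a smooth per-pixel weight.
    w = 1.0 / (1.0 + np.exp(-temperature * (sim - threshold)))  # (H, W)
    return w[None] * z_ref + (1.0 - w[None]) * z_cur
```

The soft gate is one plausible way to realize "selectively preserving source features in high-similarity regions"; the paper's actual weighting scheme may differ.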
Problem

Research questions and friction points this paper is trying to address.

Achieving high-quality image editing with background consistency
Maintaining speed and memory efficiency during semantic editing
Enabling fine-grained control without complex model modifications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive latent fusion for dynamic feature combination
Selective preservation of high-similarity semantic regions
Plug-and-play solution without model modifications
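The plug-and-play claim amounts to wrapping an unmodified sampler with a per-step fusion hook. The sketch below is hypothetical: `denoise_step`, `ref_trajectory`, and the fixed blend `weight` are illustrative stand-ins, not the paper's API, and a fixed weight stands in for the adaptive per-region weighting.

```python
import numpy as np

def edit_with_latent_fusion(z_T, ref_trajectory, denoise_step, weight=0.5):
    """Run a stock denoising loop unchanged, blending in the stored
    reference latent after each step; no model internals are touched."""
    z = z_T
    for t, z_ref in enumerate(ref_trajectory):
        z = denoise_step(z, t)                  # target-prompt-guided update
        z = weight * z_ref + (1 - weight) * z   # latent fusion hook
    return z
```

Because the hook only touches latents between steps, the same loop works regardless of whether `denoise_step` wraps a UNet or a DiT backbone.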