🤖 AI Summary
This work addresses the difficulty of preserving pixel-level edge structures in image editing with latent diffusion models, a limitation that compromises photorealism in tasks such as style transfer and color adjustment. The authors propose a training-free Structure Preservation Loss (SPL) that uses a local linear model to quantify structural discrepancies between the input and edited images, and they integrate it directly into the diffusion inference process to retain fine edge details. SPL is combined with post-decoding refinement, an editing-region mask, and a color preservation loss to improve overall fidelity. Notably, this is the first method to incorporate local linear structural constraints into diffusion model inference, achieving state-of-the-art structure preservation across multiple editing tasks within the latent diffusion framework.
📝 Abstract
Recent advances in image editing leverage latent diffusion models (LDMs) for versatile, text-prompt-driven edits across diverse tasks. Yet maintaining pixel-level edge structures, which are crucial for tasks such as photorealistic style transfer or image tone adjustment, remains a challenge for latent-diffusion-based editing. To overcome this limitation, we propose a novel Structure Preservation Loss (SPL) that leverages local linear models to quantify structural differences between input and edited images. Our training-free approach integrates SPL directly into the diffusion model's generative process to ensure structural fidelity. This core mechanism is complemented by a post-processing step to mitigate LDM decoding distortions, a masking strategy for precise edit localization, and a color preservation loss to preserve hues in unedited areas. Experiments confirm that SPL enhances structural fidelity, delivering state-of-the-art performance in latent-diffusion-based image editing. Our code will be publicly released at https://github.com/gongms00/SPL.
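The abstract does not state the SPL formula, so the following is only an illustrative sketch of how a local linear model can measure structural difference. It is not the authors' implementation. It assumes a guided-filter-style fit: within each small window the edited image is modeled as `a * src + b`, and the mean squared residual of that fit is taken as the structural discrepancy. The function names `box_mean` and `local_linear_residual`, the window radius `r`, and the regularizer `eps` are all hypothetical choices made for this example, and the inputs are assumed to be single-channel arrays with values in [0, 1].

```python
import numpy as np


def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window around each pixel, using edge padding."""
    k = 2 * r + 1
    padded = np.pad(x, r, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.mean(axis=(-2, -1))


def local_linear_residual(src, out, r=2, eps=1e-4):
    """Average residual of fitting out ~ a * src + b in every local window.

    Illustrative assumption only: a small value means the edges of `out`
    follow those of `src` up to a local affine change in intensity.
    """
    mean_s = box_mean(src, r)
    mean_o = box_mean(out, r)
    var_s = box_mean(src * src, r) - mean_s ** 2
    var_o = box_mean(out * out, r) - mean_o ** 2
    cov = box_mean(src * out, r) - mean_s * mean_o

    # Regularized least-squares slope per window; the intercept is implied
    # by the window means and cancels out of the residual below.
    a = cov / (var_s + eps)

    # Mean squared residual of the fit in each window.
    resid = var_o - 2.0 * a * cov + a * a * var_s
    return float(np.mean(np.clip(resid, 0.0, None)))
```

Under these assumptions, a global tone change such as `0.5 * src + 0.2` yields a residual close to zero, because every window is fit exactly by a linear model, whereas an output whose edges are unrelated to the input yields a much larger value. A training-free method in the spirit of the abstract would evaluate a differentiable version of such a measure during sampling and use its gradient to steer the edit; that guidance step is not shown here.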