PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To jointly achieve controllability, background fidelity, and inference efficiency in zero-shot image editing, this paper proposes PostEdit, a posterior sampling framework that requires neither inversion nor training. Its core idea is a measurement term, tied to both the features of the initial image and Langevin dynamics, that optimizes the image estimated from the target prompt during diffusion sampling, so that edits follow the prompt while unedited regions stay close to the original. Experiments indicate state-of-the-art editing performance with accurate preservation of unedited regions, at approximately 1.5 seconds and 18 GB of GPU memory per image.

📝 Abstract
In the field of image editing, three core challenges persist: controllability, background preservation, and efficiency. Inversion-based methods rely on time-consuming optimization to preserve the features of the initial images, which results in low efficiency due to the requirement for extensive network inference. Conversely, inversion-free methods lack theoretical support for background similarity, as they circumvent the issue of maintaining initial features to achieve efficiency. As a consequence, none of these methods can achieve both high efficiency and background consistency. To tackle the challenges and the aforementioned disadvantages, we introduce PostEdit, a method that incorporates a posterior scheme to govern the diffusion sampling process. Specifically, a corresponding measurement term related to both the initial features and Langevin dynamics is introduced to optimize the estimated image generated by the given target prompt. Extensive experimental results indicate that the proposed PostEdit achieves state-of-the-art editing performance while accurately preserving unedited regions. Furthermore, the method is both inversion- and training-free, necessitating approximately 1.5 seconds and 18 GB of GPU memory to generate high-quality results.
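To make the idea of a measurement term combined with Langevin dynamics more concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: it assumes a Gaussian prior centered on the prompt-driven estimate (`x_hat`), an identity measurement of the initial image (`y`), and hypothetical parameter names (`sigma`, `tau`, `step`, `n_steps`). The actual method operates on diffusion-model features, which this example does not model.

```python
import numpy as np


def langevin_refine(x_hat, y, sigma=0.1, tau=0.5, step=1e-3, n_steps=200, seed=0):
    """Toy Langevin sampler for p(x | y) ∝ N(x; x_hat, tau^2 I) * N(y; x, sigma^2 I).

    x_hat : estimate produced from the target prompt (assumed prior mean)
    y     : initial image used as the measurement (assumed identity operator)
    """
    rng = np.random.default_rng(seed)
    x = x_hat.astype(float).copy()
    for _ in range(n_steps):
        # Gradient of the log prior: pulls x toward the prompt-driven estimate.
        grad_prior = -(x - x_hat) / tau**2
        # Gradient of the log measurement term: pulls x toward the initial image.
        grad_meas = -(x - y) / sigma**2
        noise = rng.standard_normal(x.shape)
        # Unadjusted Langevin update: gradient ascent on the log posterior plus noise.
        x = x + step * (grad_prior + grad_meas) + np.sqrt(2.0 * step) * noise
    return x
```

Calling `langevin_refine(np.zeros((64, 64)), np.ones((64, 64)))` yields samples concentrated near the measurement, because the assumed measurement noise `sigma` is smaller than the assumed prior spread `tau`. Adjusting that ratio trades fidelity to the initial image against fidelity to the prompt-driven estimate.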

Problem

Research questions and friction points this paper is trying to address.

Image Modification
Background Preservation
Efficiency Enhancement

Innovation

Methods, ideas, or system contributions that make the work stand out.

PostEdit
Efficient Image Modification
Background Stability

Authors

Feng Tian, Shanghai Jiao Tong University
Yixuan Li, Shanghai Jiao Tong University
Yichao Yan, Shanghai Jiao Tong University
Shanyan Guan, vivo Mobile Communication Co., Ltd
Yanhao Ge, vivo Mobile Communication Co., Ltd
Xiaokang Yang, Shanghai Jiao Tong University