EditInfinity: Image Editing with Binary-Quantized Generative Models

📅 2025-10-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-guided image editing methods based on diffusion models suffer from significant approximation errors during image inversion, owing to the lack of exact supervision over intermediate representations, which compromises editing fidelity and text-image alignment. To address this, EditInfinity adapts Infinity, a binary-quantized (VQ-based) generative model, to text-guided editing, exploiting the fact that the exact intermediate quantized representations of a source image are attainable and can supervise inversion precisely. EditInfinity combines an efficient image inversion mechanism, integrating text prompting rectification and image style preservation, with a holistic smoothing strategy, achieving high-fidelity editing with minimal parameter overhead. On the PIE-Bench benchmark, across "add", "change", and "delete" operations, EditInfinity consistently outperforms state-of-the-art diffusion-based methods in both visual fidelity to the source image and semantic alignment with the target prompts.

📝 Abstract
Adapting pretrained diffusion-based generative models for text-driven image editing with negligible tuning overhead has demonstrated remarkable potential. A classical adaptation paradigm, as followed by these methods, first infers the generative trajectory inversely for a given source image by image inversion, then performs image editing along the inferred trajectory guided by the target text prompts. However, the performance of image editing is heavily limited by the approximation errors introduced during image inversion by diffusion models, which arise from the absence of exact supervision in the intermediate generative steps. To circumvent this issue, we investigate the parameter-efficient adaptation of VQ-based generative models for image editing, and leverage their inherent characteristic that the exact intermediate quantized representations of a source image are attainable, enabling more effective supervision for precise image inversion. Specifically, we propose EditInfinity, which adapts Infinity, a binary-quantized generative model, for image editing. We propose an efficient yet effective image inversion mechanism that integrates text prompting rectification and image style preservation, enabling precise image inversion. Furthermore, we devise a holistic smoothing strategy which allows our EditInfinity to perform image editing with high fidelity to source images and precise semantic alignment to the text prompts. Extensive experiments on the PIE-Bench benchmark across "add", "change", and "delete" editing operations demonstrate the superior performance of our model compared to state-of-the-art diffusion-based baselines. Code available at: https://github.com/yx-chen-ust/EditInfinity.
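The core argument of the abstract can be illustrated with a toy sketch: a binary quantizer maps each latent dimension to a discrete sign, so re-encoding the source image reproduces exactly the same discrete codes, giving the inversion process an exact supervision target (unlike the approximate trajectories recovered by diffusion inversion). The quantizer and decoder stand-in below are hypothetical simplifications, not the paper's actual Infinity tokenizer.

```python
import numpy as np

def binary_quantize(z):
    # Sign-based binary quantization: each latent dimension becomes +1 or -1.
    # (Real binary-quantized models like Infinity use learned, multi-scale
    # bitwise tokenizers; this is only a minimal stand-in.)
    return np.where(z >= 0, 1.0, -1.0)

rng = np.random.default_rng(0)
z_source = rng.normal(size=(4, 4, 8))   # toy latent feature map of a "source image"
codes = binary_quantize(z_source)       # exact discrete representation

# Toy "decoder": scale the codes back toward latent magnitudes.
z_reconstructed = codes * np.abs(z_source).mean()

# Re-quantizing the reconstruction recovers the identical codes, so the
# intermediate discrete representation can supervise inversion exactly.
assert np.array_equal(binary_quantize(z_reconstructed), codes)
```

The invariant in the final assertion is what the abstract calls "exact supervision": the discrete codes of the source image are recoverable bit-for-bit, so inversion error does not accumulate across intermediate steps.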
Problem

Research questions and friction points this paper is trying to address.

Overcoming diffusion model inversion errors in image editing tasks
Enabling precise image inversion through quantized representations
Achieving high-fidelity text-driven editing with minimal tuning overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Binary-quantized generative models enable precise image inversion
Text prompting rectification and image style preservation support faithful editing
Holistic smoothing strategy ensures high fidelity and semantic alignment
Authors
Jiahuan Wang — National University of Defense Technology (Machine Learning)
Yuxin Chen — The Hong Kong University of Science and Technology
Jun Yu — Harbin Institute of Technology, Shenzhen
Guangming Lu — Harbin Institute of Technology, Shenzhen (Computer Vision, Machine Learning)
Wenjie Pei — Harbin Institute of Technology, Shenzhen