DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing

📅 2025-06-03
🤖 AI Summary
In diffusion-based image editing, reconciling reconstruction fidelity with editing flexibility remains challenging due to the inherent trade-off in latent-space inversion between semantic alignment and structural consistency. To address this, we propose a dual-conditional inversion framework that jointly leverages the source text prompt and a reference image to guide latent optimization. We introduce a novel dual-conditional fixed-point optimization mechanism, enabling coordinated anchoring of the inversion trajectory across both semantic and visual spaces. Furthermore, we formulate inversion as a joint minimization problem of noise discrepancy and reconstruction error. Our method integrates conditional guidance, fixed-point iteration, and multi-objective loss optimization. Extensive experiments demonstrate state-of-the-art performance across diverse image editing benchmarks, achieving significant improvements in reconstruction fidelity and editing precision. Notably, our approach also exhibits strong robustness and generalization in pure reconstruction tasks.

📝 Abstract
Diffusion models have achieved remarkable success in image generation and editing tasks. Inversion within these models aims to recover the latent noise representation for a real or generated image, enabling reconstruction, editing, and other downstream tasks. However, most existing inversion approaches suffer from an intrinsic trade-off between reconstruction accuracy and editing flexibility. This limitation arises from the difficulty of maintaining both semantic alignment and structural consistency during the inversion process. In this work, we introduce Dual-Conditional Inversion (DCI), a novel framework that jointly conditions on the source prompt and reference image to guide the inversion process. Specifically, DCI formulates the inversion process as a dual-condition fixed-point optimization problem, minimizing both the latent noise gap and the reconstruction error under the joint guidance. This design anchors the inversion trajectory in both semantic and visual space, leading to more accurate and editable latent representations. Our novel setup brings new understanding to the inversion process. Extensive experiments demonstrate that DCI achieves state-of-the-art performance across multiple editing tasks, significantly improving both reconstruction quality and editing precision. DCI also achieves strong results in pure reconstruction tasks, which suggests a degree of robustness and generalizability that approaches the ultimate goal of inversion.

Problem

Research questions and friction points this paper is trying to address.

Balancing reconstruction accuracy and editing flexibility in diffusion inversion
Maintaining semantic alignment and structural consistency during inversion
Improving latent representation accuracy and editability for image editing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-condition inversion for image editing
Joint source prompt and reference guidance
Fixed-point optimization for latent noise
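
To make the fixed-point idea concrete, here is a minimal sketch of generic fixed-point DDIM inversion for a single timestep. It is not the DCI algorithm: the summary above does not give DCI's update rule, so the dual conditioning on prompt and reference image is not modeled. The noise predictor `toy_eps`, the schedule values `a_t` and `a_prev`, and all function names are hypothetical stand-ins chosen for illustration. The sketch solves the implicit inversion equation by repeated substitution and then measures the reconstruction error after one DDIM denoising step.

```python
import numpy as np


def toy_eps(z, t):
    """Hypothetical stand-in for a conditioned noise predictor eps(z, t, c)."""
    return np.tanh(z + 0.1 * t)


def ddim_step(z_t, t, a_t, a_prev, eps_fn):
    """Deterministic DDIM denoising step z_t -> z_{t-1}."""
    eps = eps_fn(z_t, t)
    z0_pred = (z_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * z0_pred + np.sqrt(1.0 - a_prev) * eps


def fixed_point_invert(z_prev, t, a_t, a_prev, eps_fn, iters=30, tol=1e-10):
    """Iterate z <- f(z) to find the z_t that ddim_step maps back to z_prev."""
    scale = np.sqrt(a_t / a_prev)
    coef = np.sqrt(1.0 - a_t) - scale * np.sqrt(1.0 - a_prev)
    z_t = z_prev.copy()  # initial guess: the less noisy latent itself
    for _ in range(iters):
        z_next = scale * z_prev + coef * eps_fn(z_t, t)
        gap = np.linalg.norm(z_next - z_t)  # change between successive iterates
        z_t = z_next
        if gap < tol:
            break
    return z_t


# Assumed toy setup: a 16-dimensional latent and two cumulative alpha values.
rng = np.random.default_rng(0)
z_prev = rng.standard_normal(16)
t, a_t, a_prev = 1.0, 0.8, 0.9

# One substitution corresponds to the usual naive DDIM inversion approximation.
z_naive = fixed_point_invert(z_prev, t, a_t, a_prev, toy_eps, iters=1)
z_fixed = fixed_point_invert(z_prev, t, a_t, a_prev, toy_eps)

naive_error = np.linalg.norm(ddim_step(z_naive, t, a_t, a_prev, toy_eps) - z_prev)
recon_error = np.linalg.norm(ddim_step(z_fixed, t, a_t, a_prev, toy_eps) - z_prev)
print(f"naive error: {naive_error:.3e}, fixed-point error: {recon_error:.3e}")
```

In this toy setting the iteration converges because the update is a contraction (the coefficient on the noise term is small and `tanh` is 1-Lipschitz). A real noise predictor offers no such guarantee, which is one reason inversion methods add extra anchoring or guidance terms.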
Zixiang Li
Beijing Jiaotong University
Haoyu Wang
Institute of Information Science, Beijing Jiaotong University, Visual Intelligence +X International Cooperation Joint Laboratory of MOE
Wei Wang
Institute of Information Science, Beijing Jiaotong University, Visual Intelligence +X International Cooperation Joint Laboratory of MOE
Chuangchuang Tan
Institute of Information Science, Beijing Jiaotong University, Visual Intelligence +X International Cooperation Joint Laboratory of MOE
Yunchao Wei
Professor, Beijing Jiaotong University, UTS, UIUC, NUS
Computer Vision, Machine Learning
Yao Zhao
Institute of Information Science, Beijing Jiaotong University, Visual Intelligence +X International Cooperation Joint Laboratory of MOE