🤖 AI Summary
This work addresses the challenging problem of training-free image editing with pre-trained diffusion models. Methodologically, it formulates editing as a reverse-time diffusion bridge process converging to a target distribution at $t=0$, enabling, for the first time, synchronous integration of text instructions and differentiable reward signals. It introduces a decomposable update mechanism grounded in the Doob $h$-transform, supporting flexible composition of multiple editing constraints, and incorporates diffusion inversion and Langevin sampling to enhance trajectory controllability. Experimental results demonstrate that the proposed framework surpasses existing state-of-the-art methods in editing accuracy, content fidelity, and capability for complex multi-objective editing, all while requiring no fine-tuning or additional training.
📝 Abstract
We introduce a theoretical framework for diffusion-based image editing by formulating it as a reverse-time bridge modeling problem. This approach modifies the backward process of a pretrained diffusion model to construct a bridge that converges to an implicit distribution associated with the editing target at time 0. Building on this framework, we propose h-Edit, a novel editing method that utilizes Doob's h-transform and Langevin Monte Carlo to decompose the update of an intermediate edited sample into two components: a "reconstruction" term and an "editing" term. This decomposition provides flexibility, allowing the reconstruction term to be computed via existing inversion techniques and enabling the combination of multiple editing terms to handle complex editing tasks. To our knowledge, h-Edit is the first training-free method capable of performing simultaneous text-guided and reward-model-based editing. Extensive experiments, both quantitative and qualitative, show that h-Edit outperforms state-of-the-art baselines in terms of editing effectiveness and faithfulness. Our source code is available at https://github.com/nktoan/h-edit.
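To make the decomposition concrete, the abstract's "reconstruction term plus editing terms" update at each reverse step can be sketched as below. This is a minimal scalar illustration, not the paper's actual algorithm: `h_edit_step`, `eta`, and the callables passed in are hypothetical names, and the real method operates on image tensors with the reconstruction term computed via a diffusion inversion technique and each editing term arising from a Doob $h$-transform correction (e.g. a text-guidance or reward-model gradient).

```python
def h_edit_step(x_t, reconstruction_term, editing_terms, eta=0.1):
    """One hypothetical reverse-step update, decomposed as in the abstract.

    reconstruction_term: callable giving the pretrained model's usual
        reverse-diffusion update (computable via existing inversion methods).
    editing_terms: list of callables, one Doob h-transform correction per
        editing constraint; summing them composes multiple constraints.
    eta: assumed step size for the editing corrections.
    """
    # Reconstruction: keeps the trajectory faithful to the source image.
    recon = reconstruction_term(x_t)
    # Editing: steers the trajectory toward the editing target(s).
    edit = eta * sum(g(x_t) for g in editing_terms)
    return recon + edit

# Toy usage with scalar stand-ins for the two components:
new_x = h_edit_step(1.0, lambda x: 0.5 * x, [lambda x: -x, lambda x: 2.0])
```

The additive structure is what makes the method compositional: adding another constraint only appends one more term to `editing_terms`, leaving the reconstruction path untouched.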