AI Summary
Pretrained 2D image editing models suffer from inter-view inconsistency in multi-view editing. Existing explicit 3D optimization-based approaches incur high computational overhead and exhibit instability under sparse-view conditions. This paper proposes a training-free, plug-and-play framework for inference-time multi-view consistent editing. By coupling diffusion sampling across views, it jointly models the multi-view image distribution and editing objectives, imposing implicit 3D regularization on pretrained 2D editors: synchronized cross-view sampling acts as a geometric consistency constraint, thereby bypassing explicit 3D reconstruction. The method requires only a single forward sampling pass, significantly improving both geometric coherence and visual fidelity of edited results. We validate its generality and architecture-agnosticism across three distinct multi-view editing tasks. Our approach establishes a new paradigm for efficient and robust 3D-aware image editing.
Abstract
We present an inference-time diffusion sampling method to perform multi-view consistent image editing using pre-trained 2D image editing models. These models can independently produce high-quality edits for each image in a set of multi-view images of a 3D scene or object, but they do not maintain consistency across views. Existing approaches typically address this by optimizing over explicit 3D representations, but they suffer from a lengthy optimization process and instability under sparse-view settings. We propose an implicit 3D regularization approach that constrains the generated 2D image sequences to adhere to a pre-trained multi-view image distribution. This is achieved through coupled diffusion sampling, a simple diffusion sampling technique that concurrently samples two trajectories, one from a multi-view image distribution and one from a 2D edited image distribution, using a coupling term to enforce multi-view consistency among the generated images. We validate the effectiveness and generality of this framework on three distinct multi-view image editing tasks, demonstrating its applicability across various model architectures and highlighting its potential as a general solution for multi-view consistent editing.
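The coupled sampling idea can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the two "denoisers" are stand-ins for the pretrained multi-view model and the 2D editing model, and the linear coupling term, schedule, and `lam` weight are assumptions made for the sketch.

```python
import numpy as np

def toy_denoiser_mv(x, t):
    # Stand-in for the pretrained multi-view model's denoised estimate:
    # here it simply pulls each view toward the cross-view mean.
    return x - 0.5 * (x - x.mean(axis=0, keepdims=True))

def toy_denoiser_edit(x, t):
    # Stand-in for the 2D editing model's denoised estimate:
    # here it pulls each view independently toward an "edited" target of 1.0.
    return x - 0.5 * (x - 1.0)

def coupled_sampling(x_mv, x_edit, steps=50, lam=0.1, seed=0):
    """Run two diffusion trajectories in lockstep; a coupling term pulls
    each trajectory toward the other so the edited views inherit the
    multi-view consistency of the multi-view trajectory (toy sketch)."""
    rng = np.random.default_rng(seed)
    for i in range(steps):
        t = 1.0 - i / steps  # toy time schedule, 1 -> 0
        noise = rng.normal(scale=0.01, size=x_mv.shape)
        # Each trajectory takes its own denoising step
        # (shared noise is a simplification of this sketch).
        x_mv = toy_denoiser_mv(x_mv, t) + noise
        x_edit = toy_denoiser_edit(x_edit, t) + noise
        # Coupling term: nudge each trajectory toward the other.
        x_mv, x_edit = (x_mv + lam * (x_edit - x_mv),
                        x_edit + lam * (x_mv - x_edit))
    return x_mv, x_edit

# Four "views", each a 3-pixel toy image, all starting from zeros.
views = np.zeros((4, 3))
mv, edited = coupled_sampling(views.copy(), views.copy())
```

In this toy setup the coupled fixed point drives both trajectories toward the edit target while the multi-view denoiser keeps the views agreeing with one another, mirroring how the coupling term transfers multi-view consistency onto the edited samples.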