3D-Consistent Multi-View Editing by Diffusion Guidance

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models often yield geometrically and photometrically inconsistent edits across multiple views, degrading the quality of downstream 3D reconstruction methods such as NeRF and 3D Gaussian Splatting. To address this, the authors propose a training-free multi-view co-editing framework. Their method introduces a novel correspondence-aware transformation consistency loss, jointly optimizing diffusion-guided sampling across views to enforce structural and appearance coherence. It accommodates both dense and sparse input-view configurations and integrates seamlessly with diverse 3D representations, including NeRF and 3D Gaussian Splatting. To the authors' knowledge, this is the first training-free approach to achieve highly 3D-consistent multi-view editing, significantly improving the geometric fidelity, texture sharpness, and text-alignment accuracy of edited results. Extensive video experiments demonstrate robustness and practicality under dynamic camera trajectories.

📝 Abstract
Recent advancements in diffusion models have greatly improved text-based image editing, yet methods that edit images independently often produce geometrically and photometrically inconsistent results across different views of the same scene. Such inconsistencies are particularly problematic for editing of 3D representations such as NeRFs or Gaussian Splat models. We propose a training-free diffusion framework that enforces multi-view consistency during the image editing process. The key assumption is that corresponding points in the unedited images should undergo similar transformations after editing. To achieve this, we introduce a consistency loss that guides the diffusion sampling toward coherent edits. The framework is flexible and can be combined with widely varying image editing methods, supporting both dense and sparse multi-view editing setups. Experimental results show that our approach significantly improves 3D consistency compared to existing multi-view editing methods. We also show that this increased consistency enables high-quality Gaussian Splat editing with sharp details and strong fidelity to user-specified text prompts. Please refer to our project page for video results: https://3d-consistent-editing.github.io/
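The abstract's key assumption is that corresponding points in the unedited images should undergo similar transformations after editing. A minimal NumPy sketch of such a correspondence-aware consistency loss is below; it is illustrative only, approximating the "transformation" as the per-pixel edit delta in image space (the paper guides diffusion sampling, which may operate on latents), and the function name and argument layout are assumptions:

```python
import numpy as np

def consistency_loss(orig_a, orig_b, edit_a, edit_b, corr):
    """Correspondence-aware transformation consistency loss (sketch).

    Approximates each view's "transformation" as its per-pixel edit
    delta (edited minus original) and penalises disagreement between
    the deltas at corresponding pixel locations in the two views.

    orig_a, edit_a : (H, W, C) arrays for view A
    orig_b, edit_b : (H, W, C) arrays for view B
    corr           : (N, 4) int array of matches (ya, xa, yb, xb)
    """
    delta_a = edit_a - orig_a          # how view A was changed
    delta_b = edit_b - orig_b          # how view B was changed
    ya, xa, yb, xb = corr.T            # matched pixel coordinates
    diff = delta_a[ya, xa] - delta_b[yb, xb]   # (N, C) delta mismatch
    return float(np.mean(np.sum(diff ** 2, axis=-1)))
```

Identical edits at corresponding pixels yield zero loss; any disagreement in the edit deltas makes the loss positive, which is the signal used to steer sampling toward coherent edits.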
Problem

Research questions and friction points this paper is trying to address.

Ensures multi-view consistency in 3D scene editing
Guides diffusion sampling with a consistency loss
Improves editing for NeRFs and Gaussian Splat models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free diffusion framework enforces multi-view consistency
Consistency loss guides diffusion sampling for coherent edits
Flexible framework supports dense and sparse multi-view editing setups
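How a consistency loss can guide sampling toward coherent edits is sketched below in a classifier-guidance style: the predicted clean edits for each view are nudged down the gradient of the quadratic delta-mismatch loss at corresponding pixels. This is an assumed, simplified form (pixel-space arrays, a hand-derived gradient, a hypothetical `guided_step` helper), not the paper's actual update rule:

```python
import numpy as np

def guided_step(edit_a, edit_b, orig_a, orig_b, corr, step=0.1):
    """One guidance-style correction toward consistent edits (sketch).

    For the quadratic loss ||delta_a - delta_b||^2 the gradient with
    respect to view A at a matched pixel is 2 * (delta_a - delta_b);
    both views are nudged symmetrically so their edit deltas agree.
    """
    delta_a = edit_a - orig_a
    delta_b = edit_b - orig_b
    ya, xa, yb, xb = corr.T            # matched pixel coordinates
    g = 2.0 * (delta_a[ya, xa] - delta_b[yb, xb])
    new_a, new_b = edit_a.copy(), edit_b.copy()
    new_a[ya, xa] -= step * g          # pull A's delta toward B's
    new_b[yb, xb] += step * g          # and B's delta toward A's
    return new_a, new_b
```

Repeating this step between denoising iterations shrinks the delta mismatch at corresponding pixels, which is the mechanism by which guided sampling enforces multi-view coherence without any training.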