CoreEditor: Consistent 3D Editing via Correspondence-constrained Diffusion

📅 2025-08-15
🤖 AI Summary
Existing text-driven 3D editing methods suffer from incomplete edits and blurred details due to insufficient modeling of multi-view consistency. To address this, we propose a correspondence-constrained attention mechanism that explicitly models cross-view pixel correspondences during diffusion denoising, jointly leveraging geometric alignment and semantic similarity estimation to enforce robust multi-view consistency. Additionally, we introduce a selective editing pipeline to enhance user controllability and output flexibility. Our method operates directly on pretrained 2D diffusion models, taking multi-view images as input without requiring 3D priors or model fine-tuning. Extensive experiments demonstrate that our approach generates high-fidelity, view-consistent results with sharp geometric and semantic details across diverse text-driven 3D editing tasks, significantly outperforming state-of-the-art methods.

📝 Abstract
Text-driven 3D editing seeks to modify 3D scenes according to textual descriptions, and most existing approaches tackle this by adapting pre-trained 2D image editors to multi-view inputs. However, without explicit control over multi-view information exchange, they often fail to maintain cross-view consistency, leading to insufficient edits and blurry details. We introduce CoreEditor, a novel framework for consistent text-to-3D editing. The key innovation is a correspondence-constrained attention mechanism that enforces precise interactions between pixels expected to remain consistent throughout the diffusion denoising process. Beyond relying solely on geometric alignment, we further incorporate semantic similarity estimated during denoising, enabling more reliable correspondence modeling and robust multi-view editing. In addition, we design a selective editing pipeline that allows users to choose preferred results from multiple candidates, offering greater flexibility and user control. Extensive experiments show that CoreEditor produces high-quality, 3D-consistent edits with sharper details, significantly outperforming prior methods.
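The correspondence-constrained attention described in the abstract can be pictured as masked cross-view attention: a pixel in one view attends only to pixels in other views judged to correspond to it, where correspondence combines geometric alignment with semantic similarity estimated during denoising. A minimal NumPy sketch of that idea (the shapes, the threshold `tau`, the mask construction, and the fallback behavior are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def corr_mask(geo_mask, feat_q, feat_k, tau=0.5):
    """Hypothetical correspondence mask: geometric alignment AND-ed with
    cosine similarity of per-pixel features exceeding a threshold."""
    fq = feat_q / np.linalg.norm(feat_q, axis=-1, keepdims=True)
    fk = feat_k / np.linalg.norm(feat_k, axis=-1, keepdims=True)
    return geo_mask & (fq @ fk.T > tau)

def constrained_attention(q, k, v, mask):
    """Cross-view attention restricted to corresponding pixel pairs.

    q: (n_q, d) query-pixel features from one view;
    k, v: (n_k, d) key/value features from another view;
    mask: (n_q, n_k) boolean correspondence mask.
    """
    d = q.shape[-1]
    raw = q @ k.T / np.sqrt(d)
    # Mask out non-corresponding pairs before the softmax.
    scores = np.where(mask, raw, -np.inf)
    # Pixels with no correspondence fall back to unconstrained attention,
    # so every query row still produces a valid distribution (assumption).
    empty = ~mask.any(axis=1)
    scores[empty] = raw[empty]
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v
```

With a mask forcing a query pixel to attend to a single corresponding key, the output reduces to that key's value, which is the consistency-enforcing behavior the mechanism is after.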
Problem

Research questions and friction points this paper is trying to address.

Ensures cross-view consistency in 3D scene editing
Improves multi-view editing via semantic-geometric correspondence
Enables user-selected flexible editing from multiple candidates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Correspondence-constrained attention mechanism for consistency
Semantic similarity enhances reliable correspondence modeling
Selective editing pipeline for user flexibility
🔎 Similar Papers
2024-03-18 · European Conference on Computer Vision · Citations: 16
Zhe Zhu — School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Honghua Chen — Research Assistant Professor, Lingnan University, Hong Kong (3D Measurement/Vision · 3D Generation · Deep Geometry Learning)
Peng Li — School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Mingqiang Wei — Professor, Nanjing University of Aeronautics and Astronautics (3D Vision · Multimodal Fusion · Computer Graphics · Deep Geometry Learning · CAD)