View-Consistent 3D Scene Editing via Dual-Path Structural Correspondence and Semantic Continuity

📅 2026-04-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses multi-view geometric and semantic inconsistency in text-driven 3D scene editing by proposing a consistency-aware editing framework based on cross-view joint distribution modeling. The approach employs a dual-path mechanism that jointly targets structural correspondence and semantic continuity: projection-guided structural guidance enforces geometric alignment, while patch-level semantic propagation preserves cross-view semantic coherence. To enable supervised training, the authors introduce a paired multi-view 3D editing dataset and integrate differentiable rendering with implicit 3D representations for end-to-end optimization. Experiments indicate that the method outperforms existing techniques on complex scenes, producing high-fidelity, view-consistent text-driven edits.
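
The distributional view mentioned in the summary can be made concrete with a short contrast. This is an illustrative sketch, not the paper's stated formulation. The symbols are assumptions introduced here: x_i is the edited image for view i, y_i is the rendered source image for view i, c is the text prompt, and N is the number of views.

```latex
% Per-view editing: each view is edited independently,
% so nothing ties the edited views to one another.
p(x_1, \dots, x_N \mid y_1, \dots, y_N, c)
  \approx \prod_{i=1}^{N} p(x_i \mid y_i, c)

% Joint modeling (chain rule, exact): each edited view
% also depends on the other views.
p(x_1, \dots, x_N \mid y_1, \dots, y_N, c)
  = \prod_{i=1}^{N} p(x_i \mid x_{<i},\, y_1, \dots, y_N, c)
```

The first line holds only if the edited views are conditionally independent, which is exactly what cross-view inconsistency violates. The second line keeps the cross-view dependencies that a view-consistent editor has to model.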

📝 Abstract
Text-driven 3D scene editing has recently attracted increasing attention. Most existing methods follow a render-edit-optimize pipeline, in which multi-view images are rendered from a 3D scene, edited with 2D image editors, and then used to optimize the underlying 3D representation. However, cross-view inconsistency remains a major bottleneck. Although recent methods introduce geometric cues, cross-view interactions, or video priors to mitigate this issue, they still rely largely on inference-time synchronization and thus remain limited in robustness and generalization. In this work, we recast multi-view consistent 3D editing from a distributional perspective: 3D scene editing essentially requires modeling a joint distribution across viewpoints. Based on this insight, we propose a view-consistent 3D editing framework that explicitly introduces cross-view dependencies into the editing process. Furthermore, motivated by the observation that structural correspondence and semantic continuity rely on different cross-view cues, we introduce a dual-path consistency mechanism consisting of projection-guided structural guidance and patch-level semantic propagation for effective cross-view editing. We also construct a paired multi-view editing dataset that provides reliable supervision for learning cross-view consistency in edited scenes. Extensive experiments demonstrate that our method achieves superior editing performance, with precise and view-consistent results on complex scenes.
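
The abstract does not spell out how projection-guided structural guidance is computed. As a rough illustration of the kind of geometric cue such a path could draw on, the sketch below shows standard depth-based reprojection: pixels in one view are lifted to 3D with their depth and projected into another view. It is a generic technique, not the authors' implementation. The function name `reproject`, the shared pinhole intrinsics `K`, and the transform `T_src_to_dst` are all assumptions made for this example.

```python
import numpy as np


def reproject(uv, depth, K, T_src_to_dst):
    """Map source-view pixels into a destination view (hypothetical helper).

    uv:            (N, 2) pixel coordinates in the source image
    depth:         (N,) depth along the source camera z-axis
    K:             (3, 3) pinhole intrinsics, assumed shared by both views
    T_src_to_dst:  (4, 4) rigid transform from source to destination camera frame
    Returns (uv_dst, valid): projected pixels and a mask of points in front
    of the destination camera. Invalid rows of uv_dst are NaN.
    """
    uv = np.asarray(uv, dtype=float)
    depth = np.asarray(depth, dtype=float)
    ones = np.ones((uv.shape[0], 1))

    # Unproject: homogeneous pixel -> ray direction -> 3D point in source frame.
    pix_h = np.hstack([uv, ones])
    rays = pix_h @ np.linalg.inv(K).T
    pts_src = rays * depth[:, None]

    # Move the points into the destination camera frame.
    pts_h = np.hstack([pts_src, ones])
    pts_dst = (pts_h @ T_src_to_dst.T)[:, :3]

    # Project with the same intrinsics and divide by depth where it is positive.
    proj = pts_dst @ K.T
    z = proj[:, 2]
    valid = z > 1e-8
    uv_dst = np.full_like(uv, np.nan)
    uv_dst[valid] = proj[valid, :2] / z[valid, None]
    return uv_dst, valid
```

With an identity transform the function returns the input pixels unchanged, which is a quick sanity check. Correspondences obtained this way only relate pixels geometrically. They say nothing about whether edited content agrees semantically across views, which is the gap the abstract assigns to the patch-level semantic propagation path.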
Problem

Research questions and friction points this paper is trying to address.

view consistency
3D scene editing
cross-view inconsistency
semantic continuity
structural correspondence
Innovation

Methods, ideas, or system contributions that make the work stand out.

view-consistent 3D editing
dual-path consistency
structural correspondence
semantic continuity
cross-view dependency