View-Consistent 3D Scene Editing via Dual-Path Structural Correspondence and Semantic Continuity

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses multi-view geometric and semantic inconsistency in text-driven 3D scene editing by proposing a consistency-aware editing framework based on cross-view joint distribution modeling. The approach uses a dual-path mechanism that jointly targets structural correspondence and semantic continuity: a projection-guided structural constraint enforces geometric alignment, while patch-level semantic propagation preserves cross-view semantic coherence. To enable supervised training, the authors introduce a paired multi-view 3D editing dataset and integrate differentiable rendering with implicit 3D representations for end-to-end optimization. The reported experiments show the method outperforming existing techniques on complex scenes, with high-fidelity, view-consistent text-driven edits.
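The summary does not say how the projection-guided structural constraint is computed. As a rough illustration of the kind of geometric cue such a constraint could build on, the sketch below reprojects a pixel from one view into another using its depth and the relative camera pose. This is standard pinhole-camera geometry, not the authors' implementation; `Intrinsics`, `reproject_pixel`, and all numeric values are hypothetical names and inputs chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics (hypothetical helper for this sketch)."""
    fx: float
    fy: float
    cx: float
    cy: float


def reproject_pixel(
    u: float,
    v: float,
    depth: float,
    K: Intrinsics,
    R: Sequence[Sequence[float]],
    t: Sequence[float],
) -> Optional[Tuple[float, float]]:
    """Map pixel (u, v) with known depth in a source view to a target view.

    R (3x3) and t (3,) are assumed to take source-camera coordinates to
    target-camera coordinates. Both views share the intrinsics K here,
    which is a simplifying assumption.
    """
    # Back-project the pixel to a 3D point in the source camera frame.
    p = (
        (u - K.cx) / K.fx * depth,
        (v - K.cy) / K.fy * depth,
        depth,
    )
    # Apply the rigid transform into the target camera frame.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    # Points at or behind the target camera plane have no valid projection.
    if q[2] <= 0:
        return None
    # Perspective projection onto the target image plane.
    return (K.fx * q[0] / q[2] + K.cx, K.fy * q[1] / q[2] + K.cy)


if __name__ == "__main__":
    K = Intrinsics(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Shift the point 0.5 units along x in the target camera frame.
    print(reproject_pixel(50.0, 50.0, 2.0, K, identity, [0.5, 0.0, 0.0]))
    # prints (75.0, 50.0)
```

Pixel pairs obtained this way identify image locations that observe the same 3D point, which is the sort of correspondence a structural consistency term could penalize when the edited views disagree.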

📝 Abstract

Text-driven 3D scene editing has recently attracted increasing attention. Most existing methods follow a render-edit-optimize pipeline, where multi-view images are rendered from a 3D scene, edited with 2D image editors, and then used to optimize the underlying 3D representation. However, cross-view inconsistency remains a major bottleneck. Although recent methods introduce geometric cues, cross-view interactions, or video priors to mitigate this issue, they still largely rely on inference-time synchronization and thus remain limited in robustness and generalization. In this work, we recast multi-view consistent 3D editing from a distributional perspective: 3D scene editing essentially requires joint distribution modeling across viewpoints. Based on this insight, we propose a view-consistent 3D editing framework that explicitly introduces cross-view dependencies into the editing process. Furthermore, motivated by the observation that structural correspondence and semantic continuity rely on different cross-view cues, we introduce a dual-path consistency mechanism consisting of projection-guided structural guidance and patch-level semantic propagation for effective cross-view editing. In addition, we construct a paired multi-view editing dataset that provides reliable supervision for learning cross-view consistency in edited scenes. Extensive experiments demonstrate that our method achieves superior editing performance, producing precise and view-consistent edits for complex scenes.
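The distributional perspective in the abstract can be written out as follows. The notation is an assumption made for illustration and is not taken from the paper: x_i is the edited image at view i, s_i is the rendering of the source scene at view i, s is the full set of source renderings, c is the text prompt, and N is the number of views.

```latex
% Per-view editing with a 2D editor treats the views as independent:
p_{\mathrm{indep}}(x_1, \dots, x_N \mid s, c) = \prod_{i=1}^{N} p(x_i \mid s_i, c)

% Joint modeling across viewpoints (exact chain-rule factorization):
p(x_1, \dots, x_N \mid s, c) = \prod_{i=1}^{N} p(x_i \mid x_1, \dots, x_{i-1}, s, c)
```

In the first form, no factor depends on the other edited views, so nothing ties the edit in one view to the edit in another. In the second form, every view is conditioned on the others, which is one way to read the cross-view dependency that the framework introduces into the editing process.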

Problem

Research questions and friction points this paper is trying to address.

view consistency

3D scene editing

cross-view inconsistency

semantic continuity

structural correspondence

Innovation

Methods, ideas, or system contributions that make the work stand out.

view-consistent 3D editing

dual-path consistency

structural correspondence