🤖 AI Summary
Addressing challenges in editing 3D Gaussian Splatting (3DGS) scenes—including difficulty in local region editing, inaccurate semantic localization, and view inconsistency—this paper proposes an iterative editing framework guided by 2D priors. Specifically, it leverages 2D diffusion models for pixel-accurate edit region localization, integrates inverse rendering with foundation-model-predicted depth maps for 3D spatial mapping, initializes a view-consistent coarse geometry, and jointly optimizes Gaussian primitives’ geometry and appearance. By bypassing low-fidelity 3D semantic parsing and instead harnessing strong 2D priors to guide 3D editing, the method achieves high fidelity and efficiency while preserving cross-view consistency. Experiments demonstrate state-of-the-art performance across diverse scenes: editing speed improves by 4× over existing approaches, with superior detail preservation and geometric coherence.
📝 Abstract
Many 3D scene editing tasks focus on modifying local regions rather than the entire scene, apart from a few global applications such as style transfer. In 3D Gaussian Splatting (3DGS), where a scene is represented by a set of Gaussians, this structure allows for precise regional edits, offering enhanced control over specific areas of the scene. The challenge, however, is that 3D semantic parsing often underperforms its 2D counterpart, making targeted manipulation within 3D space more difficult and limiting the fidelity of edits. We address this by leveraging 2D diffusion editing to accurately identify the modification region in each view, followed by inverse rendering for 3D localization. We then refine the frontal view and initialize a coarse 3DGS with consistent views and approximate shapes derived from depth maps predicted by a 2D foundation model, thereby supporting an iterative, view-consistent editing process that gradually enhances structural details and textures to ensure coherence across perspectives. Experiments demonstrate that our method achieves state-of-the-art performance while delivering up to a $4\times$ speedup, providing a more efficient and effective approach to local 3D scene editing.
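The 3D localization step described above, where 2D edit masks are mapped into 3D via predicted depth, can be illustrated with a minimal sketch. The snippet below is not the paper's code: it shows only standard pinhole back-projection of masked pixels using a depth map, and the function name and toy values are hypothetical.

```python
import numpy as np

def unproject_mask(mask, depth, fx, fy, cx, cy):
    """Lift masked pixels to 3D camera-space points via z * K^{-1} [u, v, 1]^T.

    Illustrative only: the paper's actual 3D localization combines inverse
    rendering with foundation-model depth; this shows just the geometric core.
    """
    v, u = np.nonzero(mask)             # pixel coordinates of the 2D edit region
    z = depth[v, u]                     # per-pixel depth (e.g., foundation-model prediction)
    x = (u - cx) * z / fx               # pinhole back-projection along rays
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (N, 3) candidate 3D edit points

# Toy example: a 4x4 depth map with a 2x2 edited patch at constant depth 2.0.
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
pts = unproject_mask(mask, depth, fx=1.0, fy=1.0, cx=2.0, cy=2.0)
print(pts.shape)  # (4, 3)
```

Aggregating such points across views yields the 3D region where Gaussian primitives are selected for editing; per-view masks keep the localization pixel-accurate even when 3D semantic parsing is unreliable.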