Mastering Regional 3DGS: Locating, Initializing, and Editing with Diverse 2D Priors

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing challenges in editing 3D Gaussian Splatting (3DGS) scenes—including difficulty in local region editing, inaccurate semantic localization, and view inconsistency—this paper proposes an iterative editing framework guided by 2D priors. Specifically, it leverages 2D diffusion models for pixel-accurate edit region localization, integrates inverse rendering with foundation-model-predicted depth maps for 3D spatial mapping, initializes a view-consistent coarse geometry, and jointly optimizes Gaussian primitives' geometry and appearance. By bypassing low-fidelity 3D semantic parsing and instead harnessing strong 2D priors to guide 3D editing, the method achieves high fidelity and efficiency while preserving cross-view consistency. Experiments demonstrate state-of-the-art performance across diverse scenes, with up to a 4× speedup over existing approaches alongside superior detail preservation and geometric coherence.
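The 3D spatial mapping step described above—lifting a 2D edit mask into the scene using a predicted depth map—can be sketched as a standard pinhole back-projection. This is a minimal illustration under assumed conventions, not the paper's implementation; `unproject_edit_mask` and its parameter names are placeholders:

```python
import numpy as np

def unproject_edit_mask(mask, depth, K, cam_to_world):
    """Lift pixels flagged by a 2D edit mask into 3D world points.

    mask:         (H, W) boolean array marking the edit region in one view
    depth:        (H, W) per-pixel depth, e.g. from a 2D foundation model
    K:            (3, 3) camera intrinsics
    cam_to_world: (4, 4) camera-to-world extrinsic matrix
    """
    v, u = np.nonzero(mask)                   # pixel coordinates inside the mask
    z = depth[v, u]
    # Back-project to camera space: X_cam = z * K^{-1} [u, v, 1]^T
    pix = np.stack([u, v, np.ones_like(u)], axis=0).astype(np.float64)
    cam_pts = np.linalg.inv(K) @ pix * z      # (3, N)
    # Homogenize and transform into world space
    cam_h = np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])
    world = (cam_to_world @ cam_h)[:3].T      # (N, 3)
    return world
```

Points lifted this way from each view can then seed the coarse 3D localization of the edit region that the Gaussians are initialized from.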

📝 Abstract
Many 3D scene editing tasks focus on modifying local regions rather than the entire scene, apart from a few global applications such as style transfer. In 3D Gaussian Splatting (3DGS), scenes are represented as collections of Gaussians, a structure that allows precise regional edits and enhanced control over specific areas of the scene. The challenge, however, is that 3D semantic parsing often underperforms its 2D counterpart, making targeted manipulation within 3D space more difficult and limiting the fidelity of edits. We address this by leveraging 2D diffusion editing to accurately identify modification regions in each view, followed by inverse rendering for 3D localization. We then refine the frontal view and initialize a coarse 3DGS with consistent views and approximate shapes derived from depth maps predicted by a 2D foundation model, supporting an iterative, view-consistent editing process that gradually enhances structural details and textures to ensure coherence across perspectives. Experiments demonstrate that our method achieves state-of-the-art performance while delivering up to a $4\times$ speedup, providing a more efficient and effective approach to local 3D scene editing.
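The coarse-3DGS initialization the abstract describes can be illustrated with a simple sketch that places isotropic Gaussians at the depth-lifted points. `init_coarse_gaussians`, the k-nearest-neighbor scale heuristic, and the starting opacity are illustrative assumptions, not the authors' method:

```python
import numpy as np

def init_coarse_gaussians(points, colors, k=4):
    """Initialize isotropic Gaussians at lifted 3D points.

    points: (N, 3) world-space points from the depth-lifted edit region
    colors: (N, 3) RGB values sampled from the refined frontal view
    Each Gaussian's scale is taken from the distance to its k-th nearest
    neighbor so that adjacent splats overlap slightly.
    """
    # Pairwise squared distances (fine for a coarse, moderate-size init)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    d2.sort(axis=1)                                   # column 0 is the self-distance
    scales = np.sqrt(d2[:, min(k, len(points) - 1)])  # k-NN distance per point
    return {
        "means": points,
        "scales": np.repeat(scales[:, None], 3, axis=1),  # isotropic covariance
        "colors": colors,
        "opacities": np.full((len(points), 1), 0.1),      # start translucent
    }
```

Such a coarse set of primitives gives the subsequent joint optimization of geometry and appearance a view-consistent starting shape to refine.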
Problem

Research questions and friction points this paper is trying to address.

Enhancing 3D Gaussian Splatting for precise regional edits
Overcoming 3D semantic parsing limitations using 2D priors
Achieving view-consistent 3D scene editing with improved efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging 2D diffusion for precise region identification
Using inverse rendering for 3D localization
Iterative view-consistent editing for structural refinement
👥 Authors
Lanqing Guo
The University of Texas at Austin, Austin, USA.
Yufei Wang
Snap Research, NYC, USA.
Hezhen Hu
University of Texas at Austin
Yan Zheng
The University of Texas at Austin, Austin, USA.
Yeying Jin
Tencent | National University of Singapore
Siyu Huang
Assistant Professor, Clemson University
Zhangyang Wang
The University of Texas at Austin, Austin, USA.