🤖 AI Summary
This paper addresses the challenge of simultaneously achieving global consistency, local detail fidelity, and prompt alignment in text- and image-guided 3D scene editing. To this end, we propose GaussEdit, an adaptive editing framework built on 3D Gaussian Splatting, whose explicit representation makes Region of Interest selection convenient. Our method comprises three stages: (1) initialization of the 3D Gaussians to support high-quality edits; (2) Adaptive Global-Local Optimization to balance global scene coherence against detailed local edits, combined with category-guided regularization to alleviate the Janus problem, i.e., view-inconsistent geometry such as an object showing duplicated faces from different viewpoints; and (3) texture enhancement of the edited objects through image-to-image synthesis, so that results look realistic and match the prompts. Experiments show that our approach surpasses existing methods in editing accuracy, visual fidelity, and processing speed, and that it embeds user-specified concepts into 3D scenes for detailed, user-driven editing.
📝 Abstract
This paper presents GaussEdit, a framework for adaptive 3D scene editing guided by text and image prompts. GaussEdit uses 3D Gaussian Splatting as its backbone for scene representation, enabling convenient Region of Interest selection and efficient editing through a three-stage process. The first stage initializes the 3D Gaussians to support high-quality edits. The second stage employs an Adaptive Global-Local Optimization strategy to balance global scene coherence against detailed local edits, together with a category-guided regularization technique to alleviate the Janus problem. The final stage enhances the texture of the edited objects using an image-to-image synthesis technique, so that the results are visually realistic and align closely with the given prompts. Our experimental results demonstrate that GaussEdit surpasses existing methods in editing accuracy, visual fidelity, and processing speed. By embedding user-specified concepts into 3D scenes, GaussEdit supports detailed, user-driven 3D scene editing.
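The abstract does not describe how the Region of Interest is selected. As a minimal sketch of why an explicit Gaussian representation makes this convenient, assume the ROI is an axis-aligned bounding box and each Gaussian is reduced to its center: selection is then a simple membership test over the primitives. The names `select_roi`, `box_min`, and `box_max` and the toy coordinates are hypothetical and are not taken from GaussEdit.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def select_roi(centers: Sequence[Point], box_min: Point, box_max: Point) -> List[int]:
    """Return indices of Gaussians whose centers lie inside an axis-aligned box.

    Assumption: only the center of each Gaussian is tested; its scale and
    opacity are ignored in this sketch.
    """
    return [
        i
        for i, center in enumerate(centers)
        if all(lo <= v <= hi for v, lo, hi in zip(center, box_min, box_max))
    ]


# Toy scene with three Gaussian centers (illustrative values only).
centers = [(0.0, 0.0, 0.0), (0.5, 0.2, 0.1), (2.0, 2.0, 2.0)]

# Indices of the Gaussians that would be marked editable.
roi = select_roi(centers, box_min=(-1.0, -1.0, -1.0), box_max=(1.0, 1.0, 1.0))
```

In an editing loop of this kind, only the Gaussians indexed by `roi` would receive updates while the rest of the scene stays fixed; whether GaussEdit does exactly this is not stated in the abstract.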