🤖 AI Summary
Existing text-driven 3D editing methods suffer from multi-view inconsistency, low efficiency, and limited regional control due to their reliance on implicit representations. This work proposes a novel approach for editing 3D objects using single-view text instructions, uniquely integrating a single-view editing strategy with a sparse 3D Gaussian Splatting (3DGS) representation. To ensure geometric coherence, the method leverages a multi-view diffusion model to identify and select views that exhibit consistent edits for subsequent reconstruction. By doing so, it simultaneously achieves strong cross-view consistency, significantly improved editing efficiency, and enhanced local controllability. Extensive experiments demonstrate that the proposed method consistently outperforms current baselines across diverse scenarios, achieving notable advantages in both editing quality and processing speed.
📝 Abstract
Text-driven 3D scene editing has attracted considerable interest due to its convenience and user-friendliness. However, methods that rely on implicit 3D representations, such as Neural Radiance Fields (NeRF), while effective at rendering complex scenes, are hindered by slow processing speeds and limited control over specific regions of the scene. Moreover, existing approaches that rely on multi-view editing strategies, including Instruct-NeRF2NeRF and GaussianEditor, frequently produce inconsistent results across different views when executing text instructions. This inconsistency degrades overall editing quality and makes it difficult to balance the consistency of editing results against editing efficiency. To address these challenges, we propose Single-View to 3D Object Editing via Gaussian Splatting (SVGS), a single-view text-driven editing technique based on 3D Gaussian Splatting (3DGS). Specifically, in response to text instructions, we introduce a single-view editing strategy grounded in multi-view diffusion models, which reconstructs 3D scenes by leveraging only those views that yield consistent editing results. Additionally, we employ sparse 3D Gaussian Splatting as the 3D representation, which significantly enhances editing efficiency. We conducted a comparative analysis of SVGS against existing baseline methods across various scene settings; the results indicate that SVGS outperforms its counterparts in both editing capability and processing speed, representing a significant advancement in 3D editing technology. For further details, please visit our project page at: https://amateurc.github.io/svgs.github.io.
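The core idea of keeping only consistently edited views before reconstruction can be sketched as a simple filtering step. The snippet below is a hypothetical illustration, not the paper's implementation: it represents each edited view by a placeholder feature vector and keeps candidates whose cosine similarity to a reference view exceeds an assumed threshold; the function names and the threshold value are illustrative.

```python
# Hypothetical sketch of consistency-based view selection: keep only edited
# views that agree with a reference view, measured by cosine similarity of
# placeholder feature vectors. Names and threshold are illustrative, not
# taken from the SVGS paper.
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_consistent_views(reference, candidates, threshold=0.9):
    """Return indices of candidate views whose features agree with the reference."""
    return [i for i, feat in enumerate(candidates)
            if cosine_sim(reference, feat) >= threshold]

# Toy example: views 0 and 2 point roughly the same way as the reference
# (consistent edits); view 1 does not (inconsistent edit).
ref = np.array([1.0, 0.0, 0.0])
views = [np.array([0.99, 0.05, 0.0]),   # consistent
         np.array([0.0, 1.0, 0.0]),     # inconsistent
         np.array([0.95, 0.10, 0.0])]   # consistent
kept = select_consistent_views(ref, views)  # → [0, 2]
```

In an actual pipeline, the kept views would then be passed to the sparse 3DGS reconstruction stage; how SVGS measures cross-view agreement in practice is detailed in the paper itself.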