🤖 AI Summary
Existing 3D Gaussian splatting methods lack object-level consistency and semantic structure, limiting their applicability to high-level tasks such as panoptic segmentation. This work proposes a decoupling strategy that, for the first time, separates scene-level segmentation from object-level 3D Gaussian reconstruction. By leveraging depth-guided cross-view instance mask propagation, the method obtains semantically consistent object regions, reconstructs each object independently, and then fuses them back into the global scene. It further introduces boundary refinement and instance-aware semantic embeddings to enhance geometric and semantic fidelity. Evaluated on the ScanNetv2 panoptic segmentation benchmark, the approach achieves state-of-the-art performance while enabling high-quality, semantics-aware 3D reconstruction. The resulting representation supports various downstream applications, including zero-shot panoptic segmentation, object retrieval, and 3D editing.
📝 Abstract
3D Gaussian Splatting (3DGS) enables fast, high-quality scene reconstruction, but it lacks an object-consistent, semantically aware structure. We propose Split&Splat, a framework for panoptic scene reconstruction with 3DGS that explicitly models object instances. It first propagates instance masks across views using depth, producing view-consistent 2D masks. Each object is then reconstructed independently and merged back into the scene while its boundaries are refined. Finally, instance-level semantic descriptors are embedded into the reconstructed objects, supporting various applications, including panoptic segmentation, object retrieval, and 3D editing. Unlike existing methods, Split&Splat first segments the scene and then reconstructs each object individually; this design naturally supports downstream tasks and allows Split&Splat to achieve state-of-the-art performance on the ScanNetv2 segmentation benchmark.
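The depth-guided cross-view mask propagation described above can be illustrated with a minimal sketch: labeled pixels in a source view are back-projected with their depth, transformed by the relative camera pose, and re-projected into a target view, carrying their instance IDs along. This is a toy pinhole-camera illustration under assumed conventions; the function `propagate_mask`, the intrinsics `K`, and the pose `T_ab` are hypothetical names, not the paper's actual implementation (which also handles occlusion and mask refinement).

```python
import numpy as np

def propagate_mask(mask_a, depth_a, K, T_ab):
    """Warp instance labels from view A into view B via depth.

    Hypothetical helper for illustration only (not the paper's API).
    mask_a:  (H, W) int array, instance IDs >= 0, background = -1.
    depth_a: (H, W) float array, per-pixel depth in view A.
    K:       (3, 3) shared pinhole intrinsics.
    T_ab:    (4, 4) rigid transform from camera A to camera B.
    """
    h, w = mask_a.shape
    ys, xs = np.nonzero(mask_a >= 0)          # labeled source pixels
    z = depth_a[ys, xs]
    # Back-project labeled pixels to 3D points in camera A's frame.
    pts = np.linalg.inv(K) @ np.stack([xs * z, ys * z, z])
    # Move the points into camera B's frame.
    pts = T_ab[:3, :3] @ pts + T_ab[:3, 3:4]
    # Re-project into view B's image plane.
    uvw = K @ pts
    u = np.round(uvw[0] / uvw[2]).astype(int)
    v = np.round(uvw[1] / uvw[2]).astype(int)
    # Scatter instance IDs into the target mask where projections land in-frame.
    mask_b = -np.ones((h, w), dtype=int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (uvw[2] > 0)
    mask_b[v[valid], u[valid]] = mask_a[ys[valid], xs[valid]]
    return mask_b
```

With an identity pose the labels map back onto themselves, which is a convenient sanity check; in practice neighboring views give partially overlapping masks that the method must reconcile into view-consistent instances.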