🤖 AI Summary
Existing 3D scene segmentation methods face two key challenges: erroneous cross-view object association, particularly when objects reappear after occlusion, and semantic ambiguity and floating artifacts that arise when segmentation and geometric reconstruction are optimized separately. This paper proposes the first framework that jointly optimizes 2D segmentation consistency and 3D Gaussian splatting. Its core contributions are: (1) a point-to-pixel association mechanism enabling fine-grained, frame-to-frame object correspondence; (2) piecewise planar displacement constraints that preserve local geometric structure; and (3) a unified objective jointly optimizing semantic segmentation and Gaussian parameters. Evaluated on ScanNet and Replica, the method improves accuracy in both 2D panoptic segmentation and 3D Gaussian segmentation over existing methods. It suppresses cross-view misassociations and floating objects while improving the semantic compactness and geometric fidelity of reconstructed scenes.
📝 Abstract
Achieving a consistent and compact 3D segmentation field is crucial for maintaining semantic coherence across views and accurately representing scene structures. Previous 3D scene segmentation methods rely on video segmentation models to address inconsistencies across views, but the absence of spatial information often leads to object misassociation when objects temporarily disappear and reappear. Furthermore, in 3D scene reconstruction, segmentation and optimization are often treated as separate tasks. As a result, optimization typically lacks awareness of semantic category information, which can produce floaters with ambiguous segmentation. To address these challenges, we introduce CCGS, a method designed to achieve both view-consistent 2D segmentation and a compact 3D Gaussian segmentation field. CCGS incorporates pointmap association and a piecewise-plane constraint. First, we establish pixel correspondences between adjacent images by minimizing the Euclidean distance between their pointmaps. We then redefine object mask overlap according to these correspondences. The Hungarian algorithm is employed to optimize mask association by minimizing the total matching cost, while allowing for partial matches. To further enhance compactness, the piecewise-plane constraint restricts point displacement to local planes during optimization, thereby preserving structural integrity. Experimental results on the ScanNet and Replica datasets demonstrate that CCGS outperforms existing methods in both 2D panoptic segmentation and 3D Gaussian segmentation.
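The association and constraint steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `max_dist` and `min_overlap` thresholds, the overlap definition (the fraction of a mask's pixels whose nearest 3D neighbor falls inside the other mask), and the projection used for the plane constraint are all assumptions made for illustration. It assumes both pointmaps are `(H, W, 3)` arrays expressed in a shared world frame and that masks are given as integer label images.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial import cKDTree


def pixel_correspondence(pm_a, pm_b, max_dist=0.05):
    """For each pixel of view A, find the pixel of view B whose 3D point is
    closest in Euclidean distance. Returns flat indices into B and a validity
    flag (hypothetical threshold: matches farther than max_dist are rejected)."""
    pts_a = pm_a.reshape(-1, 3)
    pts_b = pm_b.reshape(-1, 3)
    dist, idx = cKDTree(pts_b).query(pts_a)
    return idx, dist <= max_dist


def associate_masks(labels_a, labels_b, pm_a, pm_b, max_dist=0.05, min_overlap=0.3):
    """Match object ids of view A to object ids of view B.
    Overlap is measured through the 3D correspondences rather than through
    raw pixel positions (an assumed definition, not taken from the paper)."""
    idx, valid = pixel_correspondence(pm_a, pm_b, max_dist)
    la = labels_a.reshape(-1)
    lb = labels_b.reshape(-1)
    # Label in B that each pixel of A lands on; -1 where no valid match exists.
    landed = np.where(valid, lb[idx], -1)
    ids_a = np.unique(la[la >= 0])
    ids_b = np.unique(lb[lb >= 0])
    overlap = np.zeros((len(ids_a), len(ids_b)))
    for i, a in enumerate(ids_a):
        in_a = la == a
        for j, b in enumerate(ids_b):
            overlap[i, j] = np.sum(in_a & (landed == b)) / np.sum(in_a)
    # Hungarian algorithm: minimize the total cost (1 - overlap).
    rows, cols = linear_sum_assignment(1.0 - overlap)
    # Partial matching: weak pairs are dropped, so some objects stay unmatched.
    return {int(ids_a[r]): int(ids_b[c])
            for r, c in zip(rows, cols) if overlap[r, c] >= min_overlap}


def project_to_plane(displacements, normal):
    """Keep only the in-plane part of each (N, 3) displacement by removing
    the component along the local plane normal."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    return displacements - np.outer(displacements @ n, n)
```

Because the cost matrix may be rectangular and low-overlap pairs are filtered out after assignment, an object that is visible in only one of the two views simply receives no match, which is one simple way to realize the partial matching mentioned above.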