🤖 AI Summary
This work addresses two key challenges in 3D hierarchical semantic segmentation (3DHS): inter-level optimization conflicts and severe fine-grained class imbalance. To this end, we propose a late-decoupled dual-branch framework featuring a shared backbone encoder and multiple decoders, where top-down hierarchical guidance and consistency constraints enable effective branch collaboration. We further introduce a novel semantic-prototype-driven mutual supervision mechanism between branches, which decouples optimization while enhancing discriminability for rare classes. Our design alleviates hierarchical fitting conflicts, mitigates dominant-class bias, and supports plug-and-play integration. Extensive experiments on mainstream benchmarks, including ScanNet and S3DIS, across diverse backbone architectures demonstrate state-of-the-art performance. Ablation studies confirm that our core modules consistently and significantly improve multi-granularity segmentation accuracy over existing methods.
📝 Abstract
3D hierarchical semantic segmentation (3DHS) is crucial for embodied intelligence applications that demand a multi-grained, multi-hierarchy understanding of 3D scenes. Despite recent progress, previous 3DHS methods have overlooked the following two challenges: I) multi-label learning with a parameter-sharing model can lead to conflicts in cross-hierarchy optimization, and II) class imbalance is inevitable across the multiple hierarchies of 3D scenes, which causes model performance to be dominated by the majority classes. To address these issues, we propose a novel framework with a primary 3DHS branch and an auxiliary discrimination branch. Specifically, to alleviate the multi-hierarchy conflicts, we propose a late-decoupled 3DHS framework that employs multiple decoders with coarse-to-fine hierarchical guidance and consistency. The late-decoupled architecture mitigates the underfitting and overfitting conflicts among hierarchies and also constrains the class imbalance problem within each individual hierarchy. Moreover, we introduce a 3DHS-oriented, semantic-prototype-based bi-branch supervision mechanism, which additionally learns class-wise discriminative point cloud features and performs mutual supervision between the auxiliary and 3DHS branches, to improve segmentation under class imbalance. Extensive experiments on multiple datasets and backbones demonstrate that our approach achieves state-of-the-art 3DHS performance, and that its core components can also be used as a plug-and-play enhancement to improve previous methods.
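To make the idea of coarse-to-fine guidance and cross-hierarchy consistency more concrete, the following is a minimal per-point sketch in plain Python. It is not the formulation used in this work, which the abstract does not specify. The two-level hierarchy `FINE_TO_COARSE`, the log-prior form of the guidance, and the KL-based consistency term are all illustrative assumptions.

```python
import math

# Hypothetical two-level hierarchy (assumption): fine class index -> coarse class index.
FINE_TO_COARSE = [0, 0, 1, 1, 1]  # 5 fine classes grouped under 2 coarse classes


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_to_coarse(fine_probs, fine_to_coarse, num_coarse):
    """Sum fine-class probabilities under each coarse parent."""
    coarse = [0.0] * num_coarse
    for k, p in enumerate(fine_probs):
        coarse[fine_to_coarse[k]] += p
    return coarse


def guide_fine_logits(fine_logits, coarse_probs, fine_to_coarse, eps=1e-12):
    """Assumed top-down guidance: add the parent's log-probability to each fine logit."""
    return [
        z + math.log(coarse_probs[fine_to_coarse[k]] + eps)
        for k, z in enumerate(fine_logits)
    ]


def consistency_loss(coarse_probs, fine_probs, fine_to_coarse, eps=1e-12):
    """Assumed consistency term: KL(coarse || fine aggregated to the coarse level)."""
    agg = aggregate_to_coarse(fine_probs, fine_to_coarse, len(coarse_probs))
    return sum(
        p * math.log((p + eps) / (q + eps)) for p, q in zip(coarse_probs, agg)
    )


# One point: the coarse decoder is confident in coarse class 1,
# while the unguided fine decoder slightly prefers fine class 0 (a child of coarse class 0).
coarse_probs = softmax([0.0, 4.0])
fine_logits = [1.0, 0.0, 0.5, 0.2, 0.1]
fine_probs = softmax(fine_logits)
guided_probs = softmax(guide_fine_logits(fine_logits, coarse_probs, FINE_TO_COARSE))
```

In this toy setting, the guided fine prediction moves to a child of the confident coarse class, and the consistency term vanishes when the coarse prediction equals the aggregated fine prediction. A real 3DHS model would apply such terms over all points and hierarchy levels, on top of separate decoder heads that share one backbone encoder.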