🤖 AI Summary
Existing Transformer-based point cloud instance segmentation methods primarily model external relationships between scenes and queries, neglecting both the internal structural characteristics of scene features and intrinsic correlations among queries. To address this, we propose a unified framework that jointly models internal and external relationships. First, we design an adaptive superpoint aggregation module coupled with contrastive learning to optimize scene feature representation. Second, we introduce a geometry-aware self-attention mechanism incorporating geometric position encoding to explicitly capture fine-grained inter-query dependencies. The entire method is seamlessly integrated into a Transformer architecture without requiring post-processing. Our approach achieves state-of-the-art performance on ScanNetV2, ScanNet++, ScanNet200, and S3DIS, significantly improving instance discrimination accuracy and cross-dataset generalization. Extensive experiments validate the effectiveness of jointly modeling internal feature structure and external query relationships.
📝 Abstract
3D instance segmentation aims to predict a set of object instances in a scene, representing them as binary foreground masks with corresponding semantic labels. Transformer-based methods are currently gaining increasing attention due to their elegant pipelines and superior predictions. However, these methods primarily focus on modeling the external relationships between scene features and query features through mask attention; they lack effective modeling of the internal relationships among scene features and among query features. In light of these disadvantages, we propose **Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation**. Specifically, we introduce an adaptive superpoint aggregation module and a contrastive-learning-guided superpoint refinement module to better represent superpoint features (scene features), leveraging contrastive learning to guide the updates of these features. Furthermore, our relation-aware self-attention mechanism enhances the modeling of relationships between queries by incorporating positional and geometric relationships into the self-attention mechanism. Extensive experiments on the ScanNetV2, ScanNet++, ScanNet200, and S3DIS datasets demonstrate the superior performance of Relation3D.
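To make the idea of relation-aware self-attention concrete, below is a minimal pure-Python sketch. It assumes one plausible form of the geometric relationship: a pairwise distance between per-query 3D centers subtracted from the attention logits, so spatially nearby queries attend to each other more strongly. The function name, the `centers` input, and the `alpha` weighting are all hypothetical illustrations, not the paper's actual implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def geometry_aware_self_attention(queries, centers, alpha=1.0):
    """Toy relation-aware self-attention over instance queries.

    queries : list of feature vectors, one per instance query
    centers : list of 3D points, one per query, used to derive a
              pairwise geometric bias (an assumption for illustration)
    alpha   : hypothetical weight on the geometric term
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, qi in enumerate(queries):
        logits = []
        for j, qj in enumerate(queries):
            sim = dot(qi, qj) * scale                # standard scaled dot-product logit
            dist = math.dist(centers[i], centers[j])  # geometric relationship
            logits.append(sim - alpha * dist)         # nearby queries attend more
        w = softmax(logits)
        # Output is a convex combination of all query features.
        out.append([sum(w[j] * queries[j][k] for j in range(len(queries)))
                    for k in range(d)])
    return out
```

In a real Transformer decoder the bias would be added inside batched multi-head attention with learned projections; this sketch keeps only the core idea that attention weights depend on both feature similarity and geometric proximity.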