🤖 AI Summary
Panoramic image segmentation suffers from a lack of large-scale annotated datasets, leading to reliance on planar pre-trained models that fail to account for spherical distortion and boundary discontinuities. To address this, we propose a spherical convolutional sampling method compatible with standard 2D backbone networks. Our approach features two key innovations: (1) mapping planar pre-trained weights onto a spherical discrete sampling grid to enable distortion-aware feature extraction; and (2) introducing a spherical-feature-guided channel attention mechanism to enhance representation learning for critical regions. Crucially, our method requires no architectural modifications to the backbone and is plug-and-play. Experiments on the widely used indoor panoramic benchmark Stanford2D3D demonstrate significant improvements in segmentation accuracy, confirming both effectiveness and generalizability across diverse scenes and backbone architectures.
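To make the idea of a "spherical discrete sampling grid" concrete, the sketch below computes latitude-dependent sampling locations for a k x k convolution kernel on an equirectangular panorama via an inverse gnomonic (tangent-plane) projection. This is an illustrative assumption: the summary does not specify which spherical projection the paper uses, and the function name and `fov` parameter are hypothetical.

```python
import numpy as np

def spherical_kernel_offsets(lat, lon, k=3, fov=np.radians(3.0)):
    """Illustrative sketch: re-project a k x k kernel grid from the
    tangent plane at (lat, lon) onto the sphere, giving the spherical
    sampling locations for one kernel position.  `fov` (the kernel's
    angular extent) is an assumed hyperparameter, not from the paper."""
    # Regular kernel grid on the tangent plane centred at (lat, lon).
    half = np.tan(fov / 2.0)
    xs = np.linspace(-half, half, k)
    x, y = np.meshgrid(xs, xs)
    rho = np.sqrt(x**2 + y**2)
    c = np.arctan(rho)                      # angular distance from centre
    rho_safe = np.where(rho == 0, 1.0, rho) # avoid 0/0 at the kernel centre
    # Standard inverse gnomonic projection: tangent plane -> sphere.
    new_lat = np.arcsin(np.cos(c) * np.sin(lat)
                        + y * np.sin(c) * np.cos(lat) / rho_safe)
    new_lon = lon + np.arctan2(
        x * np.sin(c),
        rho * np.cos(lat) * np.cos(c) - y * np.sin(lat) * np.sin(c))
    return new_lat, new_lon
```

Sampling an unmodified 2D kernel at these locations (e.g. with bilinear interpolation) is what lets planar pre-trained weights be reused without architectural changes: near the poles the longitudinal spread of the grid widens automatically, compensating for equirectangular stretching.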
📝 Abstract
Due to the current lack of million-scale panoramic datasets, tasks involving panoramic images predominantly rely on backbone networks pre-trained on two-dimensional image benchmarks. However, these networks are not equipped to handle the distortions and discontinuities inherent in panoramic images, which adversely affects their performance on such tasks. In this paper, we introduce a novel spherical sampling method for panoramic images that enables the direct use of existing models pre-trained on two-dimensional images. Our method applies spherical discrete sampling to the weights of the pre-trained models, effectively mitigating distortion while providing favorable initial values for training. Additionally, we apply the proposed sampling method to panoramic image segmentation, using features obtained from the spherical model as masks for channel-specific attention, which yields strong results on the commonly used indoor dataset Stanford2D3D.
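The channel attention described above can be sketched in a squeeze-and-excitation style: per-channel statistics of the spherical features gate the channels of the planar backbone features. This is a minimal interpretation under stated assumptions; the function name, the bottleneck weights `w1`/`w2`, and the global-average squeeze are all hypothetical choices, since the abstract only says spherical features serve as masks for channel attention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spherical_guided_channel_attention(planar_feat, spherical_feat, w1, w2):
    """Illustrative sketch: derive a per-channel mask from spherical
    features and use it to gate planar backbone features.
    planar_feat, spherical_feat: (C, H, W); w1: (C//r, C); w2: (C, C//r),
    where r is an assumed bottleneck reduction ratio."""
    squeeze = spherical_feat.mean(axis=(1, 2))           # (C,) per-channel stats
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0)) # (C,) mask in (0, 1)
    return planar_feat * excite[:, None, None]           # gate planar channels
```

Because the mask is computed from the distortion-aware spherical branch but applied to the planar features, channels that respond to geometry-sensitive regions can be emphasized without changing the backbone itself.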